Wednesday, January 19, 2005

Scott Hanselman just posted about a case-{in}sensitivity problem he recently went through. That reminded me that I wanted to talk to you about case-sensitivity in Xceed Zip for .NET and the FileSystem. I remember that back in the design days, we debated long and hard over whether the FileSystem should be case-sensitive or not. Once we decided to support both, the debate continued about what the default behavior should be.

The conclusions were simple:

  1. Imitate by default.
  2. Uniformity within a single product.
  3. Know the differences.

Since System.IO is case-insensitive (as is the whole Windows operating system), we had to be case-insensitive by default. Thus, if you have files "first.txt" and "second.TXT" in a folder, the code below returns both files:

DiskFolder disk = new DiskFolder( @"t:\" );
AbstractFile[] files = disk.GetFiles( false, "*.txt" );

The same way, if you have both files in a zip file, the following code will return both:

ZipArchive zip = new ZipArchive( new DiskFile( @"t:\" ) );
AbstractFile[] files = zip.GetFiles( false, "*.txt" );

Now, where it gets tricky is that you will never have a folder on disk containing both "second.TXT" and "second.txt". The system won't let you create the second one. Thus the following code returns an existing file whose FullName is all lower case, even if the real file has an upper-case extension:

DiskFolder disk = new DiskFolder( @"t:\" );
AbstractFile file1 = disk.GetFile( "second.txt" );

You asked the "Disk" world for file "second.txt", and this world has recognized "second.TXT" as matching your request.

To obey rule #2, the following code does exactly the same, even though the file stored in the zip file has its extension all upper case:

ZipArchive zip = new ZipArchive( new DiskFile( @"t:\" ) );
AbstractFile file2 = zip.GetFile( "second.txt" );

But a zip file can come from a different operating system, so you could end up with one containing both. What would happen? I've created such a zip file for our tests, by adding "second.TXT" and "foobar.txt" to a zip file, and hex-editing "foobar" to "second".

When opening this file in WinZip, I can see both files. But when unzipping, it will unzip the first, then try to unzip the second over the first. You just can't unzip both into two separate files. Furthermore, trying to unzip either one from within the classic view will always unzip both over the same file on disk.

How does Xceed Zip for .NET deal with such zip files? Try the following code:

ZipArchive zip = new ZipArchive( new DiskFile( @"t:\" ) );
foreach( AbstractFile file in zip.GetFiles( false ) )
  Console.WriteLine( file.FullName );

The output is:


Any file that case-insensitively matches another file gets a number appended. This is not a perfect solution, as there is never a perfect solution. To support rule #2, DiskFolder and ZippedFolder instances had to behave the same. This post and the documentation try to address rule #3 :-)
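The renaming idea can be sketched in a few lines of C#. This is not Xceed's code: the class CaseCollisionRenamer, the method MakeUnique, and the "(n)" naming format are all hypothetical, chosen only to illustrate how case-insensitive collisions can be made unique by appending a counter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of Xceed Zip for .NET.
public static class CaseCollisionRenamer
{
    // Returns one name per input name, in the same order. A name that
    // case-insensitively matches an earlier one gets "(n)" inserted before
    // its extension. That format is an assumption, not Xceed's actual one.
    // Inputs are expected to be plain file names without a directory part.
    public static List<string> MakeUnique( IEnumerable<string> names )
    {
        // Names already handed out, compared without regard to case.
        HashSet<string> used = new HashSet<string>( StringComparer.OrdinalIgnoreCase );
        List<string> result = new List<string>();

        foreach( string name in names )
        {
            string candidate = name;
            int counter = 1;

            // HashSet.Add returns false when an equal name is already present,
            // so keep bumping the counter until the candidate is free.
            while( !used.Add( candidate ) )
            {
                counter++;
                candidate = Path.GetFileNameWithoutExtension( name )
                    + "(" + counter + ")"
                    + Path.GetExtension( name );
            }

            result.Add( candidate );
        }

        return result;
    }
}
```

The first occurrence of a name is left untouched and only later collisions are renamed, so a zip file without conflicts comes through unchanged.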

Now, some of you want to always look for exact matches. You simply need to prepend the string mask with a ">", as in "I want a more precise match" (1). The following code will match a single file:

DiskFolder disk = new DiskFolder( @"t:\" );
AbstractFile[] files = disk.GetFiles( false, ">*.txt" );

The idea with System.String filter parameters is that we replace them with a NameFilter, which is the one responsible for that ">" trick. It only works with methods accepting filters (GetFiles, GetFolders, CopyFilesTo, MoveFilesTo). Methods like GetFile can only return a single instance (actually, they always return an instance, which may or may not exist). Those methods return the single and unique AbstractFile matching your string, based on the world this AbstractFolder belongs to.
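To make the ">" convention concrete, here is a minimal sketch of how a mask could be interpreted. It is not Xceed's NameFilter: MaskMatcher and IsMatch are hypothetical names, and the assumption is that "*" matches any run of characters and "?" a single character, which may differ from the real mask rules.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for illustration; not Xceed's NameFilter.
public static class MaskMatcher
{
    public static bool IsMatch( string mask, string fileName )
    {
        // A leading ">" asks for an exact-case match and is not part of the pattern.
        bool caseSensitive = mask.StartsWith( ">", StringComparison.Ordinal );
        string pattern = caseSensitive ? mask.Substring( 1 ) : mask;

        // Escape the mask, then turn the escaped wildcards back into
        // their regular expression equivalents, anchored at both ends.
        string regex = "^"
            + Regex.Escape( pattern ).Replace( "\\*", ".*" ).Replace( "\\?", "." )
            + "$";

        RegexOptions options = caseSensitive
            ? RegexOptions.CultureInvariant
            : RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;

        return Regex.IsMatch( fileName, regex, options );
    }
}
```

With this sketch, MaskMatcher.IsMatch( "*.txt", "second.TXT" ) is true, while MaskMatcher.IsMatch( ">*.txt", "second.TXT" ) is false, mirroring the single-file result described above.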

(1): We actually debated between using "<" as in "match less items" or ">" as in "a more precise match". I think we ended up tossing a coin! :-)

1/19/2005 10:50:47 AM (Eastern Standard Time, UTC-05:00)