Thursday, November 11, 2004

So the WinAmp adventure is over? Pfffe! That won't stop me from using WinAmp to listen to di.fm while working! Though I feel WMP10 and the WMA format do a better job at ripping my CDs (quality/size ratio), I've always had a better experience with WinAmp for streamed audio.


Fun

11/11/2004 10:31:02 AM (Eastern Standard Time, UTC-05:00)  #   
Wednesday, November 10, 2004

Lluis Sanchez Gual, of the Mono project, created a class library for generating IL in a higher-level form than System.Reflection.Emit. It's kinda CodeDom for IL!
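For contrast, here's roughly what emitting even a trivial Add method looks like with raw Reflection.Emit (a minimal sketch; the "Adder"/"Demo" names are mine):

  // Requires System.Reflection and System.Reflection.Emit.
  // Raw Reflection.Emit works one opcode at a time; a higher-level,
  // CodeDom-like layer spares you this ceremony.
  AssemblyName name = new AssemblyName( "Demo" );
  AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
    name, AssemblyBuilderAccess.Run );
  ModuleBuilder module = assembly.DefineDynamicModule( "DemoModule" );
  TypeBuilder type = module.DefineType( "Adder" );
  MethodBuilder method = type.DefineMethod( "Add",
    MethodAttributes.Public | MethodAttributes.Static,
    typeof( int ), new Type[] { typeof( int ), typeof( int ) } );

  ILGenerator il = method.GetILGenerator();
  il.Emit( OpCodes.Ldarg_0 ); // push the first argument
  il.Emit( OpCodes.Ldarg_1 ); // push the second argument
  il.Emit( OpCodes.Add );     // add them
  il.Emit( OpCodes.Ret );     // return the sum

  Type built = type.CreateType();
  Console.WriteLine( built.GetMethod( "Add" ).Invoke( null, new object[] { 2, 3 } ) ); // 5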



11/10/2004 1:26:04 PM (Eastern Standard Time, UTC-05:00)  #   
Tuesday, November 09, 2004

As you may expect, I installed Konfabulator today. Though the installation process is smooth, the look is great, and the overall experience is very interesting, I must admit seeing this in my Task Manager is somewhat of a turn-off:

One process per widget, and an extra one for the tray icon and menu... is it worth it? Though CPU usage seems reasonable (compared with Desktop Sidebar, for example), and the combined memory usage averages what Outlook uses by itself, I'm left with the impression that this Windows version was made out of a hack, and that no real "Windows integration" effort was made. On a developer's machine, I'm doomed to miss that memory one day.

Seeing that UnixUtils folder under Konfabulator also gives me similar impressions:

Oh well, I'll give it a real try... but I'm afraid it will end up just like Desktop Sidebar... uninstalled!



11/9/2004 3:34:11 PM (Eastern Standard Time, UTC-05:00)  #   
Thursday, November 04, 2004

Pixoria is about to launch a Windows version of Konfabulator (a desktop customization app). Their web site displays notes from a paleontologist observing two species (Apple vs Windows computers) for ten days (today is day 7), with the probable outcome of seeing both species running Konfabulator.

I've always been a big geek about desktop customization... Can't wait to see this Windows version! :-)


Fun

11/4/2004 5:12:42 PM (Eastern Daylight Time, UTC-04:00)  #   
Friday, October 29, 2004

Nat Friedman writes a nice post about two development approaches: getting nothing wrong versus getting it right. Though I know nothing about Muine, I can see the analogy with the software I developed (or at least with what I try to achieve). While some competing products don't tackle ease of use and simply stick to a "most common features" list, I've always felt it was important to improve the interface too. And that applies to class libraries as well. Xceed Zip for .NET's object-oriented design may require some getting used to, but you end up with obvious, short code. In fact, it's more about "forgetting the old interface" than "learning the new one".

Another good example is Money and Quicken: while they both fight to have every feature the other one has, neither resulting application addresses my needs. My wife and I split general expenses based on our salaries, and house expenses half and half. I'm stuck with Excel for managing all this. The personal finance software world needs a Muine of its own... I'm ready to live with its "wrongs"! Suggestions?



10/29/2004 1:43:06 PM (Eastern Daylight Time, UTC-04:00)  #   
Tuesday, October 19, 2004

Robert Scoble wants all corporations to blog. He believes this is the best way to get information around, better than any other medium. He gives the Kryptonite example: most of us heard about the flaw with some of their bicycle locks (you can unlock them with a Bic pen!), but nobody heard about the company's official response. According to Robert, the word-of-mouth power of blogs spread the news much faster than the company itself could.

Well, that wasn't the case for me. I heard the news a few weeks ago on the TV news, and the report gave me a clear view of the case AND the corporate response, which was to replace all affected locks. I heard it on TV first, not in blogs, probably because I only read work-related blogs. I don't have time to read more. General information I leave to the newspaper I read in the morning and the TV news I watch in the evening, at home. I read blogs when I work. Robert's job is to blog and read blogs (ok, ok, that's a simplified definition of his job). His company has the resources ($) to assign full-time people to evangelism (gee, I don't like that word). We don't. I'm a developer. I have stuff to analyse, design, implement, test, and fix. I blog for my own pleasure, and obviously with the impression it can benefit Xceed, but under clear "parental guidance": don't spend too much time on it, and don't tell any secrets.

Which brings me to another subject: What can a corporation blog about? Where is the limit? How does a blogger who's no marketing genius know he's about to say too much?

A colleague of mine wants to blog too. I already know he's the kind of blogger you won't want to miss. He masters technical details better than anybody I know. One could say I'm a generalist and he's a specialist. But his first post isn't online yet. Why? Because with every subject he starts writing about, he ends up with the impression he's giving too much valuable information to our competitors. Nobody here at Xceed is filtering our blog posts. Our boss gave us the green light, with very few rules (if we can call them rules). That doesn't stop us from self-censoring our posts. In my case, it's easy, since I talk more about the public interface than the innards of a product. But in his case, it has become a show stopper: he's convinced he can't blog without saying too much.

And that's a shame, because once he starts blogging, we'll all benefit from it... but as I write this, I realise that "all" means "our clients and our competitors". :-( Boy, I think Robert's position is much clearer than that of most of us. He shouldn't generalize blogging's pros and cons to every corporation. It's not that clear-cut.



10/19/2004 11:17:52 AM (Eastern Daylight Time, UTC-04:00)  #   
Monday, October 18, 2004

Kit George gives a neat jump start on number formatting. It's precise and concise, and gives a good overall view of the different ways you can format numbers as strings.
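For a quick taste, here are a few of the standard .NET numeric format strings (a minimal sketch; output shown for the en-US culture):

  double value = 1234.5678;
  Console.WriteLine( value.ToString( "N2" ) ); // 1,234.57  (number, 2 decimals)
  Console.WriteLine( value.ToString( "C" ) );  // $1,234.57 (currency)
  Console.WriteLine( value.ToString( "F0" ) ); // 1235      (fixed-point, rounded)
  Console.WriteLine( value.ToString( "E1" ) ); // 1.2E+003  (scientific)
  Console.WriteLine( 255.ToString( "X" ) );    // FF        (hexadecimal, integers only)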



10/18/2004 9:29:49 AM (Eastern Daylight Time, UTC-04:00)  #   
Wednesday, October 13, 2004

Non-Transactional by default

By its very nature, the ZIP file format dictates that the creation of a zip file is a transactional operation. We accumulate a list of files to compress, with all their metadata, and we create the zip file in a single step, compressing each file sequentially, complying with a storage format that leaves no room for in-place updates. Imagine having to change the contents of a single file within a zip file. You have to rebuild the zip file from the beginning: copy the untouched files' compressed data to a new copy of the zip file, append the modified file's compressed data, and complete the zip file with the new central directory and ending header.

By contrast, changing the contents of a file stored on your hard disk is simple. Each file is randomly accessible, and changing one file's contents does not require moving or updating the others. Take this for example (the snippets in this post assume the Xceed FileSystem namespaces and System.IO are imported):

  byte[] mydata = System.Text.Encoding.Default.GetBytes( "This is important!" );
  AbstractFile file = new DiskFile( @"d:\mydata.txt" );

  if( !file.Exists )
    file.Create();

  using( Stream stream = file.OpenWrite( true ) )
  {
    stream.Write( mydata, 0, mydata.Length );
  }

The operation is atomic on the file. The Xceed FileSystem's goal is to mimic this random file access for any possible representation of a file. Thus, exposing compressed files stored in a zip file is no simple task. With the above code, if you replace new DiskFile(...) with new ZippedFile(...), it will work as expected. What you don't see is that the zip file only gets rebuilt when the stream gets closed: all the data you write to the stream is compressed and stored in a temp file until the last "modify" operation on that zip file completes.
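For instance, the zipped counterpart of the first example would presumably look like this (the ZippedFile constructor is the same one used later in this post):

  byte[] mydata = System.Text.Encoding.Default.GetBytes( "This is important!" );

  // The only change: the file now lives inside a zip archive.
  AbstractFile zipFile = new DiskFile( @"d:\mydata.zip" );
  AbstractFile file = new ZippedFile( zipFile, @"\mydata.txt" );

  if( !file.Exists )
    file.Create();

  using( Stream stream = file.OpenWrite( true ) )
  {
    stream.Write( mydata, 0, mydata.Length );
  }

Another example: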

  byte[] mydata = System.Text.Encoding.Default.GetBytes( "This is important!" );
  AbstractFile file1 = new DiskFile( @"d:\mydata.txt" );
  AbstractFile file2 = new DiskFile( @"d:\mydatatoo.txt" );
  
  if( !file1.Exists )
    file1.Create();
    
  if( !file2.Exists )
    file2.Create();
    
  using( Stream stream1 = file1.OpenWrite( true ) )
  {
    using( Stream stream2 = file2.OpenWrite( true ) )
    {
      stream1.Write( mydata, 0, mydata.Length );
      stream2.Write( mydata, 0, mydata.Length );
    }
  }

In the atomic world of disk files, neither file has any influence on the other. But again, replace the DiskFile instances with ZippedFiles, and it's another story. The two files are stored in a zip file, which can only get rebuilt when the last "modify" operation completes, that is, when stream1 is closed. Will the above code work? Sure! But the zip file will be rebuilt three times: once per Create call, and a third time when both streams are closed. Try it!

  byte[] mydata = System.Text.Encoding.Default.GetBytes( "This is important!" );
  AbstractFile zipFile = new DiskFile( @"d:\mydatafiles.zip" );
  AbstractFile file1 = new ZippedFile( zipFile, @"\mydata.txt" );
  AbstractFile file2 = new ZippedFile( zipFile, @"\mydatatoo.txt" );

  if( !file1.Exists )
    file1.Create();

  Console.WriteLine( "Check the zip file with WinZip!" );
  Console.WriteLine( "It should contain one empty file named 'mydata.txt'." );
  Console.ReadLine();

  if( !file2.Exists )
    file2.Create();

  Console.WriteLine( "Check the zip file with WinZip!" );
  Console.WriteLine( "It should contain two empty files now." );
  Console.ReadLine();

  using( Stream stream1 = file1.OpenWrite( true ) )
  {
    using( Stream stream2 = file2.OpenWrite( true ) )
    {
      stream1.Write( mydata, 0, mydata.Length );
      stream2.Write( mydata, 0, mydata.Length );
    }

    Console.WriteLine( "Check the zip file with WinZip!" );
    Console.WriteLine( "It still contains two empty files." );
    Console.ReadLine();
  }

  Console.WriteLine( "Check the zip file with WinZip!" );
  Console.WriteLine( "Now it contains both files with their data." );
  Console.ReadLine();

The first call to file1.Create increments the "modify" count to 1, then down to 0, so the zip file is built, containing an empty file. After the second call to Create, the zip file is again rebuilt, containing two empty files. When the first call to OpenWrite is made, the "modify" count gets up to 1. After the second call to OpenWrite, it's up to 2. Then stream2 is closed, and the count gets down to 1. Finally stream1 is closed, the count gets to 0, and the zip file is rebuilt, containing two files with compressed data.
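Laid out as a timeline, the "modify" count evolves like this:

  file1.Create()      0 -> 1 -> 0   zip rebuilt: one empty file
  file2.Create()      0 -> 1 -> 0   zip rebuilt: two empty files
  file1.OpenWrite()   0 -> 1
  file2.OpenWrite()   1 -> 2
  stream2.Close()     2 -> 1        nothing rebuilt yet
  stream1.Close()     1 -> 0        zip rebuilt: both files with their data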

In this simple example, the cost isn't that high. Let's imagine something worse:

  byte[] mydata = System.Text.Encoding.Default.GetBytes( "This is important!" );
  AbstractFile zipFile = new DiskFile( @"d:\mydatafiles.zip" );

  if( zipFile.Exists )
    zipFile.Delete();

  for( int i=0; i<1000; i++ )
  {
    Console.WriteLine( "Loop {0}", i );
    AbstractFile file = new ZippedFile( zipFile, @"\data" + i.ToString() + ".txt" );

    if( !file.Exists ) 
      file.Create(); 

    using( Stream stream = file.OpenWrite( true ) ) 
    {
      stream.Write( mydata, 0, mydata.Length ); 
    }
  }

If you try this, you'll notice that each iteration takes longer than the previous one. Actually, when I tried it, I wasn't patient enough to wait for completion. The zip file would get rebuilt 2000 times, twice per iteration (once for the Create call, once when the stream is closed), each time with more and more files already in it. This is plainly unacceptable.

Transactional on demand

That's where the IBatchUpdateable interface comes to the rescue. It contains two simple methods: BeginUpdate and EndUpdate. Any class derived from AbstractFile or AbstractFolder can implement this interface, though implementing it on the root folder alone is enough. Once BeginUpdate is called, the implementor can hold back any modification to the underlying media until EndUpdate is called. ZipArchive, which represents the root ZippedFolder of a zip file, implements this interface. In short, BeginUpdate artificially increments the "modify" count, and EndUpdate decrements it; if it gets back to 0, the underlying zip file is rebuilt. You can call BeginUpdate and EndUpdate as many times as you want, but every call to BeginUpdate must be matched with a call to EndUpdate. The above code could now look like this:

  byte[] mydata = System.Text.Encoding.Default.GetBytes( "This is important!" );
  AbstractFile zipFile = new DiskFile( @"d:\mydatafiles.zip" );
  ZipArchive zip = new ZipArchive( zipFile );

  if( zipFile.Exists )
    zipFile.Delete();

  zip.BeginUpdate();

  try
  {       
    for( int i=0; i<1000; i++ )
    {
      Console.WriteLine( "Loop {0}", i );
      AbstractFile file = new ZippedFile( zipFile, @"\data" + i.ToString() + ".txt" );

      if( !file.Exists ) 
        file.Create(); 

      using( Stream stream = file.OpenWrite( true ) ) 
      {
        stream.Write( mydata, 0, mydata.Length ); 
      }
    }
  }
  finally
  {
    zip.EndUpdate();
  }

Now that's better. On my machine, this takes a few seconds.
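For reference, judging from the description above, the interface itself is presumably as small as this (a sketch, not the library's actual declaration):

  public interface IBatchUpdateable
  {
    void BeginUpdate(); // start holding back modifications to the underlying media
    void EndUpdate();   // when the last pending update completes, flush everything
  }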

The FileSystem's main goal was to offer a single, consistent interface for manipulating any kind of file or folder. That's why we decided that ZippedFile and ZippedFolder would be non-transactional by default, even though in most cases this ends up producing less efficient code. It's the user's job to call BeginUpdate before modifying the zip file, and EndUpdate once done, to achieve better performance.

By the way, for those who like the using( IDisposable ) pattern in C#, you can use the AutoBatchUpdate class like this:

  byte[] mydata = System.Text.Encoding.Default.GetBytes( "This is important!" );
  AbstractFile zipFile = new DiskFile( @"d:\mydatafiles.zip" );
  ZipArchive zip = new ZipArchive( zipFile );

  if( zipFile.Exists )
    zipFile.Delete();

  using( AutoBatchUpdate auto = new AutoBatchUpdate( zip ) )
  {
    for( int i=0; i<1000; i++ )
    {
      Console.WriteLine( "Loop {0}", i );
      AbstractFile file = new ZippedFile( zipFile, @"\data" + i.ToString() + ".txt" );

      if( !file.Exists ) 
        file.Create(); 

      using( Stream stream = file.OpenWrite( true ) ) 
      {
        stream.Write( mydata, 0, mydata.Length ); 
      }
    }
  }

The AutoBatchUpdate class implements IDisposable, making sure to call BeginUpdate on the object at construction and EndUpdate when disposed. What's even better is that you can pass it any FileSystemItem: it will do nothing if the item's RootFolder does not implement IBatchUpdateable. Thus, you can use AutoBatchUpdate without having to know whether the AbstractFile or AbstractFolder you're working with implements IBatchUpdateable or not.
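That makes it safe to use in generic code. A sketch (here I assume a DiskFile's root folder does not implement IBatchUpdateable, so the wrapper is simply a no-op for it):

  // Works whether 'file' is a DiskFile, a ZippedFile, or any other AbstractFile.
  void WriteData( AbstractFile file, byte[] data )
  {
    using( new AutoBatchUpdate( file ) ) // no-op when batch updates aren't supported
    {
      if( !file.Exists )
        file.Create();

      using( Stream stream = file.OpenWrite( true ) )
      {
        stream.Write( data, 0, data.Length );
      }
    }
  }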

Temp storage

Now, it's good to know that when using BeginUpdate and EndUpdate the zip file is rebuilt only at the very last moment, but where does the compressed data I write to the streams go? It must be stored somewhere, right? The ZipArchive class exposes two important properties: DefaultTempFolder (static) and TempFolder. By default, the first is equal to new DiskFolder( System.IO.Path.GetTempPath() ), the temp folder of the currently logged-in user. You can assign any AbstractFolder to it, as long as AbstractFile instances created in that folder yield seekable streams (ZippedFile.OpenWrite, for example, does not return a seekable stream, so a ZippedFolder won't do).

Each time you create the first instance of a ZipArchive for a given zip file, its TempFolder property is initialized to the value of DefaultTempFolder. Thus, if you assign a folder to the static DefaultTempFolder property, it applies to all new instances of ZipArchive. If you assign a folder to the TempFolder property, it only affects the ZippedFile, ZippedFolder and ZipArchive instances dealing with that zip file.

If you run the above code while watching your temporary folder in Explorer (hit F5 a few times), you'll see filenames like "XFS330fe108-13b8-4ebb-2299-cace5fa0100a.tmp" appear and disappear. Those files hold the compressed data until the zip file gets rebuilt. Most serious zip libraries allow you to use memory instead of a disk folder while zipping. For example, the Xceed Zip ActiveX exposes the UseTempFile property; when set to false, the library stores temp data in memory while building the zip file. With Xceed Zip for .NET, you achieve this by setting ZipArchive.DefaultTempFolder to new MemoryFolder(). Voilà! You are storing temporary data in memory. This is very useful for ASP.NET applications that cannot write to disk. And even better: it also works when updating existing zip files. But watch out! Don't zip gigabytes of files while using a MemoryFolder. There is a time for a MemoryFolder, and there is a time for a DiskFolder.
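In code, the switch is a one-liner (the per-archive variant uses the TempFolder property described above):

  // Store temporary compressed data in memory instead of on disk.
  // Handy for ASP.NET applications that cannot write to disk.
  ZipArchive.DefaultTempFolder = new MemoryFolder();

  // Or, to affect a single zip file only:
  ZipArchive zip = new ZipArchive( new DiskFile( @"d:\mydatafiles.zip" ) );
  zip.TempFolder = new MemoryFolder();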



10/13/2004 12:18:38 PM (Eastern Daylight Time, UTC-04:00)  #