Tuesday, January 31, 2006

Ever since I've been working with the .NET framework, most of my time was spent on the System.IO namespace. I'm not a UI guy, I'm an IO guy! The most important class in that namespace is System.IO.Stream. And since it was well-designed, probably inspired by other successful stream implementations (Delphi comes to mind), it's very easy to expose features using streams.

My favorite use of streams is for pass-through streams. A pass-through stream is a class which derives from System.IO.Stream, but reads from or writes to an inner stream received at creation. It serves as a data modifyer or data analyser. When reading from a pass-through stream, it first reads from its inner stream, then processes the data read (potentially modifying it) and returns this data. When writing to a pass-through stream, it first processes the provided data (again potentially modifying it), then writes it to its inner stream.

Xceed Zip for .NET and Xceed FTP for .NET both use a pletoria of pass-through streams. The most popular is Xceed.Compression.CompressedStream, the stream responsible for compressing data before writing it to its inner stream, or decompressing data read from its inner stream. But most others are internal. We've been juggling with the idea of exposing them for a long time, but beleive it would only confuse developers to "see" those new namespaces and classes. Another useful thing with internal classes is that we can change their interface without causing breaking changes.

TransientStream

It was a long debate before we decided to go forth with the "transient" keyword. Not only is it used in the TransientStream type name, but also as a property on many of our pass-through streams. What we meant by "transient" is "volatile", or if you prefer more explicit keywords, "does-not-close-its-inner-stream-when-closed". A TransientStream is about the simplest expression of a pass-through stream. All required property and method overrides simply call the inner stream. The only exception is for the Close method, which simply makes sure not to call Close on the inner stream. This is very useful when you need to pass your stream to another routine which closes the stream, while you don't want your stream to get closed.

ChecksumStream

This stream does not modify the data read from or written to, but takes the opportunity to calculate either a CRC32 or an Adler32 on that data. When reading, it can also make sure, upon closing it, that the calculated checksum matches an expected stream, else throw an exception. In this way, we can insert checksum calculation anywhere in a process without interfering nor requiring code changes.

CombinedStream

The deflate compression algorithm has the ability to detect the end of the data when decompressing. The CompressedStream is itself a pass-through stream. When reading from it, it first reads from the inner stream, then decompresses the data. When it reaches the end of the compressed data, the CompressedStream has the ability to return a stream on the remaining data, in case this inner stream contains more data after the compressed block. Why isn't this equivalent to the inner stream you might ask? Let's say the inner stream isn't seekable. The CompressedStream's Read method first reads N bytes from the inner stream, but may have found that the end of the compressed data is after M bytes (M < N). The inner stream is already N-M bytes too far. The CombinedStream receives both a byte array (the unused N-M bytes) and the inner stream as ctor parameters, and will expose those as one contiguous stream. Pretty slick!

HeaderFooterStream

Xceed Streaming Compression for .NET exposes stream-based (as opposed to archive-based) compression formats. Those formats all have one thing in common: they have a header and a footer. Not all of them can depend on the deflate algorithm to automatically detect the end of the stream. That's why they need to make sure to never return the first M bytes and last N bytes from the inner stream, where M is the expected header size and N the expected footer size.

WindowStream

When exposing part of a zip file as a single AbstractFile, we need to make sure we do not read past the boundaries of that file's data in the zip file. The WindowStream exposes a region of its inner stream as a zero-position, N-length stream.

ZCryptStream

This pass-though stream automatically encrypts or decrypts the data written or read, using the basic Zip encryption (which is as weak as me in front of a cheese cake). I will be working on AES encryption very soon, and it will most probably be implemented as a pass-through stream too!

NotifyStream

Though pass-through streams can do much of the task, it is often better for the clarity of the code to have processing done by other classes not deriving from System.IO.Stream. The NotifyStream class exposes three events: ReadingFromStream, WritingToStream and ClosingStream. Any other class can advise for those events to intervene in the reading or writing process. This old class exists since the beginning of Xceed Zip for .NET, but it has proven very useful in the current development we are doing for Tar and GZip support within Xceed Zip for .NET.

ForwardSeekableStream

This new class created for Xceed Zip for .NET 3.0 (Tar and GZip support) can expose a non-seekable stream as a seekable stream when reading, or at least a stream reporting a Position when writing. When reading, you can call Seek with an offset behond the current position, and it will simply read from the non-seekable inner stream until well positioned. And for both reading and writing operations, it counts the number of bytes read or written so it can report a position (granting we knew the original position when created).

FtpAsciiDataStream

Xceed FTP for .NET also uses pass-through streams. For example, the FtpAsciiDataStream wraps the NetworkStream to perform convertion of LF to CR/LF on the fly when sending a file in ASCII mode.


.NET | Compression | FileSystem | FTP | Zip

1/31/2006 9:47:29 AM (Eastern Standard Time, UTC-05:00)  #   
 Tuesday, January 17, 2006

Found this from Scott... Hey! It was my idea!


Fun | General

1/17/2006 9:46:11 AM (Eastern Standard Time, UTC-05:00)  #   

One of the less known features of the Xceed FileSystem is its file filtering capabilities. Not only does it come with built-in support for filtering files based on name, size, attributes and dates, it also lets you easily combine criterias. Furthermore, as for all Xceed components, it's fully extensible.

For example, let's say I want to copy files matching the "*.txt" filter that have the archive attribute on. The following code can be used:

  sourceFolder.CopyFilesTo( destFolder, true, true, "*.txt", FileAttributes.Archive );

What is happening beneath the surface? The fouth parameter is "params object[] filters". This means you can provide any number of any types of parameters. Any types? Not exactly. What you should see is "params Filter[] filters". The Filter class is the base class for any type of filter you could think of. The Xceed FileSystem comes with seven built-in filter classes, divided in two categories:

Operators: AndFilter, OrFilter, NotFilter.
Filters: NameFilter, AttributeFilter, SizeFilter, DateTimeFilter.

So the line of code above can be seen as this:

  sourceFolder.CopyFilesTo( destFolder, true, true,
    new AndFilter( new NameFilter( "*.txt" ), new AttributeFilter( FileAttributes.Archive ) ) );

But we've decided that forcing the creation of a new NameFilter everytime you want to filter on a mask was overkill for such a common operation. That's why we also accept two other types of parameters. Strings are automatically converted to a NameFilter, and FileAttributes are automatically converted to an AttributeFilter. Finally, providing two or more filters as separate parameters automatically puts them in an AndFilter.

But then, what happens to another common scenario: filtering files based on two name filters? Passing "*.txt" as the fourth parameter, and "*.doc" as the fifth would generate an AndFilter around them, thus only matching files that match the ".txt" and the ".doc" extensions... Oups!

We support yet another exception: any string filter can contain a pipe character (|) for providing multiple name filters that will be grouped in an OrFilter, like this:

  sourceFolder.CopyFilesTo( destFolder, true, true, "*.txt|*.doc" );

This will automatically be translated to:

  sourceFolder.CopyFilesTo( destFolder, true, true, 
    new OrFilter( new NameFilter( "*.txt" ), new NameFilter( "*.doc" ) ) );

By the way, most operator-like filters' constructors will accept strings and FileAttributes too, doing the translation to NameFilter and AttributeFilter instances for you.

The final "hidden" feature relates to case sensitivity. By default, the FileSystem is case insensitive, as is the Windows platform. But since archives like zip files may come from other planets like Linux or Mac OS X, we wanted to support case-sensitive file matching. If you prepend your string with the "greater than" character (<), the resulting NameFilter will be case-sensitive. The following code will only match files which have their extension in upper-case:

  sourceFolder.CopyFilesTo( destFolder, true, true, ">*.TXT" );

Since Windows does remember the casing of filenames, this can be very useful even on the Windows platform. Furthermore, since we released the library, the Mono project came to life, and our library can now be used on other platforms.

Extending filters

You can easily create custom filters by deriving from the Xceed.FileSystem.Filter class and overriding the IsItemMatching method. A SearchFilter class, which searches for a particular text within files could look like this:

  class SearchFilter : Filter
  {
    public SearchFilter( string text )
      : base( FilterScope.File )
    {
      if( text == null )
        throw new ArgumentNullException( "text" );

      if( text.Length == 0 )
        throw new ArgumentException( "The text cannot be empty.", "text" );

      m_text = text;
    }

    public override bool IsItemMatching( FileSystemItem item )
    {
      AbstractFile file = item as AbstractFile;

      if( file == null )
        return false;

      try
      {
        int bufferSize = ( file.Size < 0x1000000 )
          ? unchecked( ( int )file.Size )
          : 0x1000000;

        byte[] search = System.Text.Encoding.Default.GetBytes( m_text );

        if( search.Length <= bufferSize )
        {
          byte[] buffer = new byte[ bufferSize ];
          int found = 0;

          using( BinaryReader reader = new BinaryReader( file.OpenRead( FileShare.ReadWrite ) ) )
          {
            int read = 0;

            while( ( read = reader.Read( buffer, 0, bufferSize ) ) > 0 )
            {
              found = FindBuffer( buffer, 0, read, search, found );

              if( found == search.Length )
                return true;
            }
          }
        }     
      }
      catch {}

      return false;
    }

    private int FindBuffer(
      byte[] source,
      int sourceStart,
      int sourceCount,
      byte[] search,
      int searchIndex )
    {
      // TODO: Param check!

      for( int i=0; i<sourceCount; i++ )
      {
        if( source[ sourceStart + i ] == search[ searchIndex ] )
        {
          if( ++searchIndex == search.Length )
            return searchIndex;
        }
        else
        {
          searchIndex = 0;
        }
      }

      return searchIndex;
    }

    private string m_text; // = null
  }

Using this custom filter, you can now copy only files that contain a particular text:

  sourceFolder.CopyFilesTo( destFolder, true, true, new SearchFilter( "allo" ) );

Conditionally recursing

One missing feature we had with the filtering will be addressed with today's release. There were no way to control which subfolders to recurse into or not when calling methods accepting filters (CopyFilesTo, MoveFilesTo, GetFiles, GetFolders). The FilterScope.Folder value wasn't preventing recursing into subfolders. It was only meant to include or exclude folder entries from being copied. But passing "true" or "false" as the "recurse" parameter was an all or nothing deal.

Today, we introduce a new scope: FilterScope.Recurse. It does not interfere with the File or Folder socpe, and only determines if we should continue matching files into each subfolder. Its number one use is for excluding subfolders:

  sourceFolder.CopyFilesTo( destFolder, true, true, 
    "*.txt", new NotFilter( new NameFilter( "Bar", FilterScope.Recurse ) ) );

The way you combine "Recurse" filters and other filters is irrelevant. When deciding to copy files or folders, the library ignores any filters with the Recurse scope. When deciding to call itself recursively, the library ignores any filters with the File or Folder scope.

We plan on providing new types of filters. Suggestions welcomed!



1/17/2006 9:35:07 AM (Eastern Standard Time, UTC-05:00)  #   
 Wednesday, January 11, 2006

First, I wish all my readers health and happyness for the new year.

Now, let's jump into the subject of the day: Scott Hanselman's HanselMinutes. I'm currently listening to his first podcast. I've never been a real fan of podcasts, but since Scott Hanselman is about my number 1 blogger, I could not miss this event.

Hmmm, how can I express my feelings about podcasting without hurting Scott's feelings? Is it me, or are computer subjects not fit for audio? I want links! I want screenshots! I want examples! I want immediate access to extended information upon my needs! With a podcast, I'm stuck listening to all the stream. Sure, I can fast forward, but you end-up playing the "find that show you recorded" game you play with your VCR. Worse, you don't know what you're looking for. You are at the mercy of the podcaster. You can't filter, you can't opt in or out of a subject.

Maybe I'm not listening podcasts at the proper moment? Maybe I'm trying to use podcasts as if they were audio blogs, which they are not? I tried listening to a podcast in my car on the way to work, just to discover I was sad missing the local news and forecasts I usually listen to in the morning. I tried listening to a podcast at home in the evening when I push my computer geekness to its limits by moving back to a computer, but I generally need to disconnect from work, and I prefer playing Guild Wars! I tried listening to a podcast in bed before getting to sleep, just to find out I prefer doing other things in bed... like sleeping... and... ok, you get the picture!

The funny part is that I've been approached by the Visual Studio Talk Show for a 45 minutes podcast-style interview in French, and I've said yes. But it's only a one time deal. Even though this show is mostly accessible as a podcast, I see this as an interview, and no way I could maintain a weekly podcast.

So I'll conclude with Scott's own words: podcasting sucks. It wastes my precious time. I would have liked it very much if Carl Franklin would have asked Scott about his background, his developer path, about himself. I want to know more about Scott. For links, I'll continue reading his blog.

Oh, and one more thing: the damn advertising is barely tolerable.



1/11/2006 9:09:26 AM (Eastern Standard Time, UTC-05:00)  #