Thursday, February 09, 2006

Wonder what's the formula behind Xceed's build numbers? Here's the secret recipe:

( Year - 2000 ) * 1000 + ( Month * 50 ) + ( Day )
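For the curious, here is a minimal sketch of that formula in C#. The XceedBuildNumber class and its FromDate method are names made up for this illustration; they are not part of any Xceed product.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class XceedBuildNumber
{
  // Applies the formula above: ( Year - 2000 ) * 1000 + ( Month * 50 ) + ( Day ).
  public static int FromDate( DateTime date )
  {
    return ( date.Year - 2000 ) * 1000 + ( date.Month * 50 ) + date.Day;
  }
}
```

For the date of this post, February 9, 2006, this gives 6 * 1000 + 2 * 50 + 9 = 6109.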

Heck, we even made ourselves an Xceed Version Yahoo! Widget!!!

Update: Until I learn how to open the ".widget" extension for downloading in dasBlog, I renamed the file to "Xceed Version.widget.zip". Just rename to "Xceed Version.widget" once downloaded.


Fun

2/9/2006 3:24:57 PM (Eastern Standard Time, UTC-05:00)  #   
 Tuesday, January 31, 2006

Ever since I started working with the .NET framework, most of my time has been spent on the System.IO namespace. I'm not a UI guy, I'm an IO guy! The most important class in that namespace is System.IO.Stream. And since it was well-designed, probably inspired by other successful stream implementations (Delphi comes to mind), it's very easy to expose features using streams.

My favorite use of streams is for pass-through streams. A pass-through stream is a class which derives from System.IO.Stream, but reads from or writes to an inner stream received at creation. It serves as a data modifier or data analyzer. When reading from a pass-through stream, it first reads from its inner stream, then processes the data read (potentially modifying it) and returns this data. When writing to a pass-through stream, it first processes the provided data (again potentially modifying it), then writes it to its inner stream.

Xceed Zip for .NET and Xceed FTP for .NET both use a plethora of pass-through streams. The most popular is Xceed.Compression.CompressedStream, the stream responsible for compressing data before writing it to its inner stream, or decompressing data read from its inner stream. But most others are internal. We've been juggling with the idea of exposing them for a long time, but believe it would only confuse developers to "see" those new namespaces and classes. Another useful thing with internal classes is that we can change their interface without causing breaking changes.

TransientStream

It was a long debate before we decided to go forth with the "transient" keyword. Not only is it used in the TransientStream type name, but also as a property on many of our pass-through streams. What we meant by "transient" is "volatile", or if you prefer more explicit keywords, "does-not-close-its-inner-stream-when-closed". A TransientStream is about the simplest expression of a pass-through stream. All required property and method overrides simply call the inner stream. The only exception is for the Close method, which simply makes sure not to call Close on the inner stream. This is very useful when you need to pass your stream to another routine which closes the stream, while you don't want your stream to get closed.
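To make the idea concrete, here is a minimal sketch of such a stream. It is not the actual TransientStream source: the NonClosingStream name is made up, and it overrides Dispose( bool ), which is where current versions of .NET route Close.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not Xceed's TransientStream: every member forwards to
// the inner stream, except that closing the wrapper leaves the inner stream open.
class NonClosingStream : Stream
{
  private readonly Stream m_inner;

  public NonClosingStream( Stream inner )
  {
    if( inner == null )
      throw new ArgumentNullException( "inner" );

    m_inner = inner;
  }

  public override bool CanRead { get { return m_inner.CanRead; } }
  public override bool CanSeek { get { return m_inner.CanSeek; } }
  public override bool CanWrite { get { return m_inner.CanWrite; } }
  public override long Length { get { return m_inner.Length; } }

  public override long Position
  {
    get { return m_inner.Position; }
    set { m_inner.Position = value; }
  }

  public override void Flush() { m_inner.Flush(); }
  public override int Read( byte[] buffer, int offset, int count ) { return m_inner.Read( buffer, offset, count ); }
  public override long Seek( long offset, SeekOrigin origin ) { return m_inner.Seek( offset, origin ); }
  public override void SetLength( long value ) { m_inner.SetLength( value ); }
  public override void Write( byte[] buffer, int offset, int count ) { m_inner.Write( buffer, offset, count ); }

  protected override void Dispose( bool disposing )
  {
    // Flush pending data, but deliberately do not close the inner stream.
    if( disposing && m_inner.CanWrite )
      m_inner.Flush();

    base.Dispose( disposing );
  }
}
```

A typical use is handing new NonClosingStream( myStream ) to a StreamWriter inside a using block: the writer closes the wrapper when disposed, and myStream remains usable afterwards.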

ChecksumStream

This stream does not modify the data read from or written to it, but takes the opportunity to calculate either a CRC32 or an Adler32 on that data. When reading, it can also make sure, upon closing, that the calculated checksum matches an expected value, and throw an exception otherwise. In this way, we can insert checksum calculation anywhere in a process without interfering with it or requiring code changes.
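As an illustration, a read-only version of the idea could look like the sketch below. The Adler32ReadStream name and its Checksum property are assumptions made for this example, not the real ChecksumStream interface, and the check against an expected value on close is left out.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not Xceed's ChecksumStream: a read-only pass-through
// stream that computes an Adler-32 checksum over every byte read through it.
class Adler32ReadStream : Stream
{
  private const uint Modulus = 65521; // Largest prime below 2^16.

  private readonly Stream m_inner;
  private uint m_a = 1;
  private uint m_b = 0;

  public Adler32ReadStream( Stream inner )
  {
    if( inner == null )
      throw new ArgumentNullException( "inner" );

    m_inner = inner;
  }

  // The checksum of all the bytes read so far.
  public uint Checksum
  {
    get { return ( m_b << 16 ) | m_a; }
  }

  public override int Read( byte[] buffer, int offset, int count )
  {
    int read = m_inner.Read( buffer, offset, count );

    // The data is returned untouched; we only observe it on the way through.
    for( int i = 0; i < read; i++ )
    {
      m_a = ( m_a + buffer[ offset + i ] ) % Modulus;
      m_b = ( m_b + m_a ) % Modulus;
    }

    return read;
  }

  public override bool CanRead { get { return true; } }
  public override bool CanSeek { get { return false; } }
  public override bool CanWrite { get { return false; } }
  public override long Length { get { throw new NotSupportedException(); } }

  public override long Position
  {
    get { throw new NotSupportedException(); }
    set { throw new NotSupportedException(); }
  }

  public override void Flush() {}
  public override long Seek( long offset, SeekOrigin origin ) { throw new NotSupportedException(); }
  public override void SetLength( long value ) { throw new NotSupportedException(); }
  public override void Write( byte[] buffer, int offset, int count ) { throw new NotSupportedException(); }
}
```

Reading the ASCII bytes of "Wikipedia" through this stream leaves Checksum at 0x11E60398, the usual textbook value for Adler-32.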

CombinedStream

The deflate compression algorithm has the ability to detect the end of the data when decompressing. The CompressedStream is itself a pass-through stream. When reading from it, it first reads from the inner stream, then decompresses the data. When it reaches the end of the compressed data, the CompressedStream has the ability to return a stream on the remaining data, in case this inner stream contains more data after the compressed block. Why isn't this equivalent to the inner stream you might ask? Let's say the inner stream isn't seekable. The CompressedStream's Read method first reads N bytes from the inner stream, but may have found that the end of the compressed data is after M bytes (M < N). The inner stream is already N-M bytes too far. The CombinedStream receives both a byte array (the unused N-M bytes) and the inner stream as ctor parameters, and will expose those as one contiguous stream. Pretty slick!
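The principle fits in a few lines. The sketch below is only an approximation of the idea, with a made-up PrefixedStream name; it is read-only and does not try to support seeking.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not Xceed's CombinedStream: exposes a byte array followed
// by an inner stream as one contiguous, read-only, non-seekable stream.
class PrefixedStream : Stream
{
  private readonly byte[] m_prefix;
  private readonly Stream m_inner;
  private int m_prefixPosition; // = 0

  public PrefixedStream( byte[] prefix, Stream inner )
  {
    if( prefix == null )
      throw new ArgumentNullException( "prefix" );

    if( inner == null )
      throw new ArgumentNullException( "inner" );

    m_prefix = prefix;
    m_inner = inner;
  }

  public override int Read( byte[] buffer, int offset, int count )
  {
    int remaining = m_prefix.Length - m_prefixPosition;

    if( remaining > 0 )
    {
      // Serve the leftover bytes first. Returning fewer bytes than
      // requested is allowed by the Stream contract.
      int toCopy = Math.Min( remaining, count );
      Buffer.BlockCopy( m_prefix, m_prefixPosition, buffer, offset, toCopy );
      m_prefixPosition += toCopy;
      return toCopy;
    }

    // The prefix is exhausted: continue with the inner stream.
    return m_inner.Read( buffer, offset, count );
  }

  public override bool CanRead { get { return true; } }
  public override bool CanSeek { get { return false; } }
  public override bool CanWrite { get { return false; } }
  public override long Length { get { throw new NotSupportedException(); } }

  public override long Position
  {
    get { throw new NotSupportedException(); }
    set { throw new NotSupportedException(); }
  }

  public override void Flush() {}
  public override long Seek( long offset, SeekOrigin origin ) { throw new NotSupportedException(); }
  public override void SetLength( long value ) { throw new NotSupportedException(); }
  public override void Write( byte[] buffer, int offset, int count ) { throw new NotSupportedException(); }
}
```

In the scenario above, the prefix would be the N-M bytes the decompressor read but did not consume, and the inner stream would be the original non-seekable source.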

HeaderFooterStream

Xceed Streaming Compression for .NET exposes stream-based (as opposed to archive-based) compression formats. Those formats all have one thing in common: they have a header and a footer. Not all of them can depend on the deflate algorithm to automatically detect the end of the stream. That's why they need to make sure to never return the first M bytes and last N bytes from the inner stream, where M is the expected header size and N the expected footer size.

WindowStream

When exposing part of a zip file as a single AbstractFile, we need to make sure we do not read past the boundaries of that file's data in the zip file. The WindowStream exposes a region of its inner stream as a zero-position, N-length stream.
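Here is a rough sketch of the concept, assuming a seekable inner stream and read-only access. RegionStream is a made-up name and the real WindowStream certainly differs in the details.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not Xceed's WindowStream: exposes the region
// [ start, start + length ) of a seekable inner stream as a read-only stream
// whose own positions run from 0 to length.
class RegionStream : Stream
{
  private readonly Stream m_inner;
  private readonly long m_start;
  private readonly long m_length;
  private long m_position; // = 0

  public RegionStream( Stream inner, long start, long length )
  {
    if( inner == null )
      throw new ArgumentNullException( "inner" );

    if( !inner.CanSeek )
      throw new ArgumentException( "The inner stream must be seekable.", "inner" );

    if( ( start < 0 ) || ( length < 0 ) )
      throw new ArgumentOutOfRangeException( "start", "The start and length cannot be negative." );

    m_inner = inner;
    m_start = start;
    m_length = length;
  }

  public override int Read( byte[] buffer, int offset, int count )
  {
    long remaining = m_length - m_position;

    if( remaining <= 0 )
      return 0; // Never read past the end of the window.

    m_inner.Position = m_start + m_position;

    int read = m_inner.Read( buffer, offset, ( int )Math.Min( ( long )count, remaining ) );
    m_position += read;
    return read;
  }

  public override long Seek( long offset, SeekOrigin origin )
  {
    long target;

    switch( origin )
    {
      case SeekOrigin.Begin: target = offset; break;
      case SeekOrigin.Current: target = m_position + offset; break;
      default: target = m_length + offset; break;
    }

    if( target < 0 )
      throw new IOException( "Cannot seek before the start of the window." );

    m_position = target;
    return m_position;
  }

  public override bool CanRead { get { return true; } }
  public override bool CanSeek { get { return true; } }
  public override bool CanWrite { get { return false; } }
  public override long Length { get { return m_length; } }

  public override long Position
  {
    get { return m_position; }
    set { Seek( value, SeekOrigin.Begin ); }
  }

  public override void Flush() {}
  public override void SetLength( long value ) { throw new NotSupportedException(); }
  public override void Write( byte[] buffer, int offset, int count ) { throw new NotSupportedException(); }
}
```

The inner position is reset before every read, so other code touching the inner stream between two reads does not shift the window.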

ZCryptStream

This pass-through stream automatically encrypts or decrypts the data written or read, using the basic Zip encryption (which is as weak as me in front of a cheesecake). I will be working on AES encryption very soon, and it will most probably be implemented as a pass-through stream too!

NotifyStream

Though pass-through streams can do much of the task, it is often better for the clarity of the code to have processing done by other classes not deriving from System.IO.Stream. The NotifyStream class exposes three events: ReadingFromStream, WritingToStream and ClosingStream. Any other class can subscribe to those events to intervene in the reading or writing process. This old class has existed since the beginning of Xceed Zip for .NET, but it has proven very useful in the current development we are doing for Tar and GZip support within Xceed Zip for .NET.

ForwardSeekableStream

This new class, created for Xceed Zip for .NET 3.0 (Tar and GZip support), can expose a non-seekable stream as a seekable stream when reading, or at least a stream reporting a Position when writing. When reading, you can call Seek with an offset beyond the current position, and it will simply read from the non-seekable inner stream until well positioned. And for both reading and writing operations, it counts the number of bytes read or written so it can report a position (provided we knew the original position when it was created).
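The reading half of that behavior can be sketched as follows. This is a simplified illustration under a made-up name (ForwardOnlySeekStream), assuming the stream starts at position zero; it is not the class shipping in the product.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not Xceed's ForwardSeekableStream: counts the bytes read
// to report a Position, and implements forward seeks by reading and discarding.
class ForwardOnlySeekStream : Stream
{
  private readonly Stream m_inner;
  private long m_position; // = 0

  public ForwardOnlySeekStream( Stream inner )
  {
    if( inner == null )
      throw new ArgumentNullException( "inner" );

    m_inner = inner;
  }

  public override int Read( byte[] buffer, int offset, int count )
  {
    int read = m_inner.Read( buffer, offset, count );
    m_position += read;
    return read;
  }

  public override long Seek( long offset, SeekOrigin origin )
  {
    long target;

    switch( origin )
    {
      case SeekOrigin.Begin: target = offset; break;
      case SeekOrigin.Current: target = m_position + offset; break;
      default: throw new NotSupportedException( "The length is unknown." );
    }

    if( target < m_position )
      throw new NotSupportedException( "Cannot seek backward." );

    byte[] scratch = new byte[ 4096 ];

    // Skip forward by reading and throwing away the bytes in between.
    while( m_position < target )
    {
      int toRead = ( int )Math.Min( ( long )scratch.Length, target - m_position );

      if( Read( scratch, 0, toRead ) == 0 )
        throw new EndOfStreamException();
    }

    return m_position;
  }

  public override bool CanRead { get { return true; } }
  public override bool CanSeek { get { return true; } } // Forward only.
  public override bool CanWrite { get { return false; } }
  public override long Length { get { throw new NotSupportedException(); } }

  public override long Position
  {
    get { return m_position; }
    set { Seek( value, SeekOrigin.Begin ); }
  }

  public override void Flush() {}
  public override void SetLength( long value ) { throw new NotSupportedException(); }
  public override void Write( byte[] buffer, int offset, int count ) { throw new NotSupportedException(); }
}
```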

FtpAsciiDataStream

Xceed FTP for .NET also uses pass-through streams. For example, the FtpAsciiDataStream wraps the NetworkStream to perform conversion of LF to CR/LF on the fly when sending a file in ASCII mode.
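The conversion itself is simple enough to sketch. The LfToCrLfWriteStream class below is a made-up, write-only illustration, not the FtpAsciiDataStream code; it leaves existing CR/LF pairs alone and writes one byte at a time for clarity rather than speed.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not Xceed's FtpAsciiDataStream: a write-only pass-through
// stream that inserts a CR before every LF not already preceded by one.
class LfToCrLfWriteStream : Stream
{
  private readonly Stream m_inner;
  private bool m_lastWasCr; // = false

  public LfToCrLfWriteStream( Stream inner )
  {
    if( inner == null )
      throw new ArgumentNullException( "inner" );

    m_inner = inner;
  }

  public override void Write( byte[] buffer, int offset, int count )
  {
    for( int i = 0; i < count; i++ )
    {
      byte current = buffer[ offset + i ];

      // A lone LF becomes CR/LF. The flag survives between calls, so a
      // CR/LF pair split across two Write calls is still recognized.
      if( ( current == ( byte )'\n' ) && !m_lastWasCr )
        m_inner.WriteByte( ( byte )'\r' );

      m_inner.WriteByte( current );
      m_lastWasCr = ( current == ( byte )'\r' );
    }
  }

  public override void Flush() { m_inner.Flush(); }

  public override bool CanRead { get { return false; } }
  public override bool CanSeek { get { return false; } }
  public override bool CanWrite { get { return true; } }
  public override long Length { get { throw new NotSupportedException(); } }

  public override long Position
  {
    get { throw new NotSupportedException(); }
    set { throw new NotSupportedException(); }
  }

  public override int Read( byte[] buffer, int offset, int count ) { throw new NotSupportedException(); }
  public override long Seek( long offset, SeekOrigin origin ) { throw new NotSupportedException(); }
  public override void SetLength( long value ) { throw new NotSupportedException(); }
}
```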


.NET | Compression | FileSystem | FTP | Zip

1/31/2006 9:47:29 AM (Eastern Standard Time, UTC-05:00)  #   
 Tuesday, January 17, 2006

Found this from Scott... Hey! It was my idea!


Fun | General

1/17/2006 9:46:11 AM (Eastern Standard Time, UTC-05:00)  #   

One of the lesser-known features of the Xceed FileSystem is its file filtering capabilities. Not only does it come with built-in support for filtering files based on name, size, attributes and dates, it also lets you easily combine criteria. Furthermore, as for all Xceed components, it's fully extensible.

For example, let's say I want to copy files matching the "*.txt" filter that have the archive attribute on. The following code can be used:

  sourceFolder.CopyFilesTo( destFolder, true, true, "*.txt", FileAttributes.Archive );

What is happening beneath the surface? The fourth parameter is "params object[] filters". This means you can provide any number of any types of parameters. Any types? Not exactly. What you should see is "params Filter[] filters". The Filter class is the base class for any type of filter you could think of. The Xceed FileSystem comes with seven built-in filter classes, divided into two categories:

Operators: AndFilter, OrFilter, NotFilter.
Filters: NameFilter, AttributeFilter, SizeFilter, DateTimeFilter.

So the line of code above can be seen as this:

  sourceFolder.CopyFilesTo( destFolder, true, true,
    new AndFilter( new NameFilter( "*.txt" ), new AttributeFilter( FileAttributes.Archive ) ) );

But we've decided that forcing the creation of a new NameFilter every time you want to filter on a mask was overkill for such a common operation. That's why we also accept two other types of parameters. Strings are automatically converted to a NameFilter, and FileAttributes are automatically converted to an AttributeFilter. Finally, providing two or more filters as separate parameters automatically puts them in an AndFilter.

But then, what happens to another common scenario: filtering files based on two name filters? Passing "*.txt" as the fourth parameter, and "*.doc" as the fifth would generate an AndFilter around them, thus only matching files that match both the ".txt" and the ".doc" extensions... Oops!

We support yet another exception: any string filter can contain a pipe character (|) for providing multiple name filters that will be grouped in an OrFilter, like this:

  sourceFolder.CopyFilesTo( destFolder, true, true, "*.txt|*.doc" );

This will automatically be translated to:

  sourceFolder.CopyFilesTo( destFolder, true, true, 
    new OrFilter( new NameFilter( "*.txt" ), new NameFilter( "*.doc" ) ) );

By the way, most operator-like filters' constructors will accept strings and FileAttributes too, doing the translation to NameFilter and AttributeFilter instances for you.

The final "hidden" feature relates to case sensitivity. By default, the FileSystem is case insensitive, as is the Windows platform. But since archives like zip files may come from other planets like Linux or Mac OS X, we wanted to support case-sensitive file matching. If you prepend your string with the "greater than" character (>), the resulting NameFilter will be case-sensitive. The following code will only match files which have their extension in upper-case:

  sourceFolder.CopyFilesTo( destFolder, true, true, ">*.TXT" );

Since Windows does remember the casing of filenames, this can be very useful even on the Windows platform. Furthermore, since we released the library, the Mono project came to life, and our library can now be used on other platforms.

Extending filters

You can easily create custom filters by deriving from the Xceed.FileSystem.Filter class and overriding the IsItemMatching method. A SearchFilter class, which searches for a particular text within files could look like this:

  class SearchFilter : Filter
  {
    public SearchFilter( string text )
      : base( FilterScope.File )
    {
      if( text == null )
        throw new ArgumentNullException( "text" );

      if( text.Length == 0 )
        throw new ArgumentException( "The text cannot be empty.", "text" );

      m_text = text;
    }

    public override bool IsItemMatching( FileSystemItem item )
    {
      AbstractFile file = item as AbstractFile;

      if( file == null )
        return false;

      try
      {
        int bufferSize = ( file.Size < 0x1000000 )
          ? unchecked( ( int )file.Size )
          : 0x1000000;

        byte[] search = System.Text.Encoding.Default.GetBytes( m_text );

        if( search.Length <= bufferSize )
        {
          byte[] buffer = new byte[ bufferSize ];
          int found = 0;

          using( BinaryReader reader = new BinaryReader( file.OpenRead( FileShare.ReadWrite ) ) )
          {
            int read = 0;

            while( ( read = reader.Read( buffer, 0, bufferSize ) ) > 0 )
            {
              found = FindBuffer( buffer, 0, read, search, found );

              if( found == search.Length )
                return true;
            }
          }
        }     
      }
      catch {}

      return false;
    }

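    private int FindBuffer(
      byte[] source,
      int sourceStart,
      int sourceCount,
      byte[] search,
      int searchIndex )
    {
      // TODO: Param check!

      // Knuth-Morris-Pratt failure table: fallback[ k ] is the length of the
      // longest proper prefix of search[ 0..k ] that is also a suffix of it.
      int[] fallback = new int[ search.Length ];

      for( int k=1, len=0; k<search.Length; k++ )
      {
        while( ( len > 0 ) && ( search[ k ] != search[ len ] ) )
          len = fallback[ len - 1 ];

        if( search[ k ] == search[ len ] )
          len++;

        fallback[ k ] = len;
      }

      for( int i=0; i<sourceCount; i++ )
      {
        byte current = source[ sourceStart + i ];

        // On a mismatch, fall back to the longest partial match instead of
        // restarting from zero, so that overlapping candidates (for example
        // "aab" within "aaab") are not missed.
        while( ( searchIndex > 0 ) && ( current != search[ searchIndex ] ) )
          searchIndex = fallback[ searchIndex - 1 ];

        if( current == search[ searchIndex ] )
        {
          if( ++searchIndex == search.Length )
            return searchIndex;
        }
      }

      return searchIndex;
    }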

    private string m_text; // = null
  }

Using this custom filter, you can now copy only files that contain a particular text:

  sourceFolder.CopyFilesTo( destFolder, true, true, new SearchFilter( "allo" ) );

Conditionally recursing

One missing feature we had with the filtering will be addressed with today's release. There was no way to control which subfolders to recurse into when calling methods accepting filters (CopyFilesTo, MoveFilesTo, GetFiles, GetFolders). The FilterScope.Folder value wasn't preventing recursing into subfolders. It was only meant to include or exclude folder entries from being copied. But passing "true" or "false" as the "recurse" parameter was an all or nothing deal.

Today, we introduce a new scope: FilterScope.Recurse. It does not interfere with the File or Folder scope, and only determines if we should continue matching files into each subfolder. Its number one use is for excluding subfolders:

  sourceFolder.CopyFilesTo( destFolder, true, true, 
    "*.txt", new NotFilter( new NameFilter( "Bar", FilterScope.Recurse ) ) );

The way you combine "Recurse" filters and other filters is irrelevant. When deciding to copy files or folders, the library ignores any filters with the Recurse scope. When deciding to call itself recursively, the library ignores any filters with the File or Folder scope.

We plan on providing new types of filters. Suggestions welcome!



1/17/2006 9:35:07 AM (Eastern Standard Time, UTC-05:00)  #   
 Wednesday, January 11, 2006

First, I wish all my readers health and happiness for the new year.

Now, let's jump into the subject of the day: Scott Hanselman's HanselMinutes. I'm currently listening to his first podcast. I've never been a real fan of podcasts, but since Scott Hanselman is about my number 1 blogger, I could not miss this event.

Hmmm, how can I express my feelings about podcasting without hurting Scott's feelings? Is it me, or are computer subjects not fit for audio? I want links! I want screenshots! I want examples! I want immediate access to extended information when I need it! With a podcast, I'm stuck listening to the whole stream. Sure, I can fast forward, but you end up playing the "find that show you recorded" game you play with your VCR. Worse, you don't know what you're looking for. You are at the mercy of the podcaster. You can't filter, you can't opt in or out of a subject.

Maybe I'm not listening to podcasts at the proper moment? Maybe I'm trying to use podcasts as if they were audio blogs, which they are not? I tried listening to a podcast in my car on the way to work, just to discover I was sadly missing the local news and forecasts I usually listen to in the morning. I tried listening to a podcast at home in the evening when I push my computer geekness to its limits by moving back to a computer, but I generally need to disconnect from work, and I prefer playing Guild Wars! I tried listening to a podcast in bed before getting to sleep, just to find out I prefer doing other things in bed... like sleeping... and... ok, you get the picture!

The funny part is that I've been approached by the Visual Studio Talk Show for a 45-minute podcast-style interview in French, and I've said yes. But it's only a one-time deal. Even though this show is mostly accessible as a podcast, I see this as an interview, and there is no way I could maintain a weekly podcast.

So I'll conclude with Scott's own words: podcasting sucks. It wastes my precious time. I would have liked it very much if Carl Franklin had asked Scott about his background, his developer path, about himself. I want to know more about Scott. For links, I'll continue reading his blog.

Oh, and one more thing: the damn advertising is barely tolerable.



1/11/2006 9:09:26 AM (Eastern Standard Time, UTC-05:00)  #   
 Thursday, November 24, 2005

I must admit, I'm no database whiz. It's been a looooong time since I've played with SQL, and I never really dug into System.Data. The rare moments I required a data connection, the design-time experience was enough, oh, and that Fill call on that adapter! q;-)

Recently, a customer explained he was using an SqlDataReader to fetch his data on demand, to avoid loading too much. One of the fields was a byte array containing a GZip file. He was using the static method GZipCompressedStream.Decompress to get the data in that GZip file. Unfortunately, the data was sometimes quite large, and this technique prevented him from using SqlDataReader.GetBytes (exposed in the IDataRecord interface), which lets you read only chunks of a field at a time.

Bummer... My "Streams everywhere" motto was challenged. You see, Xceed Streaming Compression for .NET allows you to either decompress a single byte array in one operation (stateless), or wrap a GZipCompressedStream around your source stream and read from it to decompress data on the fly. But in this case, no streams. Nada. Or if there is one, I didn't find it.

I was not going to get defeated by that mere absence. The "Streaming" in "Xceed Streaming Compression for .NET" is exactly about that scenario. It turns out it was quite easy to overcome this little problem. I created myself a DataRecordStream class, which derives from System.IO.Stream.

Apart from the usual overrides required when deriving from System.IO.Stream, I expose a constructor requiring an IDataRecord parameter and the index of the field to expose as a stream.

    public DataRecordStream( IDataRecord record, int fieldIndex )
    {
      if( record == null )
        throw new ArgumentNullException( "record" );

      if( ( fieldIndex < 0 ) || ( fieldIndex >= record.FieldCount ) )
        throw new ArgumentOutOfRangeException( "fieldIndex", fieldIndex, "Invalid field index." );

      m_record = record;
      m_field = fieldIndex;
    }

Then, when Read is called, I simply turn to my IDataRecord's GetBytes method to fill that buffer.

    public override int Read( byte[] buffer, int offset, int count )
    {
      long read = m_record.GetBytes( m_field, m_position, buffer, offset, count );

      m_position += read;

      if( ( m_length != -1 ) && ( m_position > m_length ) )
      {
        // The reported length was smaller than the actual size.
        // We update the length dynamically.
        m_length = m_position;
      }

      return unchecked( ( int )read );
    }

The rest is just glue for managing the position and allowing seeking. The good part about this new class is that you can now wrap any pass-through stream around it, for example a GZipCompressedStream. My customer can now read text compressed in a GZip file stored in one of his database fields quite easily, without consuming too much memory.

    SqlConnection connection = new SqlConnection(
      "integrated security=SSPI;data source=xxx;initial catalog=GZipTest" );

    connection.Open();

    try
    {
      using( SqlCommand command = new SqlCommand( "SELECT * FROM GZipTestTable", connection ) )
      {
        using( SqlDataReader dataReader = command.ExecuteReader() )
        {
          while( dataReader.Read() )
          {
            using( StreamReader textReader = new StreamReader(
              new GZipCompressedStream(
                new DataRecordStream(
                  dataReader, dataReader.GetOrdinal( "GZipField" ) ) ) ) )
            {
              string line;

              while( ( line = textReader.ReadLine() ) != null )
              {
                Console.WriteLine( line );
              }
            }
          }
        }
      }
    }
    finally
    {
      connection.Close();
    }

I have made a VB.NET version of the class too. Enjoy!

DataRecordStream.zip (3.34 KB)



11/24/2005 9:25:04 AM (Eastern Standard Time, UTC-05:00)  #   
 Friday, November 11, 2005

I previously gave a glimpse of how to zip into an HttpResponse's OutputStream, but it didn't explain all aspects of zipping from ASP.NET. So I'll go into more detail here.

First, I have used my fantastic talent in UI designs to create this web page:

Yup, three checkboxes and a button is enough gadgets for me!

The first piece of code involves Application_Start. Since I know I won't be zipping gazillions of bytes, I want my web page to use memory as a temporary location for compressed data. How you do this with Xceed Zip for .NET is simple: You create a RAM drive! Oh the good old days of RAM drives...

    protected void Application_Start(Object sender, EventArgs e)
    {
      Xceed.Zip.Licenser.LicenseKey = "ZIN23-#####-#####-####";
      ZipArchive.DefaultTempFolder = new MemoryFolder();
    }

This new MemoryFolder acts exactly like a per-process RAM drive. It's an AbstractFolder like any other AbstractFolder. The TempFolder of all new ZipArchive instances will be initialized to that value. Application_Start is also a great place to set your license key, before anything else.

We're now ready for the button's click event. Again, I want to avoid write access on the hard drive, and wish to zip directly in the response stream. But the idea behind the Xceed FileSystem is to copy source files and folders to destination files and folders. How can I zip into a Stream? The StreamFile class comes to the rescue. It lets you expose a Stream as if it were an AbstractFile. Then, you can pass this StreamFile to the ZipArchive's constructor, to tell that new instance to write into that Stream. The rest is glue code for my wonderful ASP.NET application to zip the correct files.

    private void Button1_Click(object sender, System.EventArgs e)
    {
      if( !CheckBox1.Checked && !CheckBox2.Checked && !CheckBox3.Checked )
      {
        // Redirect to error page...
        return;
      }

      // The "MACHINE\ASP_NET" user must have read access to that folder.
      DiskFolder source = new DiskFolder( @"d:\" );

      // We want the client-side to recognize the upcoming file as a zip file.
      this.Response.ContentType = "application/zip";
      this.Response.AddHeader( "Content-Disposition", "attachment; filename=YourFiles.zip" );

      // We will zip directly in the response stream. The temporary compressed
      // data will be written to the ZipArchive's TempFolder, thus the MemoryFolder 
      // we set in Application_Start.
      ZipArchive destination = new ZipArchive( new StreamFile( this.Response.OutputStream ) );

      // And finally we zip in a single operation. If we had to zip more than
      // one source, we could have used ZipArchive.BeginUpdate/EndUpdate.
      ArrayList nameFilters = new ArrayList();

      if( CheckBox1.Checked )
        nameFilters.Add( new NameFilter( "*.txt" ) );

      if( CheckBox2.Checked )
        nameFilters.Add( new NameFilter( "*.jpg" ) );

      if( CheckBox3.Checked )
        nameFilters.Add( new NameFilter( "*.exe|*.dll" ) );

      // Passing more than one filter to CopyFilesTo does an "AndFilter"
      // by default, so we explicitly group the name filters in an "OrFilter".
      Filter mainFilter = ( nameFilters.Count == 1 )
        ? nameFilters[ 0 ] as Filter
        : new OrFilter( nameFilters.ToArray( typeof( NameFilter ) ) );

      source.CopyFilesTo( destination, false, true, mainFilter );

      this.Response.End();
    }

We now have an ASP.NET application which only requires read access to the source files and folders to zip. Everything else is done in memory, without drifting away from the logic of the Xceed FileSystem: manipulating files and folders.


.NET | Zip

11/11/2005 9:26:28 AM (Eastern Standard Time, UTC-05:00)  #   
 Tuesday, November 01, 2005

Just in case my previous post on the subject did not ring a bell, the release of version 2.1 of Xceed FTP for .NET means you can directly unzip from a zip file located on an FTP server, without downloading the file first! Look at the following code:

  using( FtpConnection connection = new FtpConnection( "ftp.xceed.com" ) )
  {
    FtpFile source = new FtpFile( connection, @"/images/Flowers/Backup/Flowers.zip" );
    DiskFolder dest = new DiskFolder( @"d:\temp\flowers" );

    ZipArchive zip = new ZipArchive( source );
    zip.CopyFilesTo( dest, true, true );
  }

The secret behind this code is the kind of stream "FtpFile.OpenRead" returns. Though we are dealing with a network connection, this stream is fully seekable! The FtpFile takes advantage of the "REST" FTP command, which tells the FTP server we wish to start the transfer at a specific offset. Thus, when the ZipArchive needs to seek to the end of the file to locate the ending header, a proper "REST" command is issued to avoid having to read the whole zip file first. And the same happens when reading the central directory, or unzipping specific files.


.NET | FileSystem | FTP | Zip

11/1/2005 4:15:40 PM (Eastern Daylight Time, UTC-04:00)  #   
 Friday, October 28, 2005

I have put forward my incredible talent with ASCII art (sic) and updated my wonderful "FileSystem.txt" file, which describes classes available with the Xceed FileSystem for .NET.

                             ==============
                             FileSystemItem
                             ==============
                                    |
                     +--------------+------------------+
                     |                                 |
              ==============                     ============
              AbstractFolder                     AbstractFile
              ==============                     ============
                     |                                 |
         +---+---+---+---+---+---+     +---+---+---+---+---+---+---+
         |   |   |   |   |   |   |     |   |   |   |   |   |   |   |
 ==========  |   |   |   |   |   |     |   |   |   |   |   |   |  ========
 DiskFolder  |   |   |   |   |   |     |   |   |   |   |   |   |  DiskFile
 --============  |   |   |   |   |     |   |   |   |   |   |  ==========--
   MemoryFolder  |   |   |   |   |     |   |   |   |   |   |  MemoryFile
   --==============  |   |   |   |     |   |   |   |   |  ============--
     IsolatedFolder  |   |   |   |     |   |   |   |   |  IsolatedFile
     ------============  |   |   |     |   |   |   |  ============----
           ZippedFolder  |   |   |     |   |   |   |  ZippedFile
           -------=========  |   |     |   |   |  =======-------
                  FtpFolder  |   |     |   |   |  FtpFile
                  -============  |     |   |  ==========-
                   TarredFolder  |     |   |  TarredFile
                   ---=============    |  ===========---
                      GZippedFolder    |  GZippedFile
                      -------------   ==========-----
                                      StreamFile
                                      ----------

Can you guess what I'm working on?



10/28/2005 3:36:45 PM (Eastern Daylight Time, UTC-04:00)  #