Tuesday, January 31, 2006

Ever since I started working with the .NET Framework, most of my time has been spent in the System.IO namespace. I'm not a UI guy, I'm an IO guy! The most important class in that namespace is System.IO.Stream. And since it was well designed, probably inspired by other successful stream implementations (Delphi comes to mind), it's very easy to expose features using streams.

My favorite use of streams is for pass-through streams. A pass-through stream is a class that derives from System.IO.Stream, but reads from or writes to an inner stream received at creation. It serves as a data modifier or data analyzer. When reading from a pass-through stream, it first reads from its inner stream, then processes the data read (potentially modifying it) and returns it. When writing to a pass-through stream, it first processes the provided data (again, potentially modifying it), then writes it to its inner stream.
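To make the idea concrete, here is a minimal sketch of such a class. The name PassThroughStream and the member layout are mine, not Xceed's; it simply delegates everything to the inner stream, which is exactly the point: a real pass-through stream only has to override Read, Write or Close to do its work.

    using System;
    using System.IO;

    // Minimal pass-through stream sketch: every member delegates to the inner
    // stream received at creation. Derived classes override Read, Write or
    // Close to analyze or modify the data as it flows through.
    public class PassThroughStream : Stream
    {
        protected Stream m_inner;

        public PassThroughStream( Stream inner )
        {
            if( inner == null )
                throw new ArgumentNullException( "inner" );

            m_inner = inner;
        }

        public override bool CanRead { get { return m_inner.CanRead; } }
        public override bool CanWrite { get { return m_inner.CanWrite; } }
        public override bool CanSeek { get { return m_inner.CanSeek; } }
        public override long Length { get { return m_inner.Length; } }

        public override long Position
        {
            get { return m_inner.Position; }
            set { m_inner.Position = value; }
        }

        public override void Flush() { m_inner.Flush(); }
        public override long Seek( long offset, SeekOrigin origin ) { return m_inner.Seek( offset, origin ); }
        public override void SetLength( long value ) { m_inner.SetLength( value ); }

        public override int Read( byte[] buffer, int offset, int count )
        {
            return m_inner.Read( buffer, offset, count );
        }

        public override void Write( byte[] buffer, int offset, int count )
        {
            m_inner.Write( buffer, offset, count );
        }

        public override void Close()
        {
            m_inner.Close();
            base.Close();
        }
    }

The sketches in the rest of this post derive from this class, just to keep the boilerplate out of the way.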

Xceed Zip for .NET and Xceed FTP for .NET both use a plethora of pass-through streams. The most popular is Xceed.Compression.CompressedStream, the stream responsible for compressing data before writing it to its inner stream, or decompressing data read from its inner stream. But most of the others are internal. We've been juggling with the idea of exposing them for a long time, but believe it would only confuse developers to "see" those new namespaces and classes. Another useful thing about internal classes is that we can change their interface without causing breaking changes.

TransientStream

We had a long debate before deciding to go forward with the "transient" keyword. Not only is it used in the TransientStream type name, but also as a property on many of our pass-through streams. What we meant by "transient" is "volatile", or if you prefer a more explicit keyword, "does-not-close-its-inner-stream-when-closed". A TransientStream is about the simplest expression of a pass-through stream. All required property and method overrides simply call the inner stream. The only exception is the Close method, which simply makes sure not to call Close on the inner stream. This is very useful when you need to pass your stream to another routine that closes the stream, while you don't want your stream to get closed.
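Building on the PassThroughStream sketch above (again, my own name and layout, not the shipping class), the whole trick fits in a single override:

    // Sketch of a "transient" pass-through stream: identical to its base,
    // except that closing it does NOT close the inner stream.
    public class TransientStreamSketch : PassThroughStream
    {
        public TransientStreamSketch( Stream inner )
            : base( inner )
        {
        }

        public override void Close()
        {
            // Deliberately skip m_inner.Close(): the inner stream stays open
            // even after some other routine has closed this wrapper.
        }
    }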

ChecksumStream

This stream does not modify the data read from or written to it, but takes the opportunity to calculate either a CRC32 or an Adler32 on that data. When reading, it can also make sure, upon closing, that the calculated checksum matches an expected checksum, and throw an exception otherwise. In this way, we can insert checksum calculation anywhere in a process without interfering with, or requiring changes to, the surrounding code.
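Here is roughly what such a stream can look like, still deriving from the PassThroughStream sketch above. I use Adler-32 because it fits in a few lines; the running sums modulo 65521 are the whole algorithm, and the data itself is handed through untouched:

    // Sketch of a checksum pass-through stream: data passes through unmodified,
    // but an Adler-32 is updated on every byte read or written.
    public class ChecksumStreamSketch : PassThroughStream
    {
        private uint m_a = 1;
        private uint m_b = 0;

        public ChecksumStreamSketch( Stream inner ) : base( inner ) { }

        public uint Adler32 { get { return ( m_b << 16 ) | m_a; } }

        private void Update( byte[] buffer, int offset, int count )
        {
            for( int i = 0; i < count; i++ )
            {
                m_a = ( m_a + buffer[ offset + i ] ) % 65521;
                m_b = ( m_b + m_a ) % 65521;
            }
        }

        public override int Read( byte[] buffer, int offset, int count )
        {
            int read = m_inner.Read( buffer, offset, count );
            this.Update( buffer, offset, read );
            return read;
        }

        public override void Write( byte[] buffer, int offset, int count )
        {
            this.Update( buffer, offset, count );
            m_inner.Write( buffer, offset, count );
        }
    }

Comparing the calculated value against an expected one on Close is then a matter of a few more lines.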

CombinedStream

The deflate compression algorithm has the ability to detect the end of the data when decompressing. The CompressedStream is itself a pass-through stream. When reading from it, it first reads from the inner stream, then decompresses the data. When it reaches the end of the compressed data, the CompressedStream has the ability to return a stream on the remaining data, in case the inner stream contains more data after the compressed block. Why isn't this equivalent to the inner stream, you might ask? Let's say the inner stream isn't seekable. The CompressedStream's Read method first reads N bytes from the inner stream, but may find that the end of the compressed data came after only M bytes (M < N). The inner stream is already N-M bytes too far. The CombinedStream receives both a byte array (the unused N-M bytes) and the inner stream as constructor parameters, and exposes them as one contiguous stream. Pretty slick!
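A sketch of the idea, reusing the pass-through base from above (names and details are mine): serve the left-over bytes first, then fall back to the inner stream.

    // Sketch of a combined stream: drains the "left over" byte array first,
    // then continues reading from the inner stream, as one contiguous stream.
    public class CombinedStreamSketch : PassThroughStream
    {
        private byte[] m_leftOver;
        private int m_leftOverOffset = 0;

        public CombinedStreamSketch( byte[] leftOver, Stream inner )
            : base( inner )
        {
            m_leftOver = ( leftOver != null ) ? leftOver : new byte[ 0 ];
        }

        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return false; } }

        public override int Read( byte[] buffer, int offset, int count )
        {
            // First serve the bytes that were read "too far" by the previous consumer.
            int fromLeftOver = Math.Min( count, m_leftOver.Length - m_leftOverOffset );

            if( fromLeftOver > 0 )
            {
                Array.Copy( m_leftOver, m_leftOverOffset, buffer, offset, fromLeftOver );
                m_leftOverOffset += fromLeftOver;
                return fromLeftOver;
            }

            // Left-over bytes exhausted: continue with the inner stream.
            return m_inner.Read( buffer, offset, count );
        }
    }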

HeaderFooterStream

Xceed Streaming Compression for .NET exposes stream-based (as opposed to archive-based) compression formats. Those formats all have one thing in common: they have a header and a footer. Not all of them can depend on the deflate algorithm to automatically detect the end of the stream. That's why the HeaderFooterStream makes sure never to return the first M bytes or the last N bytes of the inner stream, where M is the expected header size and N the expected footer size.
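A faithful version has to buffer a footer-sized tail when the inner stream cannot seek; assuming a seekable inner stream keeps the sketch short (the class name and the simplification are mine):

    // Sketch: hides the first headerSize and last footerSize bytes of a
    // seekable inner stream, exposing only the payload in between.
    public class HeaderFooterStreamSketch : PassThroughStream
    {
        private int m_headerSize;
        private int m_footerSize;

        public HeaderFooterStreamSketch( Stream inner, int headerSize, int footerSize )
            : base( inner )
        {
            m_headerSize = headerSize;
            m_footerSize = footerSize;

            // Skip the header right away so position 0 maps to the first payload byte.
            m_inner.Seek( headerSize, SeekOrigin.Begin );
        }

        public override long Length
        {
            get { return m_inner.Length - m_headerSize - m_footerSize; }
        }

        public override long Position
        {
            get { return m_inner.Position - m_headerSize; }
            set { m_inner.Position = value + m_headerSize; }
        }

        public override int Read( byte[] buffer, int offset, int count )
        {
            // Never read into the footer area at the end of the inner stream.
            long remaining = ( m_inner.Length - m_footerSize ) - m_inner.Position;
            int toRead = ( int )Math.Min( ( long )count, Math.Max( remaining, 0 ) );

            return ( toRead > 0 ) ? m_inner.Read( buffer, offset, toRead ) : 0;
        }
    }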

WindowStream

When exposing part of a zip file as a single AbstractFile, we need to make sure we do not read past the boundaries of that file's data in the zip file. The WindowStream exposes a region of its inner stream as a zero-position, N-length stream.
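Something along these lines, again building on the pass-through sketch above and assuming a seekable inner stream (the real WindowStream certainly handles more cases):

    // Sketch: exposes the [start, start + length) region of a seekable inner
    // stream as if it were a stream of its own, starting at position 0.
    public class WindowStreamSketch : PassThroughStream
    {
        private long m_start;
        private long m_length;

        public WindowStreamSketch( Stream inner, long start, long length )
            : base( inner )
        {
            m_start = start;
            m_length = length;

            m_inner.Seek( start, SeekOrigin.Begin );
        }

        public override long Length { get { return m_length; } }

        public override long Position
        {
            get { return m_inner.Position - m_start; }
            set { m_inner.Position = m_start + value; }
        }

        public override long Seek( long offset, SeekOrigin origin )
        {
            switch( origin )
            {
                case SeekOrigin.Begin:   this.Position = offset; break;
                case SeekOrigin.Current: this.Position += offset; break;
                case SeekOrigin.End:     this.Position = m_length + offset; break;
            }

            return this.Position;
        }

        public override int Read( byte[] buffer, int offset, int count )
        {
            // Clamp the read so we never go past the end of the window.
            long remaining = m_length - this.Position;
            int toRead = ( int )Math.Min( ( long )count, Math.Max( remaining, 0 ) );

            return ( toRead > 0 ) ? m_inner.Read( buffer, offset, toRead ) : 0;
        }
    }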

ZCryptStream

This pass-through stream automatically encrypts or decrypts the data written or read, using the basic Zip encryption (which is as weak as me in front of a cheesecake). I will be working on AES encryption very soon, and it will most probably be implemented as a pass-through stream too!

NotifyStream

Though pass-through streams can do much of the work, it is often better for the clarity of the code to have processing done by other classes that do not derive from System.IO.Stream. The NotifyStream class exposes three events: ReadingFromStream, WritingToStream and ClosingStream. Any other class can subscribe to those events to intervene in the reading or writing process. This class has been around since the beginning of Xceed Zip for .NET, and it has proven very useful in the current development we are doing for Tar and GZip support within Xceed Zip for .NET.
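The shape of such a class, sketched on top of the pass-through base above; I am not showing the real event argument types here, plain EventHandler is enough to carry the idea:

    // Sketch of a notifying pass-through stream: it raises an event before
    // delegating each operation, so outside classes can hook into the process.
    public class NotifyStreamSketch : PassThroughStream
    {
        public event EventHandler ReadingFromStream;
        public event EventHandler WritingToStream;
        public event EventHandler ClosingStream;

        public NotifyStreamSketch( Stream inner ) : base( inner ) { }

        public override int Read( byte[] buffer, int offset, int count )
        {
            if( this.ReadingFromStream != null )
                this.ReadingFromStream( this, EventArgs.Empty );

            return m_inner.Read( buffer, offset, count );
        }

        public override void Write( byte[] buffer, int offset, int count )
        {
            if( this.WritingToStream != null )
                this.WritingToStream( this, EventArgs.Empty );

            m_inner.Write( buffer, offset, count );
        }

        public override void Close()
        {
            if( this.ClosingStream != null )
                this.ClosingStream( this, EventArgs.Empty );

            base.Close();
        }
    }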

ForwardSeekableStream

This new class, created for Xceed Zip for .NET 3.0 (Tar and GZip support), can expose a non-seekable stream as a seekable stream when reading, or at least as a stream reporting a Position when writing. When reading, you can call Seek with an offset beyond the current position, and it will simply read from the non-seekable inner stream until it reaches the requested position. And for both reading and writing operations, it counts the number of bytes read or written so it can report a position (provided we knew the original position at creation).
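A sketch of the mechanism (my names, built on the pass-through base above): forward seeks are emulated by reading and discarding, and every operation updates a running position.

    // Sketch: makes a non-seekable stream forward-seekable when reading, and
    // keeps a running position for both reading and writing.
    public class ForwardSeekableStreamSketch : PassThroughStream
    {
        private long m_position;

        public ForwardSeekableStreamSketch( Stream inner, long initialPosition )
            : base( inner )
        {
            m_position = initialPosition;
        }

        public override bool CanSeek { get { return true; } }

        public override long Position
        {
            get { return m_position; }
            set { this.Seek( value, SeekOrigin.Begin ); }
        }

        public override long Seek( long offset, SeekOrigin origin )
        {
            long target;

            switch( origin )
            {
                case SeekOrigin.Begin:   target = offset; break;
                case SeekOrigin.Current: target = m_position + offset; break;
                default: throw new NotSupportedException( "Cannot seek from the end of a non-seekable stream." );
            }

            if( target < m_position )
                throw new NotSupportedException( "Only forward seeking is supported." );

            // "Seek" forward by reading and throwing away the skipped bytes.
            byte[] junk = new byte[ 4096 ];

            while( m_position < target )
            {
                int read = m_inner.Read( junk, 0, ( int )Math.Min( ( long )junk.Length, target - m_position ) );

                if( read == 0 )
                    break;

                m_position += read;
            }

            return m_position;
        }

        public override int Read( byte[] buffer, int offset, int count )
        {
            int read = m_inner.Read( buffer, offset, count );
            m_position += read;
            return read;
        }

        public override void Write( byte[] buffer, int offset, int count )
        {
            m_inner.Write( buffer, offset, count );
            m_position += count;
        }
    }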

FtpAsciiDataStream

Xceed FTP for .NET also uses pass-through streams. For example, the FtpAsciiDataStream wraps the NetworkStream to perform conversion of LF to CR/LF on the fly when sending a file in ASCII mode.
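The writing half of that idea can be sketched like this, again on top of the pass-through base above; the real class certainly handles more, such as CR/LF pairs split across buffers:

    // Sketch of an ASCII-mode pass-through: every bare LF written is expanded
    // to CR/LF before reaching the inner (network) stream.
    public class AsciiCrLfStreamSketch : PassThroughStream
    {
        public AsciiCrLfStreamSketch( Stream inner ) : base( inner ) { }

        public override void Write( byte[] buffer, int offset, int count )
        {
            MemoryStream converted = new MemoryStream( count + ( count / 8 ) );

            for( int i = 0; i < count; i++ )
            {
                byte b = buffer[ offset + i ];

                // Insert a CR before any LF that is not already preceded by one.
                // (A CR/LF pair split across two Write calls is not handled here.)
                if( ( b == 0x0A ) && ( ( i == 0 ) || ( buffer[ offset + i - 1 ] != 0x0D ) ) )
                    converted.WriteByte( 0x0D );

                converted.WriteByte( b );
            }

            byte[] data = converted.ToArray();
            m_inner.Write( data, 0, data.Length );
        }
    }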


.NET | Compression | FileSystem | FTP | Zip

1/31/2006 9:47:29 AM (Eastern Standard Time, UTC-05:00)
Thursday, November 24, 2005

I must admit, I'm no database whiz. It's been a looooong time since I've played with SQL, and I never really dug into System.Data. On the rare occasions I needed a data connection, the design-time experience was enough, oh, and that Fill call on that adapter! q;-)

Recently, a customer explained he was using an SqlDataReader to fetch his data on demand, to avoid loading too much at once. One of the fields was a byte array containing a GZip file. He was using the static method GZipCompressedStream.Decompress to get the data out of that GZip file. Unfortunately, the data was sometimes quite large, and this technique prevented him from using SqlDataReader.GetBytes (exposed through the IDataRecord interface), which allows reading a field in chunks rather than all at once.

Bummer... My "Streams everywhere" motto was challenged. You see, Xceed Streaming Compression for .NET allows you to either decompress a single byte array in one operation (stateless), or wrap a GZipCompressedStream around your source stream and read from it to decompress data on the fly. But in this case, no streams. Nada. Or if there is one, I didn't find it.

I was not going to be defeated by that mere absence. The "Streaming" in "Xceed Streaming Compression for .NET" is exactly about that scenario. It turns out it was quite easy to overcome this little problem: I created myself a DataRecordStream class, which derives from System.IO.Stream.

Apart from the usual overrides required when deriving from System.IO.Stream, I expose a constructor requiring an IDataRecord parameter and the index of the field to expose as a stream.

    public DataRecordStream( IDataRecord record, int fieldIndex )
    {
        if( record == null )
            throw new ArgumentNullException( "record" );

        if( ( fieldIndex < 0 ) || ( fieldIndex >= record.FieldCount ) )
            throw new ArgumentOutOfRangeException( "fieldIndex", fieldIndex, "Invalid field index." );

        m_record = record;
        m_field = fieldIndex;
    }

Then, when Read is called, I simply turn to my IDataRecord's GetBytes method to fill that buffer.

    public override int Read( byte[] buffer, int offset, int count )
    {
        long read = m_record.GetBytes( m_field, m_position, buffer, offset, count );

        m_position += read;

        if( ( m_length != -1 ) && ( m_position > m_length ) )
        {
            // The reported length was smaller than the actual size.
            // We update the length dynamically.
            m_length = m_position;
        }

        return unchecked( ( int )read );
    }

The rest is just glue for managing the position and allowing seeking; a rough sketch of that glue follows. The good part about this new class is that you can now wrap any pass-through stream around it, for example a GZipCompressedStream. My customer can now read text compressed in a GZip file stored in one of his database fields quite easily, without consuming too much memory.
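Here is roughly what that glue can look like; this is my reconstruction of the remaining members, not necessarily the exact code you will find in the download below:

    // The stream is read-only, the position is tracked manually, and seeking
    // simply moves that position since GetBytes takes an absolute data offset.
    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return false; } }
    public override bool CanSeek { get { return true; } }

    public override long Position
    {
        get { return m_position; }
        set { m_position = value; }
    }

    public override long Seek( long offset, SeekOrigin origin )
    {
        switch( origin )
        {
            case SeekOrigin.Begin:   m_position = offset; break;
            case SeekOrigin.Current: m_position += offset; break;
            case SeekOrigin.End:     m_position = this.Length + offset; break;
        }

        return m_position;
    }

    public override long Length
    {
        get
        {
            // Passing a null buffer to GetBytes returns the length of the field's data.
            if( m_length == -1 )
                m_length = m_record.GetBytes( m_field, 0, null, 0, 0 );

            return m_length;
        }
    }

    public override void Flush() { }

    public override void Write( byte[] buffer, int offset, int count )
    {
        throw new NotSupportedException();
    }

    public override void SetLength( long value )
    {
        throw new NotSupportedException();
    }

And here is the customer-side code, end to end: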

    SqlConnection connection = new SqlConnection(
        "integrated security=SSPI;data source=xxx;initial catalog=GZipTest" );

    connection.Open();

    try
    {
        using( SqlCommand command = new SqlCommand( "SELECT * FROM GZipTestTable", connection ) )
        {
            using( SqlDataReader dataReader = command.ExecuteReader() )
            {
                while( dataReader.Read() )
                {
                    using( StreamReader textReader = new StreamReader(
                        new GZipCompressedStream(
                            new DataRecordStream(
                                dataReader, dataReader.GetOrdinal( "GZipField" ) ) ) ) )
                    {
                        string line;

                        while( ( line = textReader.ReadLine() ) != null )
                        {
                            Console.WriteLine( line );
                        }
                    }
                }
            }
        }
    }
    finally
    {
        connection.Close();
    }

I have made a VB.NET version of the class too. Enjoy!

DataRecordStream.zip (3.34 KB)



11/24/2005 9:25:04 AM (Eastern Standard Time, UTC-05:00)