Tuesday, March 08, 2005

Warning: Do not try this at home!

A few days ago, Pierre-Luc at support asked me if Xceed Zip for .NET was thread safe. I knew from his look that he was expecting a "yes" or a "no". At least, that's what the client who asked him the same question expected.

My first answer was more nuanced: although the library was designed to be safely accessible from multiple threads at the same time, the sequential nature of the zip format means you can't work on the same zip file from multiple threads.

He nodded in approval, confirming that his client wasn't attempting anything that crazy, but simply had a multi-threaded application where each thread might be zipping into its own private file. I gave him my benediction: In that case, yes, Xceed Zip for .NET is thread safe.

Pierre-Luc wasn't two feet away when an idea struck me. It wouldn't be that crazy to try zipping into the same zip file from multiple threads. How neat would it be to take advantage of multiprocessor or hyperthreading machines when building a single zip file? Guess what... you can! You shouldn't... but you can! Don't ask us to support this scenario... but you can!

Here's the deal. Any ZipArchive you modify gets rebuilt as soon as the modifying operation completes. If you know you're about to make more than one modification to a single zip file, you should first call BeginUpdate, make all your modifications, and finally call EndUpdate. The zip file only gets rebuilt on that final call. Until then, the files you copy into the zip file are compressed and stored as temp files within the ZipArchive's TempFolder.

That means any copy operation you perform within a BeginUpdate/EndUpdate block is self-contained, and only involves compressing the source into independent temp files. You see where I'm heading? How about spawning threads within that block, each thread copying its own source, and waiting for all threads to finish before calling EndUpdate?
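To make the shape of that idea concrete without involving the zip library at all, here is a minimal sketch. It assumes nothing about Xceed's internals: the class and method names (ParallelCompressSketch, CompressAll, Decompress) are hypothetical, and the plain .NET DeflateStream stands in for the compressor. Each thread compresses its own input into its own private buffer (the "temp file"), and nothing is handed back for assembly until every thread has finished.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading;

// Hypothetical illustration only; this is not how Xceed Zip is implemented.
public static class ParallelCompressSketch
{
    // Compresses every input on its own thread, each into a private buffer,
    // and returns the compressed buffers in input order.
    public static byte[][] CompressAll( byte[][] inputs )
    {
        byte[][] outputs = new byte[ inputs.Length ][];
        Thread[] threads = new Thread[ inputs.Length ];

        for( int i = 0; i < inputs.Length; i++ )
        {
            int index = i; // private copy, so each thread sees its own index

            threads[ i ] = new Thread( () =>
            {
                using( MemoryStream buffer = new MemoryStream() )
                {
                    // leaveOpen = true keeps the buffer readable once the
                    // deflate stream has been disposed (and thus flushed).
                    using( DeflateStream deflate =
                        new DeflateStream( buffer, CompressionMode.Compress, true ) )
                    {
                        deflate.Write( inputs[ index ], 0, inputs[ index ].Length );
                    }

                    outputs[ index ] = buffer.ToArray();
                }
            } );

            threads[ i ].Start();
        }

        // The "EndUpdate" moment: nothing gets assembled until every worker is done.
        foreach( Thread thread in threads )
        {
            thread.Join();
        }

        return outputs;
    }

    // Inverse of the per-thread compression, handy for checking the round trip.
    public static byte[] Decompress( byte[] compressed )
    {
        using( MemoryStream input = new MemoryStream( compressed ) )
        using( DeflateStream inflate = new DeflateStream( input, CompressionMode.Decompress ) )
        using( MemoryStream output = new MemoryStream() )
        {
            inflate.CopyTo( output );
            return output.ToArray();
        }
    }
}
```

Note that this sketch doesn't capture exceptions thrown inside a worker thread; an unhandled one would take the whole process down, so real code needs to catch it in the thread and report it afterwards.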

I had to try it. I started with a class implementing IAsyncResult, which manages the copy operation on a separate thread:

  using System;
  using System.Threading;
  using Xceed.FileSystem;

  internal class AsyncCopyResult : IAsyncResult
  {
    public AsyncCopyResult(
      AbstractFolder source,
      AbstractFolder dest,
      AsyncCallback callback,
      object state )
    {
      m_source = source;
      m_dest = dest;
      m_callback = callback;
      m_state = state;

      m_thread = new Thread( new ThreadStart( this.ThreadProc ) );
      m_completed = new ManualResetEvent( false );
    }

    public void Begin()
    {
      m_completed.Reset();
      m_thread.Start();
    }

    public void End()
    {
      // We must not join the thread, since we may get called by the callback,
      // which itself runs within that thread. We wait on the handle instead.
      m_completed.WaitOne();

      // Rethrow on the caller's thread whatever the copy threw.
      if( m_result != null )
        throw m_result;
    }

    #region IAsyncResult IMPLEMENTATION

    public object AsyncState
    {
      get { return m_state; }
    }

    public bool CompletedSynchronously
    {
      get { return false; }
    }

    public WaitHandle AsyncWaitHandle
    {
      get { return m_completed; }
    }

    public bool IsCompleted
    {
      // A zero timeout polls the handle's state without blocking.
      get { return m_completed.WaitOne( 0, false ); }
    }

    #endregion

    private void ThreadProc()
    {
      try
      {
        m_result = null;

        if( m_source == null )
          throw new ArgumentNullException( "source" );

        if( m_dest == null )
          throw new ArgumentNullException( "dest" );

        if( m_source.IsRoot )
        {
          m_source.CopyFilesTo( m_dest, true, true );
        }
        else
        {
          m_source.CopyTo( m_dest, true );
        }
      }
      catch( Exception except )
      {
        // Keep the exception so that End can rethrow it.
        m_result = except;
      }

      // Signal completion before invoking the callback.
      m_completed.Set();

      if( m_callback != null )
      {
        try
        {
          // Important note: This callback may be calling End.
          // Thus End's implementation should not wait for thread,
          // but for handle.
          m_callback( this );
        }
        catch
        {
          System.Diagnostics.Debug.WriteLine( "Unhandled exception within callback." );
        }
      }
    }

    private Thread m_thread = null;
    private ManualResetEvent m_completed = null;

    private AbstractFolder m_source = null;
    private AbstractFolder m_dest = null;
    private AsyncCallback m_callback = null;
    private object m_state = null;

    private Exception m_result = null;
  }

The ThreadProc method simply copies the source folder into the destination folder. The rest is plumbing for implementing IAsyncResult. In my main class, I created a static method that uses the AsyncCopyResult class like this:

    private static void CopyMultipleFolders(
      AbstractFolder[] sources,
      AbstractFolder dest )
    {
      // I'm using AutoBatchUpdate with the using statement, an easy way
      // to call BeginUpdate and EndUpdate only if the folder implements
      // IBatchUpdateable.
      using( AutoBatchUpdate auto = new AutoBatchUpdate( dest ) )
      {
        AsyncCopyResult[] results = new AsyncCopyResult[ sources.Length ];

        // First create the threads and state objects.
        for( int i=0; i<sources.Length; i++ )
        {
          results[ i ] = new AsyncCopyResult( sources[ i ], dest, null, null );
        }

        // Then launch each thread
        foreach( AsyncCopyResult result in results )
        {
          result.Begin();
        }

        // We can't call WaitAll on an STA thread, but it doesn't matter.
        // We wait for each one separately.
        foreach( AsyncCopyResult result in results )
        {
          result.AsyncWaitHandle.WaitOne();

          try
          {
            result.End();
          }
          catch( Exception except )
          {
            Console.WriteLine( except.Message );
          }
        }
      }
    }

Once each thread is done copying its own source folder into the destination folder, the AutoBatchUpdate class calls EndUpdate on the destination folder (if it implements IBatchUpdateable). In the case of a zip file destination, the final zip file is built by reassembling already compressed temp files. Here's an example of how to call CopyMultipleFolders:

        AbstractFile target = new DiskFile( @"d:\temp\multi.zip" );

        if( target.Exists )
          target.Delete();

        ZipArchive zip = new ZipArchive( target );
        AbstractFolder firstSource = new DiskFolder( @"d:\Downloads" );
        AbstractFolder secondSource = new DiskFolder( @"d:\Music" );

        CopyMultipleFolders( new AbstractFolder[] { firstSource, secondSource }, zip );

The best thing is that this method works for any kind of AbstractFolder, source or destination. If you're confident the zip file won't be too large, you can improve performance by setting the ZipArchive's TempFolder to a new MemoryFolder, so the compressed temp files are kept in memory.

But remember: Don't try this! I didn't tell you it was possible.



3/8/2005 10:56:48 AM (Eastern Standard Time, UTC-05:00)