Uploading Big Files in Windows Azure Blob Storage using PutBlockList

Windows Azure Blob Storage can be thought of as a file system in the cloud. It enables us to store any unstructured data file such as text, images, video, and so on. In this post, I will show how to upload a big file into Windows Azure Storage. Please note that we will be using a Block Blob for this case. For more information about Block Blobs and Page Blobs, please visit here.

I assume that you already know how to upload a file to Windows Azure Storage. If you don't, I recommend checking out this lab from the Windows Azure Training Kit.

Uploading a blob (commonly-used technique)

The following snippet shows how to upload a blob using a commonly-used technique, blob.UploadFromStream(), which ultimately invokes the PutBlob REST API.

protected void btnUpload_Click(object sender, EventArgs e)
{
    // "fu" is the ASP.NET FileUpload control on the page
    var storageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // create the container if it doesn't exist yet
    CloudBlobContainer container = blobClient.GetContainerReference("image2");
    container.CreateIfNotExist();

    // make the container publicly readable
    var permission = container.GetPermissions();
    permission.PublicAccess = BlobContainerPublicAccessType.Container;
    container.SetPermissions(permission);

    // upload the posted file in a single PutBlob operation
    string name = fu.FileName;
    CloudBlob blob = container.GetBlobReference(name);
    blob.UploadFromStream(fu.FileContent);
}
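
The "DataConnectionString" setting used above comes from the role's ServiceConfiguration.cscfg. A minimal example, assuming you are targeting the local storage emulator, might look like this:

<ConfigurationSettings>
  <Setting name="DataConnectionString" value="UseDevelopmentStorage=true" />
</ConfigurationSettings>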

The above code snippet works well in most cases. However, a single PutBlob call can only upload up to 64 MB per file (for a block blob), so for bigger files it is recommended to use another technique, which I am going to describe in more detail.

Uploading a blob by splitting it into chunks and calling PutBlockList

The idea of this technique is to split a block blob into smaller chunks (blocks), upload them one by one or in parallel, and eventually join them all together by calling PutBlockList().

protected void btnUpload_Click(object sender, EventArgs e)
{
    var storageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
    container.CreateIfNotExist();

    var permission = container.GetPermissions();
    permission.PublicAccess = BlobContainerPublicAccessType.Container;
    container.SetPermissions(permission);

    string name = fu.FileName;
    CloudBlockBlob blob = container.GetBlockBlobReference(name);

    int maxSize = 1 * 1024 * 1024; // 1 MB: files larger than this are uploaded in blocks

    if (fu.PostedFile.ContentLength > maxSize)
    {
        byte[] data = fu.FileBytes;
        int id = 0;
        int byteslength = data.Length;
        int bytesread = 0;
        int index = 0;
        List<string> blocklist = new List<string>();
        int numBytesPerChunk = 250 * 1024; // 250 KB per block

        do
        {
            // copy the next full 250 KB chunk into its own buffer
            byte[] buffer = new byte[numBytesPerChunk];
            Array.Copy(data, index, buffer, 0, numBytesPerChunk);
            index += numBytesPerChunk;
            bytesread = index;

            // block IDs must be Base64 strings of equal length
            string blockIdBase64 = Convert.ToBase64String(System.BitConverter.GetBytes(id));

            blob.PutBlock(blockIdBase64, new MemoryStream(buffer, true), null);
            blocklist.Add(blockIdBase64);
            id++;
        } while (byteslength - bytesread > numBytesPerChunk);

        // upload the final, smaller left-over block
        int final = byteslength - bytesread;
        byte[] finalbuffer = new byte[final];
        Array.Copy(data, index, finalbuffer, 0, final);

        string blockId = Convert.ToBase64String(System.BitConverter.GetBytes(id));
        blob.PutBlock(blockId, new MemoryStream(finalbuffer, true), null);
        blocklist.Add(blockId);

        // commit all uploaded blocks in order
        blob.PutBlockList(blocklist);
    }
    else
    {
        // small file: a single PutBlob call is enough
        blob.UploadFromStream(fu.FileContent);
    }
}

Explanation of the code snippet

Since the idea is to split the big file into chunks, we need to define the size of each chunk, in this case 250 KB. Dividing the file's actual size by the chunk size tells us how many chunks we need.

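As a small sketch of that calculation (re-using the byteslength and numBytesPerChunk variables from the snippet above):

// number of 250 KB blocks needed for a file of 'byteslength' bytes,
// rounding up so that the final partial block is counted too
int numberOfBlocks = (byteslength + numBytesPerChunk - 1) / numBytesPerChunk;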

We also need a list of strings (here, the blocklist variable) to keep track of which blocks belong together. We then loop through each chunk, upload it by calling blob.PutBlock(), and add its block ID (in the form of a Base64 string) to the blocklist. Note that all block IDs within a blob must be Base64-encoded strings of the same length, which is why the snippet encodes a fixed-size integer.

Note that there is a left-over block that is not uploaded inside the loop; we need to upload it separately afterwards. When all blocks have been uploaded successfully, we finally call blob.PutBlockList(), which commits all the blocks that we've uploaded previously.
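
Under the hood, PutBlockList() issues the Put Block List REST operation, whose request body is an XML list of the block IDs to commit. With the integer-based IDs generated above, it would look roughly like this (an illustrative sketch, not captured from a real request):

<?xml version="1.0" encoding="utf-8"?>
<BlockList>
  <Latest>AAAAAA==</Latest>
  <Latest>AQAAAA==</Latest>
  <Latest>AgAAAA==</Latest>
</BlockList>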

Pros and Cons

The benefits (pros) of the technique

There are a few benefits of using this technique:

  • In the event that uploading one of the blocks fails, due to a condition such as a connection time-out or a lost connection, we only need to re-upload that particular block, not the entire big file / blob.
  • It's also possible to upload the blocks in parallel, which may result in a shorter upload time (see the sketch after this list).
  • The first technique only allows you to upload a block blob of at most 64 MB. With this technique, you can go much further, since a block blob can consist of up to 50,000 blocks.
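
Below is a minimal sketch of the parallel variant with a simple per-block retry. It is not from the original sample; it assumes .NET 4's Parallel.For, the same data, blob and numBytesPerChunk variables as the snippet above, and that PutBlock calls can safely be issued concurrently:

int blockCount = (data.Length + numBytesPerChunk - 1) / numBytesPerChunk;
string[] blockIds = new string[blockCount];

Parallel.For(0, blockCount, i =>
{
    int offset = i * numBytesPerChunk;
    int size = Math.Min(numBytesPerChunk, data.Length - offset);
    byte[] buffer = new byte[size];
    Array.Copy(data, offset, buffer, 0, size);

    string blockId = Convert.ToBase64String(BitConverter.GetBytes(i));

    // on failure, only this 250 KB block is retried, not the whole file
    for (int attempt = 0; ; attempt++)
    {
        try
        {
            blob.PutBlock(blockId, new MemoryStream(buffer), null);
            break;
        }
        catch (StorageClientException)
        {
            if (attempt == 2) throw; // give up after three tries
        }
    }

    blockIds[i] = blockId; // the array index keeps the commit order stable
});

blob.PutBlockList(blockIds); // commit once every block has landed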

The drawbacks (cons) of the technique

Despite the benefits, there are also a few drawbacks:

  • You have more code to write. As you can see from the samples, the first technique is a single blob.UploadFromStream() call, while the second technique needs 20+ lines of code.
  • It incurs more storage transactions, which may lead to higher costs in some cases. As a post by the Azure Storage team explains, the more chunks you have, the more storage transactions are incurred:

Large blob upload that results in 100 requests via PutBlock, and then 1 PutBlockList for commit = 101 transactions
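
To put that in perspective: a 100 MB file split into 250 KB blocks, as in the snippet above, needs ceil(102,400 / 250) = 410 PutBlock calls plus 1 PutBlockList, i.e. 411 transactions, whereas a single PutBlob would be just one.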

Summary

I've shown you how to upload a file with the simple technique at the beginning. Although it's easy to use, it has a few limitations. The second technique (using PutBlockList) is more powerful, as it can do more than the first one. However, it certainly has its own pros and cons as well.

I hope you'll be able to use whichever one of them is appropriate for your scenario. Hope this helps!


Responses to Uploading Big Files in Windows Azure Blob Storage using PutBlockList

  1. dzone says:

    Awesome post. We're looking for some developer-oriented Cloud content at DZone, and you look like you're knee-deep in Azure. Let me know if you'd like to have your content reposted including a link back to your blog.

    Best,

    Eric Genesky
    Community Curator
    DZone, Inc.

  2. Ryan says:

    Hello Eric,

    Creating a byte array within your loop:
    byte[] buffer = new byte[numBytesPerChunk];

    causes a lot of pressure on the .NET managed heap. This would create a 250 KB byte array on each iteration, which would be allocated on the large object heap, potentially causing heap fragmentation and running the risk of an out-of-memory exception. You should create a single array outside the loop and re-use it. Also, you are now able to upload more than 64 MB to a block blob regardless of the transfer method used; and, if the stream is > 32 MB (or whatever threshold SingleBlobUploadThresholdInBytes is set to), the client will automatically split the stream into 4 MB chunks and transfer them on separate IO threads in parallel, based on the thread-count property (ParallelOperationThreadCount). Great post; I would just argue there are some built-in means of doing this for block blobs.
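
    A minimal sketch of the single-buffer variant Ryan describes (hypothetical, re-using the variable names from the post's snippet):

    // allocate once outside the loop and re-use, so only one 250 KB
    // array ever lands on the large object heap
    byte[] buffer = new byte[numBytesPerChunk];
    int id = 0;
    List<string> blocklist = new List<string>();

    for (int offset = 0; offset < data.Length; offset += numBytesPerChunk)
    {
        int size = Math.Min(numBytesPerChunk, data.Length - offset);
        Array.Copy(data, offset, buffer, 0, size);

        string blockId = Convert.ToBase64String(BitConverter.GetBytes(id++));
        // wrap only the filled portion of the shared buffer; re-use is
        // safe here because PutBlock is synchronous
        blob.PutBlock(blockId, new MemoryStream(buffer, 0, size), null);
        blocklist.Add(blockId);
    }
    blob.PutBlockList(blocklist);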

  3. You can also use the Blob Transfer Utility to download and upload all your blob files.

    It's a tool to handle thousands of (small/large) blob transfers in an effective way.

    Binaries and source code, here: http://bit.ly/blobtransfer

  4. erwin says:

    Thanks for your post.
    But I have also been looking at using a MultipartFileStreamProvider.
    That way, I found it possible to upload big files (more than 64 MB), and at a high speed. See, for example, this post by Yao: http://blogs.msdn.com/b/yaohuang1/archive/2012/07/02/asp-net-web-api-and-azure-blob-storage.aspx

  5. Mario says:

    Hello,

    We recently started using Azure storage, and we would like to find some answers regarding uploading and downloading huge files with Azure storage block blobs. Our questions are the following:

    – our current implementation for upload and download uses UploadFromStream and DownloadToStream (Storage Client Library 2.0) – is this OK for huge files? On download, we see a delay of up to 1 minute before the save-file prompt is displayed to the user, even for a 100 MB file – can we reduce this? Based on our checks, it is the time needed to create the MemoryStream.

    – another option we found on the net is to split huge files into chunks and upload/download them in parallel. Our web application is not on Azure; we would like to use just Azure storage for our clients' files. Our goal is to avoid using our own bandwidth for these client file uploads/downloads. Does this second option, splitting into chunks, help? Do we use our bandwidth for the upload/download, or is it done directly between the Azure server and the user's browser?

    Thx in advance,

    Mario

    • admin says:

      Hi,
      – you can seriously consider the 2nd technique using PutBlockList when dealing with huge files
      – splitting into chunks helps in terms of upload speed (given that you have enough bandwidth) and resume capability (when one of the blocks fails). Note that splitting the file into chunks requires local access to it, meaning that you might only be able to do that in a smart client app (WinForms / Windows service / console app / etc.)…
      on a web app, you can consider using Silverlight like what Steve did here http://blog.smarx.com/posts/uploading-windows-azure-blobs-from-silverlight-part-3-handling-big-files-by-using-the-block-apis

      • Mario says:

        Hello,

        I need some help regarding uploading huge files (e.g. 300-500 MB) to Azure storage. What I have understood so far is that the best method is to chunk these huge files. My client wants to understand how this splitting into chunks is done. His concern is about his web server resources (e.g. memory, bandwidth) and whether they are used.

        Here is the scenario: my client has a web application hosted on his own web server. His customers need to connect to the web app and upload these huge files. What he wants to achieve with the Azure storage implementation is to have these customers upload their files directly to Azure, without using his bandwidth or his web server's memory.

        Can someone explain how this splitting into chunks works, keeping in mind that my client is not a programmer; he is in the IT field but not a developer. What we have is the following: client web server hosting the web app + Azure storage + customer browser. How does the splitting into chunks work? As far as I understood, the huge file is split into smaller parts and a memory stream is built (BTW, which memory is used here?).

        Thx in advance for your help.

        BR,
        Mario

  6. Mudassir says:

    What is fu in the code? I am a little bit confused about it. Could you please give a sample project for it?

    • admin says:

      It's the FileUpload control in ASP.NET.
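
      For example, the markup could look something like this (a hypothetical snippet matching the "fu" ID and the btnUpload_Click handler used in the post):

      <asp:FileUpload ID="fu" runat="server" />
      <asp:Button ID="btnUpload" runat="server" Text="Upload" OnClick="btnUpload_Click" />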

      • Martin says:

        I am trying to use this example in an MVC project. How exactly am I supposed to get the code to understand the "fu"? I mean, the FileUpload is placed in my view and your code above is in the controller, so how can I reference the FileUpload?

  7. Ben says:

    Nice post! Anyone with insights on how this can be done from Android?
