26 Feb 2012 @ 1:40 PM 

Windows Azure Blob Storage can be thought of as a file system in the cloud. It lets us store any unstructured data, such as text, images, and video. In this post, I will show how to upload a big file into Windows Azure Storage. Please note that we will be using Block Blobs in this case. For more information about Block Blobs and Page Blobs, please visit here.

I assume that you already know how to upload a file to Windows Azure Storage. If you don’t, I recommend checking out this lab from the Windows Azure Training Kit.

Uploading a blob (commonly-used technique)

The following snippet shows how to upload a blob using a commonly used technique, blob.UploadFromStream(), which eventually invokes the Put Blob REST API.

protected void btnUpload_Click(object sender, EventArgs e)
{
    var storageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("image2");
    container.CreateIfNotExist();

    var permission = container.GetPermissions();
    permission.PublicAccess = BlobContainerPublicAccessType.Container;
    container.SetPermissions(permission);

    string name = fu.FileName;
    CloudBlob blob = container.GetBlobReference(name);
    blob.UploadFromStream(fu.FileContent);
}

The above code snippet works well in most cases. Although you can upload up to 64 MB per file this way (for a block blob), I recommend another technique for large files, which I describe in more detail below.

Uploading a blob by splitting it into chunks and calling PutBlockList

The idea of this technique is to split a block blob into smaller chunks (blocks), upload them one by one or in parallel, and finally join them all by calling PutBlockList().

protected void btnUpload_Click(object sender, EventArgs e)
{
    CloudBlobClient blobClient;
    var storageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
    container.CreateIfNotExist();

    var permission = container.GetPermissions();
    permission.PublicAccess = BlobContainerPublicAccessType.Container;
    container.SetPermissions(permission);

    string name = fu.FileName;
    CloudBlockBlob blob = container.GetBlockBlobReference(name);

    // Files larger than this threshold are uploaded block by block.
    int maxSize = 1 * 1024 * 1024; // 1 MB

    if (fu.PostedFile.ContentLength > maxSize)
    {
        byte[] data = fu.FileBytes; 
        int id = 0;
        int byteslength = data.Length;
        int bytesread = 0;
        int index = 0;
        List<string> blocklist = new List<string>();
        int numBytesPerChunk = 250 * 1024; //250KB per block
    
        do
        {
            byte[] buffer = new byte[numBytesPerChunk];
            int limit = index + numBytesPerChunk;
            for (int loops = 0; index < limit; index++)
            {
                buffer[loops] = data[index];
                loops++;
            }
            bytesread = index;
            string blockIdBase64 = Convert.ToBase64String(System.BitConverter.GetBytes(id));

            blob.PutBlock(blockIdBase64, new MemoryStream(buffer, true), null); 
            blocklist.Add(blockIdBase64);
            id++;
        } while (byteslength - bytesread > numBytesPerChunk);

        int final = byteslength - bytesread;
        byte[] finalbuffer = new byte[final];
        for (int loops = 0; index < byteslength; index++)
        {
            finalbuffer[loops] = data[index];
            loops++;
        }
        string blockId = Convert.ToBase64String(System.BitConverter.GetBytes(id));
        blob.PutBlock(blockId, new MemoryStream(finalbuffer, true), null);
        blocklist.Add(blockId);

        blob.PutBlockList(blocklist); 
    }
    else
        blob.UploadFromStream(fu.FileContent);            
}

Explanation about the code snippet

Since the idea is to split the big file into chunks, we need to define the size of each chunk, in this case 250 KB. Dividing the actual file size by the chunk size (and rounding up) tells us how many chunks the file will be split into.
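
As a small illustration of that division, here is a minimal sketch. The class and method names (ChunkMath, CountChunks) are hypothetical and not part of the storage SDK; the only point is that the division has to round up so the partial last chunk is counted.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class ChunkMath
{
    // Number of blocks needed to cover totalBytes when each block holds
    // at most bytesPerChunk bytes (the last block may be smaller).
    public static int CountChunks(long totalBytes, int bytesPerChunk)
    {
        if (totalBytes < 0) throw new ArgumentOutOfRangeException("totalBytes");
        if (bytesPerChunk <= 0) throw new ArgumentOutOfRangeException("bytesPerChunk");

        // Ceiling division using integer arithmetic only.
        return (int)((totalBytes + bytesPerChunk - 1) / bytesPerChunk);
    }
}
```

For example, a 1,000,000-byte file with 250 * 1024 = 256,000-byte chunks needs 4 chunks: three full ones and a final chunk of 232,000 bytes.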

We also need a list of strings (in this case, the blocklist variable) that records the IDs of the blocks belonging to this blob. We then loop through each chunk, upload it by calling blob.PutBlock(), and add its block ID (a Base64 string) to blocklist.
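
The block ID scheme used in the snippet can be isolated as below. This is only a sketch; the names BlockIds and FromIndex are hypothetical. It relies on the same calls as the sample (BitConverter.GetBytes and Convert.ToBase64String). An int always serializes to 4 bytes, so every ID comes out as 8 Base64 characters, which matters because the service expects all block IDs within one blob to have the same length.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class BlockIds
{
    // Encodes a zero-based block index as a Base64 string.
    // A 4-byte int always encodes to 8 Base64 characters, so all IDs
    // produced here have the same length.
    public static string FromIndex(int index)
    {
        return Convert.ToBase64String(BitConverter.GetBytes(index));
    }
}
```

Each ID would be passed to blob.PutBlock() and added to blocklist in upload order, as in the sample above.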

Note that there is a leftover block that is not uploaded inside the loop, so we upload it separately after the loop. When all blocks have been uploaded successfully, we call blob.PutBlockList(), which commits all the blocks that we uploaded previously.
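
One way to avoid handling the leftover block separately is to let a single loop size the last chunk with Math.Min. The following is a sketch of that alternative, not the code used above; BlobChunker and Split are hypothetical names, and like the sample it assumes the whole file is already in memory as a byte array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class BlobChunker
{
    // Splits data into consecutive chunks of at most bytesPerChunk bytes.
    // The final chunk holds whatever is left over and may be smaller.
    public static List<byte[]> Split(byte[] data, int bytesPerChunk)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (bytesPerChunk <= 0) throw new ArgumentOutOfRangeException("bytesPerChunk");

        var chunks = new List<byte[]>();
        int offset = 0;
        while (offset < data.Length)
        {
            // Full chunk size, or the remaining bytes for the last chunk.
            int size = Math.Min(bytesPerChunk, data.Length - offset);
            var chunk = new byte[size];
            Buffer.BlockCopy(data, offset, chunk, 0, size);
            chunks.Add(chunk);
            offset += size;
        }
        return chunks;
    }
}
```

Each returned chunk would then be wrapped in a MemoryStream and sent with blob.PutBlock(), followed by one blob.PutBlockList() call once every chunk has been uploaded.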

Pros and Cons

The benefits (pros) of the technique

There are a few benefits of using this technique:

  • If uploading one of the blocks fails for whatever reason (connection time-out, lost connection, etc.), we only need to upload that particular block again, not the entire file / blob.
  • It’s also possible to upload the blocks in parallel, which might result in a shorter upload time.
  • The first technique only lets you upload a block blob of at most 64 MB. With this technique, you can upload much larger blobs, up to the service’s block blob size limit.
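
To illustrate the first benefit, retrying a single block can be sketched as a small wrapper around the upload call. BlockRetry and Run are hypothetical names, and the fixed attempt count with no delay between attempts is an assumption made to keep the sketch short; a real application would likely want to wait between attempts and catch only transient errors.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class BlockRetry
{
    // Runs uploadBlock until it succeeds or maxAttempts is reached.
    // Returns the attempt number that succeeded; rethrows the last
    // exception if every attempt fails.
    public static int Run(Action uploadBlock, int maxAttempts)
    {
        if (uploadBlock == null) throw new ArgumentNullException("uploadBlock");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                uploadBlock();
                return attempt;
            }
            catch (Exception)
            {
                // Give up only after the final attempt has failed.
                if (attempt >= maxAttempts) throw;
            }
        }
    }
}
```

In the upload loop, the call would look something like BlockRetry.Run(() => blob.PutBlock(blockIdBase64, new MemoryStream(buffer, true), null), 3); so that only the failing block is sent again.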

The drawbacks (cons) of the technique

Despite the benefits, there are also a few drawbacks:

  • You have more code to write. As the samples show, the first technique is a single call to blob.UploadFromStream(), while the second technique needs 20+ lines of code.
  • It incurs more storage transactions, which may lead to higher cost in some cases. According to a post by the Azure Storage team, the more chunks you have, the more storage transactions are incurred:

Large blob upload that results in 100 requests via PutBlock, and then 1 PutBlockList for commit = 101 transactions
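
Following that counting rule (one transaction per PutBlock plus one for PutBlockList), the effect of the block size on the transaction count can be estimated with a sketch like the one below. TransactionEstimate and ForBlockUpload are hypothetical names, and the estimate ignores retries and any other requests the application makes.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class TransactionEstimate
{
    // One PutBlock per block, plus one PutBlockList to commit.
    public static long ForBlockUpload(long totalBytes, int bytesPerBlock)
    {
        if (totalBytes <= 0) throw new ArgumentOutOfRangeException("totalBytes");
        if (bytesPerBlock <= 0) throw new ArgumentOutOfRangeException("bytesPerBlock");

        long blocks = (totalBytes + bytesPerBlock - 1) / bytesPerBlock;
        return blocks + 1;
    }
}
```

A 25,600,000-byte file uploaded in 256,000-byte blocks gives 100 blocks and therefore 101 transactions, while the same file in 4 MB (4,194,304-byte) blocks gives 7 blocks and 8 transactions.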

Summary

I’ve shown you how to upload a file with a simple technique at the beginning. Although it’s easy to use, it has a few limitations. The second technique (using PutBlockList) is more powerful, as it can do more than the first one. However, it has its own pros and cons as well.

I hope you can use whichever one fits your scenario. Hope this helps!

Posted By: admin
Last Edit: 26 Feb 2012 @ 01:40 PM
