AzCopy for Linux and Mac

AzCopy is a command-line tool for Windows that copies files between local file systems and Azure Storage, and also between two Azure Storage accounts. It is a performant little EXE that you can download from azure.com. But with the growth of Linux and open source on Azure, tools and utilities like AzCopy must be cross-platform, and until recently AzCopy was exclusive to the Windows platform.

Say Hello to AzCopy for Linux/Mac

In April, Alan Stephenson in the Azure product group for Azure Batch wrote a Python script, blobxfer, that basically implements the same functionality as AzCopy. Since it is a Python script, you can run it on Linux or Mac and copy files to and from Azure.

The secret sauce is that the low-level copying is done with the Storage REST APIs PutBlock and PutBlockList. This technique has been available since Storage became generally available in 2009, i.e. it is long-proven technology. Most, if not all, tools that upload files to Azure Storage probably use this API. A large file is uploaded in chunks, with one PutBlock call per chunk, and at the end PutBlockList is called once with a list of all blocks in the order they should appear. When Azure Storage receives this call, it commits the blocks into a single blob. The uploading app can be multithreaded to parallelize the PutBlock calls, making sure the NIC is utilized to the max for a fast upload. The Python script is basically an implementation of these APIs.
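The chunk-then-commit flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the blobxfer script: the helper names (split_into_blocks, block_id, block_list_xml, upload) and the send callback are hypothetical, and the 4 MB default block size is an assumption. The send callback stands in for whatever HTTP client actually issues the PUT requests, so the sketch runs without a network connection.

```python
import base64
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote
from xml.sax.saxutils import escape


def split_into_blocks(data, block_size):
    """Cut the payload into consecutive chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_id(index):
    """Block IDs must be Base64 strings of equal length within one blob."""
    return base64.b64encode("{:08d}".format(index).encode("ascii")).decode("ascii")


def block_list_xml(ids):
    """Body of the final PutBlockList call; the order here is the blob's order."""
    entries = "".join("<Latest>{}</Latest>".format(escape(i)) for i in ids)
    return '<?xml version="1.0" encoding="utf-8"?><BlockList>{}</BlockList>'.format(entries)


def upload(data, blob_url, sas, send, block_size=4 * 1024 * 1024, workers=8):
    """Upload data as blocks in parallel, then commit them with one block list.

    send(method, url, body) is a hypothetical callback that performs the HTTP
    request; sas is the SAS token without a leading question mark.
    """
    blocks = split_into_blocks(data, block_size)
    ids = [block_id(n) for n in range(len(blocks))]

    def put_block(pair):
        bid, chunk = pair
        # PutBlock: one PUT per chunk, identified by its Base64 block ID.
        url = "{}?comp=block&blockid={}&{}".format(blob_url, quote(bid, safe=""), sas)
        send("PUT", url, chunk)

    # Blocks may arrive in any order; only the final list fixes the order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(put_block, zip(ids, blocks)))  # list() re-raises worker errors

    # PutBlockList: a single PUT that commits all blocks into the blob.
    commit_url = "{}?comp=blocklist&{}".format(blob_url, sas)
    send("PUT", commit_url, block_list_xml(ids).encode("utf-8"))
    return ids
```

Passing a callback that only records its arguments is an easy way to see the sequence of requests: a 10-byte payload with block_size=4 produces three PutBlock calls followed by one PutBlockList call.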

Getting the script to work

The installation instructions on GitHub start with the words “if you encounter difficulties installing…”, which is a nice way of saying that getting this Python script to run in your environment can be either a no-brainer or filled with problems. My experience ranges from worked-directly to needed-to-upgrade-pip to needed-to-upgrade-a-lot-of-things. If you would like to try the script out, this is how I got it to work on an Ubuntu 14.04 LTS VM on Azure:

[Screenshot: blobxfer-01]

libffi-dev, libssl-dev, python-dev and python-pip are all packages you need in order to run the pip install blobxfer command successfully. Upgrading ndg-httpsclient is needed to avoid a lot of SSL warnings being spewed to stdout when you run blobxfer.
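For readers who want the steps in copy-and-paste form, here is a small Python sketch that lists the commands implied by the paragraph above and, optionally, runs them. The package names come from the text; the use of sudo, the initial apt-get update, and the exact ordering are my assumptions, and the helper names (install_commands, run_all) are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Packages named in the text as prerequisites for "pip install blobxfer".
APT_PACKAGES = ["libffi-dev", "libssl-dev", "python-dev", "python-pip"]


def install_commands():
    """Return the install steps as argument lists (sudo and ordering assumed)."""
    return [
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "install", "-y"] + APT_PACKAGES,
        ["sudo", "pip", "install", "blobxfer"],
        # Upgrading ndg-httpsclient silences the SSL warnings at runtime.
        ["sudo", "pip", "install", "--upgrade", "ndg-httpsclient"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    run_all(install_commands())  # dry run: prints the commands only
```

Call run_all(install_commands(), dry_run=False) to actually execute the steps. If pip itself turns out to be too old, upgrading it first may also be necessary, as noted earlier.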

Uploading files from Linux to Azure Storage using SAS-Keys

It is easy to start out using the Storage Account Key when transferring files, but as a best practice you should avoid it, since whoever has the key can do whatever they want with the Storage Account (like deleting everything in it). A better approach is to generate a SAS-Key (Shared Access Signature).

A SAS-Key is something you can generate on a container or blob in a Storage Account. You can grant rights such as Read, Write, Delete and List for a specific period of time, and you can even restrict it to IP address ranges (see the link in the reference section for details). The caller passes the SAS-Key as query string parameters in the URL of the REST API call to PutBlock/PutBlockList. You can easily generate a SAS-Key with the Azure Storage Explorer, which is available for public download (see refs).
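Because a SAS-Key is just a set of query string parameters, it is easy to inspect one before you use it. The sketch below is an illustration only: the function names (describe_sas, with_sas) are hypothetical. It relies on the standard SAS parameter names sp (permissions), st (start time), se (expiry time), sip (IP range), sr (resource type) and sig (signature).

```python
from urllib.parse import parse_qs

# Permission letters used in the "sp" field of a SAS token.
PERMISSIONS = {"r": "Read", "w": "Write", "d": "Delete", "l": "List"}


def describe_sas(token):
    """Summarize what a SAS token grants, without validating the signature."""
    fields = parse_qs(token.lstrip("?"))  # tolerate a leading question mark

    def first(key):
        return fields.get(key, [None])[0]

    return {
        "permissions": [PERMISSIONS.get(ch, ch) for ch in first("sp") or ""],
        "start": first("st"),
        "expiry": first("se"),
        "ip_range": first("sip"),
        "resource": first("sr"),  # "c" for container, "b" for blob
        "has_signature": first("sig") is not None,
    }


def with_sas(url, token):
    """Append the SAS token to a request URL as query string parameters."""
    separator = "&" if "?" in url else "?"
    return url + separator + token.lstrip("?")
```

For example, describe_sas on a token containing sp=rwl reports Read, Write and List, and with_sas adds the token to a PutBlock URL that already carries comp=block using an ampersand rather than a second question mark.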

[Screenshot: blobxfer-02]

To test-run the Python script, I set the SAS-Key as a variable (without the leading question mark) and pass it on the command line as --saskey. Uploading a 1GB file from an Ubuntu VM in the same Azure datacenter takes 44 seconds, with a throughput of ~209 MBit/second or ~26MB/second. The reason it doesn’t go any faster is the NIC on the VM, which we basically exhaust during the upload.

[Screenshot: blobxfer-03]

Summary

On a weekly basis, I meet customers who haven’t really understood the potential of the HTTP endpoint that Azure Storage exposes and who think that you must have some expensive appliance in front of it. An appliance may have its benefits and reasons, but if all you want is a fast way of uploading or downloading files to and from Azure Storage, then AzCopy is probably what you are looking for. With this Python script you now have the same functionality cross-platform, in the form of an AzCopy-like clone.

References

Alan Stephenson's blog post
https://blogs.technet.microsoft.com/windowshpc/2015/04/15/linux-blob-file-transfer-python-code-sample/

GitHub repo with Python code
https://github.com/Azure/azure-batch-samples/tree/master/Python/Storage

PutBlock and PutBlockList API MSDN documentation
http://msdn.microsoft.com/en-us/library/azure/dd135726.aspx

SAS-Keys explained
https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-shared-access-signature-part-1/

Azure Storage Explorer
https://azure.microsoft.com/en-us/downloads/