Wednesday, 19 July 2017

Azure Blob Storage: How can I quickly read a text file?

My current project needs to read a large text file from blob storage and then process its contents. Getting the file is easy:

var storageAccount = CloudStorageAccount.Parse(CONNECTIONSTRING);
var container = storageAccount.CreateCloudBlobClient().GetContainerReference(containerName);
var blob = container.GetBlobReference(fileName);

And the data can be easily read:

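// Download the blob into memory and rewind the stream before reading.
Stream stream = new MemoryStream();
blob.DownloadToStream(stream);
stream.Position = 0;

string text;
using (StreamReader reader = new StreamReader(stream))
{
    text = reader.ReadToEnd();
}
// Split on both Windows (\r\n) and Unix (\n) line endings; the file may not use Environment.NewLine.
var lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);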
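
As a minimal sketch of an alternative to splitting the whole string, the stream can be read one line at a time. StreamReader.ReadLine treats \n, \r and \r\n as line endings, so the result does not depend on the platform that wrote the file. LineReader and ReadLines are hypothetical names used only for illustration, and the stream is assumed to be seekable (as a MemoryStream is).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineReader
{
    // Hypothetical helper: reads every line from an already-filled, seekable stream.
    public static List<string> ReadLines(Stream stream)
    {
        var lines = new List<string>();
        stream.Position = 0; // rewind, as in the download snippet above
        using (var reader = new StreamReader(stream))
        {
            string line;
            // ReadLine returns null once the end of the stream is reached.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

In the post's code this would be called as LineReader.ReadLines(stream) after blob.DownloadToStream(stream). Unlike Split, it does not produce a trailing empty entry when the file ends with a newline.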

The real time saver came when I had to process the contents. A traditional sequential foreach over the lines was too slow, but luckily Parallel.For saved the day:

// The upper bound of Parallel.For is exclusive, so pass lines.Length to include the last line.
Parallel.For(0, lines.Length, i =>
{
    DoSomethingWithTheRow(lines[i]);
});

It's lightning fast.

A word of warning: I passed a generic collection (List<MyCustomType>) into the function and found out very quickly that List<T> is not thread-safe when several threads add to it at once. I changed to ConcurrentQueue<MyCustomType> and the issue was resolved.
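
The thread-safe version can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the original project code: MyCustomType is reduced to a single hypothetical Text property, RowProcessor and ProcessAll are made-up names, and DoSomethingWithTheRow is assumed to simply build a result from the row and store it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the post's MyCustomType.
public class MyCustomType
{
    public string Text { get; set; }
}

public static class RowProcessor
{
    // Assumed shape of the per-row work: build a result and store it.
    // ConcurrentQueue<T>.Enqueue is safe to call from many threads at once,
    // whereas List<T>.Add is not.
    static void DoSomethingWithTheRow(string row, ConcurrentQueue<MyCustomType> results)
    {
        results.Enqueue(new MyCustomType { Text = row.Trim() });
    }

    public static ConcurrentQueue<MyCustomType> ProcessAll(string[] lines)
    {
        var results = new ConcurrentQueue<MyCustomType>();
        // The second argument is exclusive, so lines.Length covers every row.
        Parallel.For(0, lines.Length, i =>
        {
            DoSomethingWithTheRow(lines[i], results);
        });
        return results;
    }
}
```

Note that items land in the queue in the order the threads finish, not in file order. If the original order matters, one option is to store the line index alongside each result.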
