Azure Blob Storage: How can I quickly read a text file?
My current project has the requirement to read a large text file from blob storage and then process the contents. Getting the file is easy:
var storageAccount = CloudStorageAccount.Parse(CONNECTIONSTRING);
var container = storageAccount.CreateCloudBlobClient().GetContainerReference(containerName);
var blob = container.GetBlobReference(fileName);
And the data can be easily read:
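// Download the blob into memory, then rewind so it can be read from the start.
Stream stream = new MemoryStream();
blob.DownloadToStream(stream);
stream.Position = 0;
string text = "";
using (StreamReader reader = new StreamReader(stream))
{
    text = reader.ReadToEnd();
}
// Note: this assumes the file uses the same line endings as the machine running the code.
var lines = text.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);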
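Splitting on Environment.NewLine only works when the file's line endings match those of the machine running the code. If that is not guaranteed, reading line by line is one alternative. The following is a minimal sketch, assuming the blob has already been downloaded into a stream; LineReader and ReadAllLines are hypothetical names used for illustration and are not part of the storage SDK.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineReader
{
    // Hypothetical helper: reads every line from an already-downloaded stream.
    // StreamReader.ReadLine treats "\n", "\r" and "\r\n" as line terminators,
    // so the result does not depend on Environment.NewLine.
    public static List<string> ReadAllLines(Stream stream)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Passing the MemoryStream from the download step (after resetting Position to 0) would stand in for the ReadToEnd and Split calls. A trailing newline at the end of the file does not produce an extra empty entry with this approach.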
The real time saver came when I had to process the contents. Iterating over the lines one at a time with a traditional foreach loop was too slow, but luckily Parallel.For saved the day:
// The upper bound of Parallel.For is exclusive, so pass lines.Length to include the last line.
Parallel.For(0, lines.Length, i =>
{
    DoSomethingWithTheRow(lines[i]);
});
It's lightning fast.
A word of warning: I passed a generic collection (List<MyCustomType>) into the function and found out very quickly that it is not thread safe. I changed it to ConcurrentQueue<MyCustomType> and the issue was resolved.
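To illustrate the thread-safety point, here is a minimal sketch of collecting results from a parallel loop. RowProcessor and ProcessAll are hypothetical names, and recording each line's length is only a stand-in for whatever DoSomethingWithTheRow really does.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class RowProcessor
{
    // Hypothetical stand-in for the row processing in the post:
    // each iteration adds one result to a shared collection.
    public static ConcurrentQueue<int> ProcessAll(string[] lines)
    {
        var results = new ConcurrentQueue<int>();

        // The upper bound of Parallel.For is exclusive.
        Parallel.For(0, lines.Length, i =>
        {
            // Enqueue is safe to call from many threads at once;
            // List<T>.Add is not, and concurrent calls can lose items or throw.
            results.Enqueue(lines[i].Length);
        });

        return results;
    }
}
```

Note that the results arrive in whatever order the iterations finish, not in the order of the input lines. If order matters, writing into a pre-sized array at index i is one alternative.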