Wednesday, 1 October 2014

C#: Compare two text files (quickly)

I am busy writing a Windows service that sends emails. However, I need to track the emails that have already been sent to each recipient. The current solution writes the audit to a text file (SQL Server is not an option).

I hit a major performance problem when processing a 30,000-row file. I had already processed 5,000 records, and the iterative (line-by-line) approach was just too slow. I put my SQL hat on and looked for a C# equivalent of the EXCEPT command.

Enter Google's Diff Match Patch.

It has a simple C# class that performs incredibly well. I now have a very simple and fast solution.

Here is the code:

using System.IO;
using DiffMatchPatch;

string source;
string target;

// Load the source file into a string
using (StreamReader reader = new StreamReader(MYSOURCEFILE))
  source = reader.ReadToEnd();

// Load the comparison file into a string
using (StreamReader reader = new StreamReader(MYTARGETFILE))
  target = reader.ReadToEnd();

// Instantiate the class
var comparisonClass = new diff_match_patch();

// Perform the comparison (true = run a line-level pass first for speed)
var diffs = comparisonClass.diff_main(source, target, true);

// Find all new items: text present in the target but not in the source.
// Compare against the Operation enum directly rather than its string name.
var newRecords = diffs.FindAll(x => x.operation == Operation.INSERT);

I can now iterate through the new records and process them as required.

foreach (var newRecord in newRecords)
{
  // newRecord.text holds the inserted text
  // Insert magic here...
}
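
As an aside, the EXCEPT-style set difference mentioned earlier can also be sketched with LINQ's Enumerable.Except. This is not the approach used above. It is only a minimal sketch, assuming each record sits on its own line and that exact whole-line matches are good enough. The class name, method name and file paths are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Diff Match Patch.
static class AuditComparer
{
    // Returns the lines of targetPath that do not appear anywhere in sourcePath.
    // Like SQL EXCEPT, the result is de-duplicated and ignores line position.
    public static List<string> FindNewLines(string sourcePath, string targetPath)
    {
        // File.ReadLines streams each file lazily; Except builds a hash set
        // from the second sequence, so each lookup is O(1) on average.
        return File.ReadLines(targetPath)
                   .Except(File.ReadLines(sourcePath), StringComparer.Ordinal)
                   .ToList();
    }
}
```

Calling AuditComparer.FindNewLines("sent.txt", "recipients.txt") (both file names are placeholders) would return the recipient lines that are not yet in the audit file. Unlike a character-level diff, this always yields whole lines, at the cost of treating the files as unordered sets.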
