I hit a major performance problem when processing a 30,000-row file. I had already processed 5,000 records, and the iterative (line-by-line) approach was just not working. I put my SQL hat on and looked for a C# equivalent of the EXCEPT operator.
Enter Google's Diff Match Patch.
It has a simple C# class that performs incredibly well. I now have a very simple and fast solution.
Here is the code:
// Load the source file into a string (StreamReader needs "using System.IO;")
string source;
using (StreamReader reader = new StreamReader(MYSOURCEFILE))
    source = reader.ReadToEnd();
// Load the comparison file into a string
string target;
using (StreamReader reader = new StreamReader(MYTARGETFILE))
    target = reader.ReadToEnd();
// Instantiate the class
var comparisonClass = new diff_match_patch();
// Perform the comparison
var diffs = comparisonClass.diff_main(source, target, true);
// Find all new items
var newRecords = diffs.FindAll(x => x.operation == Operation.INSERT);
I can now iterate through the new records and process them as required.
foreach (var newRecord in newRecords)
{
    // newRecord.text holds the inserted text
    // Insert magic here...
}
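For comparison, here is a minimal sketch of the EXCEPT-style set difference itself, using LINQ's Except, which is the closest built-in analogue to SQL's EXCEPT. This is not the Diff Match Patch approach above. The RecordDiff class, the NewLines helper and the sample records are hypothetical, and the sketch assumes each record sits on its own line and that order and duplicates don't matter. It may be worth keeping in mind because diff_main produces character-level diffs, so an inserted chunk is not guaranteed to line up with exactly one whole record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordDiff
{
    // Hypothetical helper: returns the lines that appear in target but not in source.
    // Like SQL EXCEPT, the result contains each distinct line at most once.
    public static List<string> NewLines(IEnumerable<string> source, IEnumerable<string> target)
    {
        return target.Except(source, StringComparer.Ordinal).ToList();
    }

    public static void Main()
    {
        // In-memory sample data (assumed); for real files,
        // System.IO.File.ReadLines(path) yields one string per line.
        var source = new[] { "1,alice", "2,bob" };
        var target = new[] { "1,alice", "2,bob", "3,carol" };

        foreach (var line in NewLines(source, target))
        {
            Console.WriteLine(line); // prints "3,carol"
        }
    }
}
```

Except builds a hash set from the second sequence, so the whole comparison is roughly linear in the number of lines rather than comparing every line against every other line.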