I'm dealing with rather large data sets these days. I'm talking on the order of a few million records easy. One of the tasks I've been given is to take these records and generate a flat file of them to transfer to another process for external processing.
Not a problem, right? In fact it could be done just from the Query Analyzer with its ability to save results to a file.
Well, sometimes that works all right, but this is a process that needs to be run routinely, plus at random other times of the year. So I shoved it into a simple console application where I have a configuration file, and I can override those configuration options via command line arguments.
It worked beautifully for my first few test sets of data I ran through it, the largest being about 5,000 records. Then I pulled the full test data set, which has about a million records. After staring at that black screen for what felt like forever, I went about hunting a way to speed things up.
My studies uncovered the fact that IO.Stream classes were faster than Microsoft.VisualBasic.FilePut classes. The problem there was that I was already using the IO.Stream classes.
So, I stumbled through things, hunting for ways to improve my efficiency. I tried IO.File.CreateText as a way to return a StreamWriter, but that had no discernable efficiency bonuses (added an extra 300-400 milliseconds at 1,000 records).
I tried using a StringBuilder instance to build the file and then shoved it into the StreamWriter all at one time (as opposed to writing directly to the file every line the way I had been doing). Oddly, that still had no discernable efficiency bonuses (added an extra 100-200 milliseconds at 1,000 records).
Almost ready to give up, I went looking through the IO.File class to see what functions I could use. Which is when I stumbled upon IO.File.AppendText. Giving an internal shrug, I set about reworking my function to use a StringBuilder class and the AppendText file.
As an aside, note that if the file exists, I always delete it so that I'm creating a new file every time.
Imagine my surprise when I saw a rather marked difference in efficiency, as evidenced by a two-second drop in speed. Not quite sure what to make of it, I ran the tests again, and then increased my records to 5,000 and then to 100,000.
I was flabbergasted at these results:
Method 1:5,000 Records
Method 1:100,000 Records
Method 2:I couldn't believe it. The AppendText function working with a StringBuilder class was nearly 4 minutes faster than writing the file directly. Sure, it means I have to touch the data an extra time, but with that as a speed increase, that is something I'm quite willing to live with.