Thursday, November 8, 2007

Speeding up File Writing

I'm dealing with rather large data sets these days. I'm talking on the order of a few million records easy. One of the tasks I've been given is to take these records and generate a flat file of them to transfer to another process for external processing.

Not a problem, right? In fact it could be done just from the Query Analyzer with its ability to save results to a file.

Well, sometimes that works all right, but this is a process that needs to be run routinely, plus at random other times of the year. So I shoved it into a simple console application where I have a configuration file, and I can override those configuration options via command line arguments.

It worked beautifully for my first few test sets of data I ran through it, the largest being about 5,000 records. Then I pulled the full test data set, which has about a million records. After staring at that black screen for what felt like forever, I went about hunting a way to speed things up.

My studies uncovered the fact that IO.Stream classes were faster than Microsoft.VisualBasic.FilePut classes. The problem there was that I was already using the IO.Stream classes.

So, I stumbled through things, hunting for ways to improve my efficiency. I tried IO.File.CreateText as a way to return a StreamWriter, but that had no discernable efficiency bonuses (added an extra 300-400 milliseconds at 1,000 records).

I tried using a StringBuilder instance to build the file and then shoved it into the StreamWriter all at one time (as opposed to writing directly to the file every line the way I had been doing). Oddly, that still had no discernable efficiency bonuses (added an extra 100-200 milliseconds at 1,000 records).

Almost ready to give up, I went looking through the IO.File class to see what functions I could use. Which is when I stumbled upon IO.File.AppendText. Giving an internal shrug, I set about reworking my function to use a StringBuilder class and the AppendText file.

As an aside, note that if the file exists, I always delete it so that I'm creating a new file every time.

Imagine my surprise when I saw a rather marked difference in efficiency, as evidenced by a two-second drop in speed. Not quite sure what to make of it, I ran the tests again, and then increased my records to 5,000 and then to 100,000.

I was flabbergasted at these results:
1,000 Records

Method 1:
1. 0:0:16:47
2. 0:0:14:344
3. 0:0:14:406

Method 2:
1. 0:0:12:687
2. 0:0:12:78
3. 0:0:12:343
5,000 Records
Method 1:
1. 0:0:25:0
2. 0:0:24:686
3. 0:0:24:702

Method 2:
1. 0:0:12:280
2. 0:0:12:124
3. 0:0:12:265
100,000 Records
Method 1:
1. 0:4:23:249
2. 0:4:25:798
3. 0:4:25:675
Method 2:
1. 0:0:18:952
2. 0:0:21:77
3. 0:0:20:983
I couldn't believe it. The AppendText function working with a StringBuilder class was nearly 4 minutes faster than writing the file directly. Sure, it means I have to touch the data an extra time, but with that as a speed increase, that is something I'm quite willing to live with.

1 comment:

Mona said...

I'd like to see the code behind this.

Blog Widget by LinkWithin