Built-in .NET CSV Parser

In administrative systems, there is often a need to import and parse CSV files. .NET actually has a built-in CSV parser, although it is well hidden in a VB.NET namespace. If I had known about it, I wouldn’t have had to write all those custom (sometimes buggy) parsers.

To really test the parser, I’m going to parse a CSV file in the Swedish format.

Name; FactoryLocation; EstablishedYear; ProfitMillionSEK
Volvo; "Gothenburg, Sweden; Gent, Belgium"; 1926; 0,345463
#A comment line
Saab; Trollhättan, Sweden; 1945; -3 009

Note that there is an embedded ; in the FactoryLocation field of Volvo, which is part of the field text and not a field delimiter.

There are three special formatting rules that apply to Swedish CSV files.

  • The decimal delimiter is ,
  • The field delimiter is ; so that it is not confused with the decimal delimiter
  • The thousand separator in numbers is a space.
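The number rules above can be sketched in isolation. This is a minimal sketch, not the post's own code: the SwedishNumbers class and its hand-built NumberFormatInfo are assumptions made for illustration, chosen so that the separators are exactly the comma and the plain space used in the sample file.

```csharp
using System.Globalization;

public static class SwedishNumbers
{
    // Hand-built format (an assumption for illustration): decimal comma and
    // a plain space as thousand separator. A CultureInfo such as "sv-SE"
    // may use a different space character as its group separator.
    private static readonly NumberFormatInfo Format = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = " "
    };

    public static double Parse(string text)
    {
        // NumberStyles.Float permits a leading sign and the decimal separator;
        // AllowThousands additionally permits the group separator.
        return double.Parse(
            text,
            NumberStyles.Float | NumberStyles.AllowThousands,
            Format);
    }
}
```

With this format, SwedishNumbers.Parse("0,345463") should give 0.345463 and SwedishNumbers.Parse("-3 009") should give -3009.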

I really did my best to come up with a format that demands flexibility from the parser, but the TextFieldParser has really flexible configuration options and just worked.

To use the TextFieldParser, a reference to the Microsoft.VisualBasic assembly has to be added to the project. Then it’s just a matter of instantiating the parser, setting the needed configuration through properties, and starting to parse.

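// TextFieldParser is in the Microsoft.VisualBasic.FileIO namespace.
// This snippet is the body of an iterator method (note the yield return).
// swedishCulture is not shown here; it is assumed to be a CultureInfo
// for Swedish, e.g. CultureInfo.GetCultureInfo("sv-SE").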
using (TextFieldParser parser = new TextFieldParser(path))
{
    parser.CommentTokens = new string[] { "#" };
    parser.SetDelimiters(new string[] { ";" });
    parser.HasFieldsEnclosedInQuotes = true;
 
    // Skip over header line.
    parser.ReadLine();
 
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        yield return new Brand()
        {
            Name = fields[0],
            FactoryLocation = fields[1],
            EstablishedYear = int.Parse(fields[2]),
            Profit = double.Parse(fields[3], swedishCulture)
        };
    }
}

The parser can be configured with comment tokens and delimiters, and it can handle fields enclosed in quotes. There are also multiple read methods: it can read a line as a plain string (ReadLine), split a line into fields (ReadFields), or read the remainder of the file as one huge string (ReadToEnd). To be honest, it’s way better than any of the parsers I’ve written.
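To make the difference between the read methods concrete, here is a small sketch that calls each of them once on text held in memory. The ReadMethodsDemo class and its Read method are hypothetical names introduced for illustration; the sketch relies on the TextFieldParser constructor that accepts a TextReader, so no file is needed.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class ReadMethodsDemo
{
    // Returns the raw header line, the fields of the first data row,
    // and whatever text remains after that row.
    public static (string Header, string[] FirstRow, string Rest) Read(string csvText)
    {
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.SetDelimiters(";");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadLine returns the line as-is, without splitting it.
            string header = parser.ReadLine();

            // ReadFields splits on the delimiter; a ; inside quotes
            // stays part of the field text.
            string[] firstRow = parser.ReadFields();

            // ReadToEnd returns the rest of the input as a single string.
            string rest = parser.ReadToEnd();

            return (header, firstRow, rest);
        }
    }
}
```

Fed the first two lines of the sample file, FirstRow should hold four fields, with "Gothenburg, Sweden; Gent, Belgium" kept together as the second one.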

The only thing I could possibly wish for is built-in conversion to data types other than strings, plus object materialization. It could be an interesting thing to write, so maybe I’ll come back with a materialization wrapper.

Wouldn’t it be cool to have a data-annotations-based CSV parser? Create a class with proper annotations and then automatically parse data from a CSV file!

14 comments

  1. This is very simple, the most important part is missing – ability to process multiline strings (in quotes).

    1. ‘Most’ important part is missing? The most important stuff is there and well explained. You should be able to figure out the rest.

      1. While the quoted multiline strings may not be ‘the most’ important part, it is still required for a valid csv reader. “You should be able to figure out the rest” – the only way to do this is to NOT use TextFieldParser because it can’t do it.

      2. I don’t think that handling multi lines is the “most important” part of a CSV reader. In most cases strings are not multiline. But in the case that you do need the multiline capability, the TextFieldParser is obviously not the right tool for the task.

  2. Life Saver! And Life Changing! ;-) I am poised to parse a couple different kinds of CSV this sprint and was just looking for tips on how to do it intelligently. Thanks for showing me the light. I will never manually parse another CSV file again!

  3. Excellent post! Thanks! Works great (converted to VB.. forgive the trespass.. still working on learning C#).

  4. Thanks, it works. Values within quotes containing commas & new-lines also worked. What’s all the talk below about this not working?

    1. Well, actually I never tested new lines within the quotes myself. I just assumed it didn’t work based on the comments. If that works too, it’s even better. Writing a good csv parser that covers all cases is obviously non-trivial.

  5. It is obviously a very good way to parse flat files, unless you need better performance. It takes almost 9 times as much time as using StreamReader. Is there any way to speed it up?

  6. This looks very useful and I need to parse a CSV. I have the code working in an old VB5 routine and I want to rewrite it for .NET.

    My issue is that the first two lines of my file are column headers and a blank line.
    Can I step through these before the parsing starts?

    TIA
