Correct CSV files with ghost quotes

Last week I helped out a colleague on the Business Intelligence team. Some CSV input files contained extra quotes (") in text fields. That threw off the import because the quote is also used to surround strings. Below is the regular expression that made his day.

using System.IO;
using System.Text.RegularExpressions;

// A quote is allowed (kept) in these positions:
//  - quote at beginning of file, (?!^)
//  - quote at beginning of line, (?<!\r\n)
//  - quote following ; (?<!;)
//  - ; following quote (?!;)
//  - quote at end of line (?!\r\n)
//  - quote at end of file (?!$)
// All other quotes are replaced.
string pattern = "(?!^)(?<!\r\n)(?<!;)\"(?!;)(?!\r\n)(?!$)";
string input = File.ReadAllText(sourceFile);
string output = Regex.Replace(input, pattern, "_");
File.WriteAllText(destinationFile, output);

Input:
ID;Description;Price
100;"Hello world";"$450"
101;"Wrong "code"";"$300"

Output:
ID;Description;Price
100;"Hello world";"$450"
101;"Wrong _code_";"$300"
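
To try the pattern without touching any files, here is a minimal in-memory sketch. The class name GhostQuoteDemo and the method Fix are made up for this illustration and are not part of the original solution. It assumes the data uses ; as separator and \r\n as line ending, just like the sample above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GhostQuoteDemo
{
    // Same pattern as in the post: a quote is replaced unless it sits next to
    // a field separator, a line break, or the start/end of the whole input.
    private const string Pattern = "(?!^)(?<!\r\n)(?<!;)\"(?!;)(?!\r\n)(?!$)";

    // Hypothetical helper: returns the text with ghost quotes replaced by "_".
    public static string Fix(string csv)
    {
        return Regex.Replace(csv, Pattern, "_");
    }

    public static void Main()
    {
        string input =
            "ID;Description;Price\r\n" +
            "100;\"Hello world\";\"$450\"\r\n" +
            "101;\"Wrong \"code\"\";\"$300\"";

        // Only the two quotes around the word code are replaced.
        Console.WriteLine(Fix(input));
    }
}
```

Keep in mind that the pattern cannot tell a ghost quote from a real one when the ghost quote sits right next to a ; inside the text, and that files with \n-only line endings would need the \r\n checks adjusted.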

To build the regular expression I used regexhero.net, a kick-ass online tool for regex testing.

[edit]
30 Jan 2014: added end-of-line checks because the file is processed in one go. ^ and $ match only the beginning and end of the whole input, not of each line. Fixed with a \r\n check.
[/edit]
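
The anchor behaviour from the edit note can be checked with a small sketch. AnchorDemo and CountLineStarts are hypothetical names used only here. Without extra options ^ matches once, at the start of the input. With RegexOptions.Multiline it matches at the start of every line. As far as I know, $ in Multiline mode matches before \n and not before \r\n, which is one reason an explicit \r\n check is a reasonable choice for Windows files.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Counts the positions where ^ matches under the given options.
    public static int CountLineStarts(string text, RegexOptions options)
    {
        return Regex.Matches(text, "^", options).Count;
    }

    public static void Main()
    {
        string text = "first\r\nsecond\r\nthird";

        // Default: ^ only matches at the very beginning of the input.
        Console.WriteLine(CountLineStarts(text, RegexOptions.None));      // 1

        // Multiline: ^ also matches right after every \n.
        Console.WriteLine(CountLineStarts(text, RegexOptions.Multiline)); // 3
    }
}
```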

About erictummers

My work as a recruited developer changes almost every month. I like challenges and sharing the solutions with others. On my blog I’ll mostly post about my work, but expect an occasional home project, productivity tip and tooling review.