Correct CSV files with ghost quotes

Last week I helped out a colleague on the Business Intelligence team. Some CSV input files contained extra quotes (") inside text fields. That threw off the import, because the quote is also the character that surrounds strings. Below is the regular expression that made his day.

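using System.IO;
using System.Text.RegularExpressions;

// Quotes in these positions are legitimate and are left alone:
//  - quote at beginning of file   (?!^)
//  - quote at beginning of line   (?<!\r\n)
//  - quote directly after a ;     (?<!;)
//  - quote directly before a ;    (?!;)
//  - quote at end of line         (?!\r\n)
//  - quote at end of file         (?!$)
// Every other quote is replaced.
// Note: in this regular string literal \r\n are real CR and LF characters, which the regex matches literally.
string pattern = "(?!^)(?<!\r\n)(?<!;)\"(?!;)(?!\r\n)(?!$)";
string input = File.ReadAllText(sourceFile);
string output = Regex.Replace(input, pattern, "_");
File.WriteAllText(destinationFile, output);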

Input:
100;"Hello world";"$450"
101;"Wrong "code"";"$300"

Output:
100;"Hello world";"$450"
101;"Wrong _code_";"$300"
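
To try the pattern without touching any files, the two sample rows above can be run through it in memory. This is only a minimal sketch: the class name GhostQuoteDemo and the method name ReplaceGhostQuotes are made up for this illustration, and the pattern is written as a verbatim string, so here the regex engine (not the C# compiler) interprets \r and \n.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GhostQuoteDemo
{
    // Same pattern as in the post, as a verbatim string ("" is one quote character).
    private const string Pattern = @"(?!^)(?<!\r\n)(?<!;)""(?!;)(?!\r\n)(?!$)";

    // Hypothetical helper: replaces every quote that is not at a field boundary.
    public static string ReplaceGhostQuotes(string csv)
    {
        return Regex.Replace(csv, Pattern, "_");
    }

    public static void Main()
    {
        // The two sample rows, joined with a Windows line ending.
        string input = "100;\"Hello world\";\"$450\"\r\n" +
                       "101;\"Wrong \"code\"\";\"$300\"";

        Console.WriteLine(ReplaceGhostQuotes(input));
    }
}
```

Keep in mind that the pattern treats every quote directly next to a ; as legitimate, so a ghost quote that happens to sit right before or after a separator is not replaced.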

To build the regular expression I used a kick-ass online tool for regex testing.

Update 30 Jan 2014: added the end-of-line check, because the file is processed in one go. By default ^ and $ only match the beginning and end of the whole input, not of each line. Fixed with the \r\n checks.
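
The anchor behaviour described in this update can be checked with a small experiment. This is a sketch for illustration only; AnchorDemo and CountMatches are hypothetical names. It counts the positions where ^ and $ match in a two-line string, with and without RegexOptions.Multiline.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: counts the positions at which the anchor matches.
    public static int CountMatches(string input, string anchor, RegexOptions options)
    {
        return Regex.Matches(input, anchor, options).Count;
    }

    public static void Main()
    {
        string twoLines = "first\r\nsecond";

        // Default options: anchors apply to the whole input only.
        Console.WriteLine(CountMatches(twoLines, "^", RegexOptions.None));      // 1
        Console.WriteLine(CountMatches(twoLines, "$", RegexOptions.None));      // 1

        // Multiline: anchors also apply at every line break.
        Console.WriteLine(CountMatches(twoLines, "^", RegexOptions.Multiline)); // 2
        Console.WriteLine(CountMatches(twoLines, "$", RegexOptions.Multiline)); // 2
    }
}
```

RegexOptions.Multiline would be another way to get per-line anchors, but in that mode $ matches just before the \n, so on files with Windows line endings the \r still sits between the closing quote and the anchor. Checking for \r\n explicitly, as the pattern in this post does, avoids that.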

About erictummers

Working in a DevOps team is the best thing that happened to me. I like challenges and sharing the solutions with others. On my blog I’ll mostly post about my work, but expect an occasional home project, productivity tip and tooling review.