Last week I helped out a colleague on the Business Intelligence team. Some CSV input files contained extra quotes (") inside text fields. That threw off the import, because the quote is also used to delimit strings. Below is the regular expression that made his day.
using System.IO;
using System.Text.RegularExpressions;

// Quotes in these positions are allowed and left untouched:
// - quote at beginning of file   (?!^)
// - quote at beginning of line   (?<!\r\n)
// - quote following ;            (?<!;)
// - ; following quote            (?!;)
// - quote at end of line         (?!\r\n)
// - quote at end of file         (?!$)
// All remaining quotes must be replaced.
string pattern = "(?!^)(?<!\r\n)(?<!;)\"(?!;)(?!\r\n)(?!$)";
string input = File.ReadAllText(sourceFile);
string output = Regex.Replace(input, pattern, "_");
File.WriteAllText(destinationFile, output);
Input:
ID;Description;Price
100;"Hello world";"$450"
101;"Wrong "code"";"$300"

Output:
ID;Description;Price
100;"Hello world";"$450"
101;"Wrong _code_";"$300"
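To try the pattern without touching any files, here is a minimal sketch that runs it on the sample data as an in-memory string. The class name `QuoteFixer` and the method `FixQuotes` are made up for this illustration, and the sample is assumed to use Windows (\r\n) line endings, since that is what the pattern checks for.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteFixer
{
    // Same pattern as in the post. In a regular (non-verbatim) C# string,
    // \r\n become real CR LF characters, which the regex matches literally.
    private const string Pattern = "(?!^)(?<!\r\n)(?<!;)\"(?!;)(?!\r\n)(?!$)";

    // Hypothetical helper: replaces every quote that is not in an allowed position.
    public static string FixQuotes(string input)
    {
        return Regex.Replace(input, Pattern, "_");
    }

    public static void Main()
    {
        // Sample data from the post, assumed to be CRLF-separated.
        string input =
            "ID;Description;Price\r\n" +
            "100;\"Hello world\";\"$450\"\r\n" +
            "101;\"Wrong \"code\"\";\"$300\"";

        // The last line comes out as: 101;"Wrong _code_";"$300"
        Console.WriteLine(FixQuotes(input));
    }
}
```

Only the two quotes around code are replaced; every quote that sits next to a ; or at a line or file boundary stays as it is.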
To build the regular expression I used regexhero.net, a kick-ass online tool for regex testing.
[edit]
30 Jan 2014: added an end-of-line check, because the file is processed all at once. By default, ^ and $ match only the beginning and end of the whole file, not of each line. Fixed with the \r\n check.
[/edit]
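As an alternative to the explicit \r\n checks (this is not what the post uses), .NET can make ^ and $ match per line with RegexOptions.Multiline. One catch: in that mode $ matches just before \n, not before \r\n, so the sketch below uses \r?$ to cover both line-ending styles. The class name `MultilineQuoteFixer` and the method `FixQuotes` are hypothetical names for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineQuoteFixer
{
    // (?!^)     : not at the start of a line (or of the file)
    // (?<!;)    : not directly after a ;
    // (?!;)     : not directly before a ;
    // (?!\r?$)  : not at the end of a line (LF or CRLF) or of the file
    private const string Pattern = "(?!^)(?<!;)\"(?!;)(?!\\r?$)";

    // Hypothetical helper; Multiline makes ^ and $ apply to every line.
    public static string FixQuotes(string input)
    {
        return Regex.Replace(input, Pattern, "_", RegexOptions.Multiline);
    }

    public static void Main()
    {
        // Works for both CRLF and LF separated input.
        Console.WriteLine(FixQuotes("100;\"Hello\";\"$450\"\r\n101;\"Wrong \"code\"\";\"$300\""));
        Console.WriteLine(FixQuotes("100;\"Hello\";\"$450\"\n101;\"Wrong \"code\"\";\"$300\""));
    }
}
```

The trade-off is a shorter pattern that no longer depends on the file using Windows line endings, at the cost of having to remember the \r?$ detail.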