C#: Removing Separator Characters from Quoted Strings

I am writing a program that should remove separator characters from inside quoted strings in text files.

For instance:

"Hello, my name is world" 

Must be:

 "Hello my name is world" 

At first it sounds pretty simple (I thought it would be), but you need to determine where the quote begins and where it ends, and then search that quoted string for the separator characters. How?

I experimented with some regular expressions, but I just ended up confused!

Any ideas? Even something to get the ball rolling would help; I'm completely at a standstill.

+4
8 answers
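    // Requires: using System.Text.RegularExpressions;
    string pattern = "\"([^\"]+)\"";
    // Note: Regex.Match returns only the first quoted string in textToSearch.
    string value = Regex.Match(textToSearch, pattern).Value;
    string[] removalCharacters = { ",", ";" }; // or any other characters
    foreach (string character in removalCharacters)
    {
        value = value.Replace(character, "");
    }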
+3

Why not try doing it with LINQ?

    var x = @" this is a great whatever ""Hello, my name is world"" and all that";
    var result = string.Join(@"""",
        x.Split('"')
         .Select((val, index) => index % 2 == 1 ? val.Replace(",", "") : val)
         .ToArray());
+2

Using a regex with a lookahead, the pattern looks like this: "\"(?=[^\"]+,)[^\"]+\""

\" matches the opening double quotation mark. The lookahead (?=[^\"]+,) checks that the quoted text contains a comma. Then [^\"]+ matches the rest of the text up to, but not including, the next double quote, and the final \" matches the closing double quote.

Regex.Replace then gives a compact way to modify each match and remove the unwanted commas.

    string input = "\"Hello, my name, is world\"";
    string pattern = "\"(?=[^\"]+,)[^\"]+\"";
    string result = Regex.Replace(input, pattern, m => m.Value.Replace(",", ""));
    Console.WriteLine(result);
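
If you need to strip more than one separator character, the same Regex.Replace idea can be extended with a character class. This is only a sketch: the class name QuotedSeparatorRegex, the method name Strip, and the separator set [,;] are my own assumptions, and it does not handle escaped quotes inside a quoted string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedSeparatorRegex
{
    // Matches one double-quoted span that contains no embedded quote characters.
    private static readonly Regex QuotedSpan = new Regex("\"[^\"]*\"");

    // Separator characters to strip (an assumed set, adjust as needed).
    private static readonly Regex Separators = new Regex("[,;]");

    public static string Strip(string input)
    {
        // For every quoted span, delete the separators inside it.
        // Text outside the quotes is never part of a match, so it is left as-is.
        return QuotedSpan.Replace(input, m => Separators.Replace(m.Value, ""));
    }
}
```

Calling QuotedSeparatorRegex.Strip on a line of text should leave commas outside the quotes alone and remove only the ones between a pair of quotes.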
+2

What you want to write is called a "lexer" (or, alternatively, a "tokenizer"), which reads the input character by character and breaks it into tokens. This is generally how parsing works in a compiler (as a first step). The lexer breaks the text into a stream of tokens (string literal, identifier, "(", etc.). Then the parser takes these tokens and uses them to build a parse tree.

In your case, you only need a lexer. You will have 2 types of tokens: “quoted strings” and “everything else”.

Then you just need to write code to break the input into tokens. By default, everything is an "everything else" token. A string token begins when you see a ", and ends when you see the next ". If you are reading source code, you may have to deal with things like \" or "" as special cases.

After you have done this, you can simply iterate over the tokens and perform whatever processing you need on the "string" tokens.
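
A minimal sketch of such a lexer might look like the following. All names here (QuoteLexer, Token, Tokenize, StripSeparators) are made up for illustration. I am assuming plain double quotes with no escape sequences, and an unterminated quote is treated as "everything else".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteLexer
{
    // A token is a run of text plus a flag saying whether it is a quoted string.
    public sealed class Token
    {
        public Token(string text, bool isQuoted) { Text = text; IsQuoted = isQuoted; }
        public string Text { get; private set; }
        public bool IsQuoted { get; private set; }
    }

    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c != '"') { current.Append(c); continue; }

            if (inQuotes)
            {
                // Closing quote: finish the quoted-string token (quotes included).
                current.Append(c);
                tokens.Add(new Token(current.ToString(), true));
                current.Clear();
                inQuotes = false;
            }
            else
            {
                // Opening quote: flush any pending "everything else" text first.
                if (current.Length > 0) tokens.Add(new Token(current.ToString(), false));
                current.Clear();
                current.Append(c);
                inQuotes = true;
            }
        }

        // Leftover text (including an unterminated quote) is "everything else".
        if (current.Length > 0) tokens.Add(new Token(current.ToString(), false));
        return tokens;
    }

    public static string StripSeparators(string input, params char[] separators)
    {
        var result = new StringBuilder();
        foreach (Token token in Tokenize(input))
        {
            string text = token.Text;
            if (token.IsQuoted)
            {
                // Only quoted-string tokens are processed.
                foreach (char s in separators) text = text.Replace(s.ToString(), "");
            }
            result.Append(text);
        }
        return result.ToString();
    }
}
```

The point of keeping Tokenize separate is that you can later add the special cases (escaped quotes and so on) in one place without touching the processing loop.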

+1

So, do you have a long text with lots of quotes inside? I would make a method that does something like this:

  • Scan through the string until you meet the first "
  • Then take the substring up to the next " and call str.Replace(",", "") on it, and also replace any other characters that you want to remove.
  • Then go on without replacing anything until you meet the next " and continue like that to the end.
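
The steps above could be sketched roughly like this. QuoteScanner and RemoveCommasInsideQuotes are hypothetical names, only commas are removed, and an opening quote without a matching closing quote is left untouched (that last part is my own assumption about what should happen).

```csharp
using System;
using System.Text;

public static class QuoteScanner
{
    public static string RemoveCommasInsideQuotes(string text)
    {
        var result = new StringBuilder();
        int pos = 0;

        while (pos < text.Length)
        {
            // Step 1: find the next opening quote.
            int open = text.IndexOf('"', pos);
            if (open < 0) break;

            // Step 2: find the matching closing quote.
            int close = text.IndexOf('"', open + 1);
            if (close < 0) break; // unmatched quote: leave the rest unchanged

            // Text before the quote is copied as-is.
            result.Append(text.Substring(pos, open - pos));

            // The quoted part (including both quote marks) loses its commas.
            string quoted = text.Substring(open, close - open + 1);
            result.Append(quoted.Replace(",", ""));

            // Step 3: continue after the closing quote.
            pos = close + 1;
        }

        // Whatever is left after the last complete quote is copied unchanged.
        result.Append(text.Substring(pos));
        return result.ToString();
    }
}
```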

EDIT

I got a better idea. What about this:

    string mycompletestring = "This is a string\"containing, a quote\"and some more text";
    string[] splitstring = mycompletestring.Split('"');
    for (int i = 1; i < splitstring.Length; i += 2)
    {
        splitstring[i] = splitstring[i].Replace(",", "");
    }
    StringBuilder builder = new StringBuilder();
    foreach (string s in splitstring)
    {
        builder.Append(s + '"');
    }
    mycompletestring = builder.ToString().Substring(0, builder.ToString().Length - 1);

I think there should be a better way to combine the strings back into one with " between them at the end, but I don't know the best one, so feel free to suggest a good method here :)
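
One candidate for that better way is string.Join, which puts the separator only between the elements, so there is no trailing " to cut off. A small sketch (the class and method names are just for illustration):

```csharp
using System;

public static class QuoteJoin
{
    public static string RemoveCommasInsideQuotes(string text)
    {
        string[] parts = text.Split('"');

        // Odd-indexed parts are the ones that sit after an opening quote.
        for (int i = 1; i < parts.Length; i += 2)
        {
            parts[i] = parts[i].Replace(",", "");
        }

        // Join re-inserts the quote character between the parts only.
        return string.Join("\"", parts);
    }
}
```

Note that with an unbalanced quote the last odd-indexed part is still treated as quoted, so this assumes the quotes in the input come in pairs.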

0

I needed to do something similar in the application that I use to translate flat files. This is the approach I took: (just copy / paste from my application)

    // FieldDelimiter is a char member of the containing class.
    protected virtual string[] delimitCVSBuffer(string inputBuffer)
    {
        List<string> output = new List<string>();
        bool insideQuotes = false;
        StringBuilder fieldBuffer = new StringBuilder();
        foreach (char c in inputBuffer)
        {
            if (c == FieldDelimiter && !insideQuotes)
            {
                // End of a field: drop its first and last character (the quotes).
                output.Add(fieldBuffer.Remove(0, 1).Remove(fieldBuffer.Length - 1, 1).ToString().Trim());
                fieldBuffer.Clear();
                continue;
            }
            else if (c == '\"')
                insideQuotes = !insideQuotes;
            fieldBuffer.Append(c);
        }
        // Last field has no trailing delimiter, so add it here.
        output.Add(fieldBuffer.Remove(0, 1).Remove(fieldBuffer.Length - 1, 1).ToString().Trim());
        return output.ToArray();
    }
0

Ok, this is a little strange, but it works.

So, first you split your string into parts based on the character " :

    string msg = "this string should have a comma here,\"but, there should be no comma in this bit\", and there should be a comma back at that and";
    var parts = msg.Split('"');

Then you need to join the string back together with the " character, after removing the commas from every other part:

 string result = string.Join("\"", RemoveCommaFromEveryOther(parts)); 

The removal function looks like this:

    IEnumerable<string> RemoveCommaFromEveryOther(IEnumerable<string> parts)
    {
        using (var partenum = parts.GetEnumerator())
        {
            bool replace = false;
            while (partenum.MoveNext())
            {
                if (replace)
                {
                    yield return partenum.Current.Replace(",", "");
                    replace = false;
                }
                else
                {
                    yield return partenum.Current;
                    replace = true;
                }
            }
        }
    }

For this to compile, you must add a using directive for System.Collections.Generic .

0

There are many ways to do this: look at the string.Split() and string.IndexOfAny() functions.

You can use string.Split(new char[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries) to split a phrase into words, and then use the StringBuilder class to put the words back together.

Calling string.Replace("[char to remove goes here]", "") several times, once for each char you want to remove, will also work.

EDIT:

Call string.Split(new char[] {'\"'}, StringSplitOptions.RemoveEmptyEntries) to get an array of the strings between the quotation marks ("), then call Replace on each of them, then put the strings back together with StringBuilder .

-1
