How can I remove punctuation from a string?

For the hoping-for-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.

But in the general case, what's the best way to strip punctuation in any language?

I should add: ideally, the solution won't require you to enumerate every possible punctuation mark.

Related: Strip punctuation in Python

+55
string c#
Jan 07 '09 at 19:05
14 answers
new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray()); 
+84
Jan 07 '09 at 19:09

Why not just:

 string s = "sxrdct? fvzguh, bij.";
 var sb = new StringBuilder ();

 foreach (char c in s)
 {
    if (! char.IsPunctuation (c))
       sb.Append (c);
 }

 s = sb.ToString ();

Regex is usually slower than simple char operations, and those LINQ operations look like overkill to me. You also can't use such code in .NET 2.0...

+17
Jan 07 '09 at 19:51

Assuming “best” means “simplest,” I suggest using something like this:

 String stripped = input.replaceAll("\\p{Punct}+", ""); 

This example is for Java, but any reasonably modern regex engine should support this (or something similar).

Edit: the Unicode-aware version would be:

 String stripped = input.replaceAll("\\p{P}+", ""); 

The first version only considers the punctuation characters contained in ASCII.

+13
Jan 07 '09 at 19:09

This describes the intent, is the easiest to read (IMHO), and performs best:

  s = s.StripPunctuation(); 

To implement it:

 public static class StringExtension
 {
     public static string StripPunctuation(this string s)
     {
         var sb = new StringBuilder();
         foreach (char c in s)
         {
             if (!char.IsPunctuation(c))
                 sb.Append(c);
         }
         return sb.ToString();
     }
 }

This uses Hades32's algorithm, which was the best performing of the bunch posted.

+9
Jun 17 '10 at 16:57

You can use the Regex.Replace method:

  replace(YourString, RegularExpressionWithPunctuationMarks, Empty String) 

Since this returns a string, your method will look something like this:

  string s = Regex.Replace("Hello!?!?!?!", "[?!]", ""); 

You can replace "[?!]" with something more sophisticated if you want:

 (\p{P}) 

This should find any punctuation.
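
To make that concrete, here is a minimal sketch of the \p{P} pattern in use. The class and method names are made up for illustration. One thing worth knowing: \p{P} is the Unicode punctuation category, so characters that Unicode classifies as symbols (such as '$', '+' or '=') are left alone. The second method assumes you want those gone too and adds \p{S}.

```csharp
using System.Text.RegularExpressions;

public static class PunctuationRegexSketch
{
    // Removes every character in Unicode general category P (punctuation).
    public static string StripPunctuation(string input)
    {
        return Regex.Replace(input, @"\p{P}+", "");
    }

    // Assumption: symbols (category S, e.g. '$', '+', '=') should be removed as well.
    public static string StripPunctuationAndSymbols(string input)
    {
        return Regex.Replace(input, @"[\p{P}\p{S}]+", "");
    }
}
```

With the input "a+b=$5, ok." the first method only drops the comma and the period, while the second also drops the '+', '=' and '$'.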

+8
Jan 07 '09 at 19:12

This thread is very old, but I'd be remiss not to post a more elegant solution (IMO).

 string inputSansPunc = input.Where(c => !char.IsPunctuation(c)).Aggregate("", (current, c) => current + c); 

This is LINQ sans WTF.

+6
Sep 29 '11 at 13:26

Based on GWLlosa's idea, I was able to come up with this extremely ugly, but working, version:

 string s = "cat!";
 s = s.ToCharArray().ToList<char>()
     .Where<char>(x => !char.IsPunctuation(x))
     .Aggregate<char, string>(string.Empty, new Func<string, char, string>(
         // Parameter renamed from s to acc so it does not clash with the local s above.
         delegate(string acc, char c) { return acc + c; }));
+4
Jan 07 '09 at 19:23

The simplest way to do this would be to use string.Replace.

The other way I can think of is Regex.Replace with a regular expression that contains all the appropriate punctuation marks.

+3
Jan 07 '09 at 19:08

Here's a slightly different approach using LINQ. I like AviewAnew's, but this avoids the Aggregate.

 string myStr = "Hello there..';,]';';., Get rid of Punction";
 var s = from ch in myStr
         where !Char.IsPunctuation(ch)
         select ch;
 // Note: the ASCII round trip below turns any non-ASCII character into '?'.
 var bytes = UnicodeEncoding.ASCII.GetBytes(s.ToArray());
 var stringResult = UnicodeEncoding.ASCII.GetString(bytes);
+1
Jan 07 '09 at 19:39
 // ereg_replace was removed in PHP 7; preg_replace is the PCRE equivalent.
 $newstr = preg_replace('/[[:punct:]]/', '', $oldstr);
+1
Dec 14 '10 at 11:42

I ran into the same problem and was concerned about the performance impact of calling IsPunctuation for every single check.

I found this post: http://www.dotnetperls.com/char-ispunctuation .

The gist of it: char.IsPunctuation also handles Unicode on top of ASCII. The method matches a whole range of characters, including control characters. By definition, this method is heavy and expensive.

Bottom line: in the end I didn't go with it because of its performance impact on my ETL process.

I went with the custom implementation from dotnetperls.
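
To give an idea of what an ASCII-only check can look like, here is a rough sketch. This is not the dotnetperls code; the names are hypothetical and the ranges are simply the 32 printable ASCII characters that are neither letters, digits nor space. Note that it behaves differently from char.IsPunctuation: it treats ASCII symbols such as '$' and '+' as punctuation, and it ignores all non-ASCII punctuation.

```csharp
using System.Text;

public static class AsciiPunctuationSketch
{
    // Hypothetical helper: true only for the 32 ASCII punctuation/symbol characters.
    public static bool IsAsciiPunctuation(char c)
    {
        return (c >= '!' && c <= '/')   // ! " # $ % & ' ( ) * + , - . /
            || (c >= ':' && c <= '@')   // : ; < = > ? @
            || (c >= '[' && c <= '`')   // [ \ ] ^ _ `
            || (c >= '{' && c <= '~');  // { | } ~
    }

    // Copies every character that is not ASCII punctuation into a new string.
    public static string StripAsciiPunctuation(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (!IsAsciiPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Whether this is actually faster for your data is something you would have to measure yourself.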

And just FYI, here is some code derived from the previous answers that gets the list of all punctuation characters (excluding control ones):

 var punctuationCharacters = new List<char>();
 for (int i = char.MinValue; i <= char.MaxValue; i++)
 {
     var character = Convert.ToChar(i);
     if (char.IsPunctuation(character) && !char.IsControl(character))
     {
         punctuationCharacters.Add(character);
     }
 }
 // Joined with an empty separator, so the output is one unbroken run of characters.
 var allPunctuationCharacters = string.Join("", punctuationCharacters);
 Console.WriteLine(allPunctuationCharacters);

Cheers, Andrew

+1
Apr 18 '15 at 22:05

If you want to use this for tokenizing text, you can use:

 new string(myText.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray()) 
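
A possible follow-up step, sketched under the assumption that "tokenizing" here means splitting into whitespace-separated words (the class and method names are made up):

```csharp
using System;
using System.Linq;

public static class TokenizeSketch
{
    public static string[] Tokenize(string text)
    {
        // Replace each punctuation character with a space, so that words joined
        // only by punctuation ("a,b") do not get glued together.
        string spaced = new string(text.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray());

        // A null separator array means "split on whitespace"; the option drops
        // the empty entries left behind by runs of spaces.
        return spaced.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Keep in mind that the apostrophe counts as punctuation, so "It's" comes out as the two tokens "It" and "s".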
+1
Apr 05 '16 at 20:44
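 #include <cctype>
 #include <iostream>
 #include <string>
 using namespace std;

 int main(int argc, char* argv[]) {
     string strOne = "H,el/l!o W#o@r^l&d!!!";
     int punct_count = 0;  // number of punctuation characters removed
     cout << "before : " << strOne << endl;
     for (string::size_type ix = 0; ix < strOne.size();) {
         // Cast to unsigned char: passing a negative char to ispunct is undefined behavior.
         if (ispunct(static_cast<unsigned char>(strOne[ix]))) {
             ++punct_count;
             strOne.erase(ix, 1);  // erase shifts the rest left, so do not advance ix
         } else {
             ++ix;
         }
     }
     cout << "after : " << strOne << endl;
     return 0;
 }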
0
May 11 '09 at 3:09

For long strings, I use this:

 var normalized = input
     .Where(c => !char.IsPunctuation(c))
     .Aggregate(new StringBuilder(), (current, next) => current.Append(next), sb => sb.ToString());

It performs much better than string concatenation (although I agree that it is less intuitive).

0
Sep 03