Replace duplicate characters with a single one using a regular expression

I need a regex that removes repeated occurrences of a specific set of characters. If one of these characters appears two or more times in a row, the run should be replaced with a single occurrence.

/[\s.'-,{2,0}] 

These are the characters that, when they occur repeated, I need to replace with a single instance of the same character.

4 answers

Is this the regular expression you're looking for?

 /([\s.',-])\1+/ 

That matches any run of one of those characters repeated. The hyphen is placed last in the character class so that it is taken literally instead of forming a range. If you use Perl, you can do the replacement with the following expression:

 s/([\s.',-])\1+/$1/g 

Edit: if you use PHP, then you should use this syntax:

 $out = preg_replace('/([\s.\',-])\1+/', '$1', $in); 

The group () captures one character from the set, and \1 means that the same character that was just captured in the parentheses must occur at least once more. In the replacement, $1 refers to whatever the first set of parentheses matched.

Note: this is Perl-compatible regular expression (PCRE) syntax.
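
Since the comments on the question mention JavaScript, here is a minimal sketch of the same backreference idea in that language. The function name collapseRepeats is hypothetical, and writing the hyphen last in the character class (so that it is a literal rather than a range) is an assumption about what the question intends.

```javascript
// Hypothetical helper; the character set is taken from the question.
// The hyphen comes last in the class so it is a literal, not a range.
function collapseRepeats(input) {
  // ([\s.',-]) captures one character from the set;
  // \1+ requires that same character to follow one or more times.
  return input.replace(/([\s.',-])\1+/g, '$1');
}

console.log(collapseRepeats("a  b....,,--''c")); // prints: a b.,-'c
```

Because \1 refers to the exact character that was captured, a run only collapses when it repeats the same character; a space followed by a tab, for example, is left alone.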

From the perlretut page:

Matching repetitions

The examples in the previous section display an annoying weakness. We were only matching 3-letter words, or chunks of words of 4 letters or less. We would like to be able to match words or, more generally, strings of any length, without writing out tedious alternatives such as \w\w\w\w|\w\w\w|\w\w|\w .

This is exactly the problem the quantifier metacharacters ? , * , + and {} were created for. They allow us to delimit the number of repeats for a portion of a regular expression that we consider to be a match. Quantifiers are placed immediately after the character, character class, or grouping that we want to specify. They have the following meanings:

  • a? means: match 'a' 1 or 0 times

  • a* means: match 'a' 0 or more times, i.e. any number of times

  • a+ means: match 'a' 1 or more times, i.e. at least once

  • a{n,m} means: match at least "n" times, but no more than "m" times.

  • a{n,} means: match at least "n" or more times

  • a{n} means: match exactly "n" times
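
As a rough illustration of the list above, the following JavaScript sketch tests each quantifier against strings of zero to four 'a' characters. The sample strings, the choice of n = 2 and m = 3, and the helper name matchedLengths are all made up for this example; the ^ and $ anchors are added so that the quantifier has to account for the whole string.

```javascript
// Sample inputs: the empty string and runs of 1 to 4 'a' characters.
const samples = ['', 'a', 'aa', 'aaa', 'aaaa'];

// One anchored pattern per quantifier, with n = 2 and m = 3 as example values.
const patterns = {
  'a?': /^a?$/,
  'a*': /^a*$/,
  'a+': /^a+$/,
  'a{2,3}': /^a{2,3}$/,
  'a{2,}': /^a{2,}$/,
  'a{2}': /^a{2}$/,
};

// Returns the lengths of the samples that the pattern matches in full.
function matchedLengths(regex) {
  return samples.filter((s) => regex.test(s)).map((s) => s.length);
}

for (const [label, regex] of Object.entries(patterns)) {
  console.log(label, matchedLengths(regex));
}
// a?      [ 0, 1 ]
// a*      [ 0, 1, 2, 3, 4 ]
// a+      [ 1, 2, 3, 4 ]
// a{2,3}  [ 2, 3 ]
// a{2,}   [ 2, 3, 4 ]
// a{2}    [ 2 ]
```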


As others have said, it depends on your regex engine, but here is a small example of how you could do this: s/([ _,.-])\1*/\1/g

With sed:

 $ echo "foo  ,  bar" | sed 's/\([ _,.-]\)\1*/\1/g'
 foo , bar
 $ echo "foo,.  bar" | sed 's/\([ _,.-]\)\1*/\1/g'
 foo,. bar

Using JavaScript, as mentioned in the comment, and assuming (it is not too clear from your question) that the characters you want to replace are whitespace, . , ' , - and , :

 var str = 'a b....,,';
 str = str.replace(/(\s){2}|(\.){2}|('){2}|(-){2}|(,){2}/g, '$1$2$3$4$5');
 // Now str === 'a b..,'
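
Note that {2} consumes exactly one pair per match, so a run of four dots ends up as two. If the goal is to collapse runs of any length, one possible variation (an assumption about the intent, not part of the answer above) is to use {2,} in each alternative. The variable names below are illustrative only.

```javascript
const input = 'a b....,,';

// {2} replaces each pair with one character: four dots become two.
const pairsOnly = input.replace(
  /(\s){2}|(\.){2}|('){2}|(-){2}|(,){2}/g,
  '$1$2$3$4$5'
);

// {2,} consumes the whole run: any run becomes a single character.
// Groups that did not take part in the match expand to an empty string.
const wholeRuns = input.replace(
  /(\s){2,}|(\.){2,}|('){2,}|(-){2,}|(,){2,}/g,
  '$1$2$3$4$5'
);

console.log(pairsOnly); // prints: a b..,
console.log(wholeRuns); // prints: a b.,
```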

If I understand correctly, you want to do the following: given a set of characters, replace any repeated occurrence of each of them with a single character. Here is how I would do it in Perl:

 perl -pi.bak -e "s/\.{2,}/\./g; s/\-{2,}/\-/g; s/'{2,}/'/g" text.txt 

If, for example, text.txt initially contains:

Here.. and here are 2 that should become a single one. Here -- is also a double that should become one. Finally, here we have three ''' that should be replaced by one.

it changes to:

Here. and here are 2 that should become a single one. Here - is also a double that should become one. Finally, here we have three ' that should be replaced by one.

I just use the same regular expression for every character in the set: for example

 s/\.{2,}/\./g; 

replaces 2 or more occurrences of a dot character with a single dot. I concatenate several of these expressions, one for each character in your source set.
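
For comparison, the same one-substitution-per-character idea could be sketched in JavaScript roughly as follows. The helper names escapeForRegex and collapseEach and the sample sentence are hypothetical; this is only an illustration of the approach, not a translation of the Perl one-liner's in-place file editing.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeForRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\-]/g, '\\$&');
}

// Applies one "two or more -> one" substitution per character, in order.
function collapseEach(input, chars) {
  return chars.reduce(
    (text, ch) =>
      // A function is used as the replacement so ch is inserted literally.
      text.replace(new RegExp(escapeForRegex(ch) + '{2,}', 'g'), () => ch),
    input
  );
}

const sample = "Here.. and here -- too, with three '''.";
console.log(collapseEach(sample, ['.', '-', "'"]));
// prints: Here. and here - too, with three '.
```

Building each pattern from the character keeps the list of characters in one place, at the cost of one pass over the string per character.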

There may be more compact ways to do this, but I think it just works :)

Hope this helps.
