Replace repeated characters with a single one using a regular expression

Question


I need a regular expression that removes doubled repetitions of specific characters: if one of these characters is repeated, replace the repetition with a single copy.

/[\s.'-,{2,0}]

These are the characters; whenever one of them is repeated, I need to replace the run with a single copy of that same character.
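Inside a character class, a `-` between two characters denotes a range. In the attempt above, `'-,` is therefore read as every character from `'` (U+0027) through `,` (U+002C), which includes `(`, `)`, `*` and `+` but not the hyphen itself; the `{2,0}` inside the brackets is likewise treated as literal characters, not as a quantifier. A minimal JavaScript check of this (Perl and PCRE parse the range the same way); `rangeClass` and `literalClass` are illustrative names:

```javascript
// The class from the attempt, minus the stray {2,0}. Because a '-' between
// two characters makes a range, '-, means every character from ' (U+0027)
// to , (U+002C): ' ( ) * + ,
const rangeClass = /[\s.'-,]/;

// The same characters with '-' moved to the end, where it is a literal hyphen.
const literalClass = /[\s.',-]/;

for (const ch of ["'", "(", "*", "-", ","]) {
  console.log(JSON.stringify(ch), rangeClass.test(ch), literalClass.test(ch));
}
// "'" true true
// "(" true false
// "*" true false
// "-" false true
// "," true true
```

Placing `-` first or last in the class, or escaping it as `\-`, makes it match a literal hyphen.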


regex

Aditii Aug 24 '11 at 8:12


4 answers

amphetamachine · Answer 1 · 2011-08-24T08:14:17+0000

Is this the regular expression you're looking for?

 /([\s.',-])\1+/

That pattern matches a repeated character from the set. If you use Perl, you can replace such runs with the following expression:

 s/([\s.',-])\1+/$1/g

Edit: if you use, um, PHP, then you should use this syntax:

 $out = preg_replace('/([\s.\',-])\1+/', '$1', $in);
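For comparison, the same backreference substitution in JavaScript might look like the sketch below. It is a translation, not code from the answer: `collapseRepeats` is a hypothetical name, and the hyphen is placed last in the class so that it is matched literally.

```javascript
// Hypothetical helper: collapse each run of one repeated character from the
// set (whitespace, '.', "'", ',' and '-') into a single copy of it.
// The hyphen is last in the class, so it is a literal '-' rather than a range.
function collapseRepeats(input) {
  // ([...]) captures one character; \1+ requires one or more further copies
  // of that same character; "$1" writes back a single copy.
  return input.replace(/([\s.',-])\1+/g, "$1");
}

console.log(collapseRepeats("foo  ,,, bar--baz''qux...")); // foo , bar-baz'qux.
```

Characters outside the set, such as `((`, are left untouched.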

The group () captures one character from the set, and \1 stands for that same character, so \1+ requires it to occur at least once more. In the replacement, $1 refers to what the first set of parentheses captured.
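Because `\1` must repeat exactly the character captured by the group, adjacent characters that are different but both in the set are left alone. A short JavaScript sketch of that behaviour (the sample strings are made up for illustration):

```javascript
// One captured character from the set, followed by one or more
// copies of that same character.
const sameCharRun = /([\s.',-])\1+/g;

console.log("a..b".replace(sameCharRun, "$1")); // a.b
// '.' then ',' are two different characters, so \1 cannot match: unchanged.
console.log("a.,b".replace(sameCharRun, "$1")); // a.,b
// A space followed by a tab is also two different characters: unchanged.
console.log("a \tb".replace(sameCharRun, "$1") === "a \tb"); // true
// Mixed runs shrink per character: ".." -> "." and ",," -> ",".
console.log("a..,,b".replace(sameCharRun, "$1")); // a.,b
```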

Note: this is Perl-compatible regular expression (PCRE) syntax.

From the perlretut page:

Repetition Matching

The examples in the previous section display an annoying weakness. We were only matching 3-letter words, or chunks of words of 4 letters or less. We'd like to be able to match words or, more generally, strings of any length, without writing out tedious alternatives like \w\w\w\w|\w\w\w|\w\w|\w .

This is exactly the problem the quantifier metacharacters ? , * , + and {} were created for. They allow us to delimit the number of repeats for a portion of a regexp we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:

a? means: match 'a' 1 or 0 times
a* means: match 'a' 0 or more times, i.e. any number of times
a+ means: match 'a' 1 or more times, i.e. at least once
a{n,m} means: match at least "n" times, but no more than "m" times.
a{n,} means: match at least "n" or more times
a{n} means: match exactly "n" times
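The answers here rely on `+` (after `\1`) and on `{2,}`. A small JavaScript sketch of how each quantifier in the list behaves when matched at the start of the string `"aaaaa"`; `matchedLength` is a hypothetical helper, and the results reflect that quantifiers are greedy by default, taking as many repetitions as they are allowed:

```javascript
// Hypothetical helper: length of the match of `re` in `s`, or -1 if none.
function matchedLength(re, s) {
  const m = s.match(re);
  return m ? m[0].length : -1;
}

const text = "aaaaa";
// Each pattern is anchored with ^ so it is tried at the start of the text.
const patterns = [
  ["a?", /^a?/],         // 0 or 1 times
  ["a*", /^a*/],         // 0 or more times
  ["a+", /^a+/],         // 1 or more times
  ["a{2,3}", /^a{2,3}/], // at least 2, at most 3 times
  ["a{2,}", /^a{2,}/],   // 2 or more times
  ["a{2}", /^a{2}/],     // exactly 2 times
];
for (const [name, re] of patterns) {
  console.log(name, matchedLength(re, text));
}
// a? 1, a* 5, a+ 5, a{2,3} 3, a{2,} 5, a{2} 2

// Against a single "a", a{2,} has nothing to match:
console.log(matchedLength(/^a{2,}/, "a")); // -1
```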

Ulrich dangel · Answer 2 · 2011-08-24T08:25:39+0000

As others have said, the exact syntax depends on your regex engine, but here is a small example of how you could do it: s/([ _,.-])\1*/\1/g

With sed:

 $ echo "foo , bar" | sed 's/\([ _,.-]\)\1*/\1/g'
 foo , bar
 $ echo "foo,. bar" | sed 's/\([ _,.-]\)\1*/\1/g'
 foo,. bar
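In this answer the group is followed by `\1*` rather than `\1+`. Since `*` also allows zero repetitions, every single character from the set matches too and is simply replaced by itself, so the output is the same; `\1+` merely skips those no-op replacements. A JavaScript sketch comparing the two (the function names and the input string are hypothetical):

```javascript
// Both versions use the class [ _,.-]: space, underscore, comma, dot, hyphen
// (hyphen last, so it is literal rather than a range).
const starVersion = (s) => s.replace(/([ _,.-])\1*/g, "$1"); // zero or more repeats
const plusVersion = (s) => s.replace(/([ _,.-])\1+/g, "$1"); // one or more repeats

const sample = "foo  ,, bar__baz";
console.log(starVersion(sample)); // foo , bar_baz
console.log(plusVersion(sample)); // foo , bar_baz
```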

Paulpro · Answer 3 · 2011-08-24T08:26:11+0000

Using the JavaScript mentioned in a comment, and assuming (it is not too clear from your question) that the characters you want to replace are spaces, . , ' , - and , :

 var str = 'a b....,,';
 // Each matched pair becomes one copy of its character; unmatched groups expand to ''.
 str = str.replace(/(\s){2}|(\.){2}|('){2}|(-){2}|(,){2}/g, '$1$2$3$4$5');
 // Now str === 'a b..,'
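Note that `{2}` consumes exactly two characters per match, so a run of four dots becomes two dots rather than one, which is what the expected result `'a b..,'` reflects. If runs of any length should collapse to a single character, a backreference pattern is an alternative. That is a different technique from this answer's, shown here only for comparison in a short JavaScript sketch:

```javascript
const input = "a b....,,";

// Pair-based replacement from the answer: each matched pair becomes one char,
// and the groups that did not take part in a match expand to "".
const pairwise = input.replace(/(\s){2}|(\.){2}|('){2}|(-){2}|(,){2}/g, "$1$2$3$4$5");

// Run-based alternative (a swapped-in technique, not the answer's):
// one captured character plus \1+ collapses a run of any length.
const runwise = input.replace(/([\s.',-])\1+/g, "$1");

console.log(pairwise); // a b..,
console.log(runwise);  // a b.,
```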

MarcoS · Answer 4 · 2011-08-24T08:40:10+0000

If I understand correctly, you want the following: given a set of characters, replace any run of multiple occurrences of each of them with a single character. Here is how I would do it in Perl:

 perl -pi.bak -e "s/\.{2,}/\./g; s/\-{2,}/\-/g; s/'{2,}/'/g" text.txt

If, for example, text.txt initially contains:

Here. and here are 2.. that should become only one. Here is also a double -- that should become one. Finally, here we have three ''' that should be replaced by one.

it changes as follows:

Here. and here are 2. that should become only one. Here is also a double - that should become one. Finally, here we have three ' that should be replaced by one.

I use the same kind of regular expression for every character in the set, for example:

 s/\.{2,}/\./g;

replaces 2 or more occurrences of a dot with a single dot. I chain several of these substitutions, one for each character in your set.
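The same idea, one substitution per character applied in sequence, can be sketched in JavaScript by building each pattern from a list of characters. This is an illustration rather than a port of the answer's command: `escapeRegExp` and `squeezeEach` are hypothetical helpers, the sketch works on a string instead of editing `text.txt` in place, and it does not create the `.bak` backup that `-i.bak` gives you in Perl.

```javascript
// Hypothetical helper: escape regex metacharacters so `ch` is matched literally.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: for each character, replace 2 or more consecutive
// copies with one, mirroring s/\.{2,}/\./g; s/\-{2,}/\-/g; s/'{2,}/'/g
function squeezeEach(text, chars) {
  let out = text;
  for (const ch of chars) {
    const re = new RegExp(escapeRegExp(ch) + "{2,}", "g");
    // A replacer function avoids any special meaning of "$" in `ch`.
    out = out.replace(re, () => ch);
  }
  return out;
}

console.log(squeezeEach("Here... a -- b ''' c", [".", "-", "'"])); // Here. a - b ' c
```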

There may be more compact ways to do this, but this one works :)

Hope this helps.

