Find "one letter that appears twice" in the string

Question

I am trying to detect whether a letter appears twice in a row in a string using a regex (or maybe there is a better way?). For example, my string is:

ugknbfddgicrmopn

The expected result is:

dd

However, I tried something like:

 re.findall('[a-z]{2}', 'ugknbfddgicrmopn')

but in this case it returns:

 ['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']  # the expected output is `['dd']`

I also have a way to get the expected result:

 >>> l = []
 >>> tmp = None
 >>> for i in 'ugknbfddgicrmopn':
 ...     if tmp != i:
 ...         tmp = i
 ...         continue
 ...     l.append(i*2)
 ...
 >>> l
 ['dd']

But it's too complicated ...

If the string is 'abbbcppq', only catch:

 abbbcppq
  ^^  ^^

So the output is:

 ['bb', 'pp']

Then, if the string is 'abbbbcppq', catch bb twice:

 abbbbcppq
  ^^^^ ^^

So the output is:

 ['bb', 'bb', 'pp']

+56

python regex

Kevin Guan Dec 14 '15 at 7:00

8 answers

As a Pythonic approach, you can use the zip function in a list comprehension:

 >>> s = 'abbbcppq'
 >>>
 >>> [i+j for i,j in zip(s,s[1:]) if i==j]
 ['bb', 'bb', 'pp']

If you are dealing with a large string, you can avoid the copy made by s[1:]: convert the string to an iterator with iter(), use itertools.tee() to create two independent iterators, call next() on the second iterator to consume its first element, and then pass both iterators to zip (in Python 2.x use itertools.izip(), which returns an iterator).

 >>> from itertools import tee
 >>> first = iter(s)
 >>> second, first = tee(first)
 >>> next(second)
 'a'
 >>> [i+j for i,j in zip(first,second) if i==j]
 ['bb', 'bb', 'pp']
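On Python 3.10 and later, itertools.pairwise() packages this same tee/next/zip idiom. The sketch below uses it when available and otherwise falls back to an equivalent tee-based helper; the function name doubled_pairs is hypothetical, chosen only for this example.

```python
from itertools import tee

try:
    from itertools import pairwise  # available in Python 3.10+
except ImportError:
    def pairwise(iterable):
        # Fallback: the tee/next/zip idiom described above
        first, second = tee(iterable)
        next(second, None)  # advance the second iterator by one element
        return zip(first, second)


def doubled_pairs(s):
    """Return every adjacent pair of equal characters (pairs may overlap)."""
    return [a + b for a, b in pairwise(s) if a == b]


print(doubled_pairs('abbbcppq'))  # ['bb', 'bb', 'pp']
```

Like the zip version, this reports overlapping pairs, so a run of three identical letters yields two pairs.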

Timing compared with the `RegEx` recipe:

 # ZIP
 ~ $ python -m timeit --setup "s='abbbcppq'" "[i+j for i,j in zip(s,s[1:]) if i==j]"
 1000000 loops, best of 3: 1.56 usec per loop
 # REGEX
 ~ $ python -m timeit --setup "s='abbbcppq';import re" "[i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]"
 100000 loops, best of 3: 3.21 usec per loop

After the latest edit to the question (mentioned in the comments), if you want to match only one bb pair in strings like "abbbcppq", you can use finditer(), which returns an iterator of match objects, and extract each result with the group() method:

 >>> import re
 >>>
 >>> s = "abbbcppq"
 >>> [item.group(0) for item in re.finditer(r'([a-z])\1',s,re.I)]
 ['bb', 'pp']

Note that re.I is the IGNORECASE flag, which makes the regex match uppercase letters as well.
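A small sketch of what the flag changes (the helper name doubled_letters is hypothetical): without re.I, the class [a-z] only matches lowercase letters, so an uppercase pair is skipped. With re.I it is found, and Python's re also compares the backreference \1 case-insensitively, so a mixed-case pair such as 'Bb' matches too.

```python
import re


def doubled_letters(s, flags=0):
    """Return non-overlapping pairs of identical letters found by finditer."""
    return [m.group(0) for m in re.finditer(r'([a-z])\1', s, flags)]


print(doubled_letters('aBBcc'))        # ['cc']
print(doubled_letters('aBBcc', re.I))  # ['BB', 'cc']
print(doubled_letters('xBbx', re.I))   # ['Bb']
```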

+32

Kasramvd Dec 14 '15 at 7:03

Using a backreference, it is very simple:

 import re
 # \1{1,} means "the captured letter, repeated one or more times"
 p = re.compile(r'([a-z])\1{1,}')
 re.findall(p, "ugknbfddgicrmopn")  # output: ['d']
 re.findall(p, "abbbcppq")          # output: ['b', 'p']

For more information, you can refer to a similar Perl question: Regular expression to match any character repeated more than 10 times
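When a pattern contains exactly one group, re.findall returns only that group, which is why the pattern above yields single letters. If you want the whole run instead (for example 'bbb'), one option is to read group(0) from re.finditer. A minimal sketch; the name repeated_runs is hypothetical:

```python
import re

# ([a-z]) captures a letter; \1+ (equivalent to \1{1,}) requires one or more repeats of it
pattern = re.compile(r'([a-z])\1+')


def repeated_runs(s):
    """Return each complete run of a repeated letter, e.g. 'bbb'."""
    return [m.group(0) for m in pattern.finditer(s)]


print(repeated_runs('abbbcppq'))          # ['bbb', 'pp']
print(repeated_runs('ugknbfddgicrmopn'))  # ['dd']
```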

+9

Gurupad Hegde Dec 14 '15 at 7:08

Perhaps you can use a generator to achieve this.

 def adj(s):
     last_c = None
     for c in s:
         if c == last_c:
             yield c * 2
         last_c = c
 
 s = 'ugknbfddgicrmopn'
 v = [x for x in adj(s)]
 print(v)  # output: ['dd']
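The generator above yields overlapping pairs, so 'abbbcppq' gives ['bb', 'bb', 'pp']. To get the non-overlapping output asked for in the question's edit, one possible tweak (not part of the original answer; the name adj_non_overlapping is hypothetical) is to forget the remembered character after each pair:

```python
def adj_non_overlapping(s):
    """Yield doubled characters without reusing a character in two pairs."""
    last_c = None
    for c in s:
        if c == last_c:
            yield c * 2
            last_c = None  # forget c so it cannot start the next pair
        else:
            last_c = c


print(list(adj_non_overlapping('abbbcppq')))   # ['bb', 'pp']
print(list(adj_non_overlapping('abbbbcppq')))  # ['bb', 'bb', 'pp']
```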

+4

xhg Dec 14 '15 at 7:14

This is pretty easy without regular expressions:

 In [4]: [k for k, v in collections.Counter("abracadabra").items() if v==2]
 Out[4]: ['b', 'r']
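Note that Counter counts occurrences anywhere in the string, not adjacent repeats, so on the question's input it reports different letters than the adjacency check. A quick comparison (a sketch; the adjacency test reuses the zip idea from the first answer):

```python
import collections

s = 'ugknbfddgicrmopn'

# Letters occurring exactly twice anywhere in the string
anywhere = [k for k, v in collections.Counter(s).items() if v == 2]

# Letters repeated in adjacent positions
adjacent = [a for a, b in zip(s, s[1:]) if a == b]

print(anywhere)  # ['g', 'n', 'd']
print(adjacent)  # ['d']
```

The order of anywhere follows first appearance because Counter preserves insertion order on Python 3.7+.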

+4

Dima Tisnek Dec 14 '15 at 11:04

 A1 = "abcdededdssffffccfxx"
 for i in range(len(A1)-1):
     if A1[i+1] == A1[i]:
         # skip letters already reported in this run; i == 0 avoids wrapping around to A1[-1]
         if i == 0 or A1[i+1] != A1[i-1]:
             print(A1[i] * 2)

+2

Mark White Dec 14 '15 at 7:17

"or maybe there are some better ways"

Since regex is often misunderstood by the next developer to come across your code (maybe even you), and since simpler != shorter,

what about the following pseudocode:

 function findMultipleLetters(inputString) {
     dictionaryOfLettersOccurrence = {};   // letter -> count, starts empty
     multipleLetters = [];
     foreach (letter in inputString) {
         dictionaryOfLettersOccurrence[letter]++;
         if (dictionaryOfLettersOccurrence[letter] == 2) {
             multipleLetters.add(letter);
         }
     }
     return multipleLetters;
 }
 multipleLetters = findMultipleLetters("ugknbfddgicrmopn");
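The pseudocode translates almost line for line into Python. A minimal sketch, assuming a dict holds the per-letter counts and a list collects the results (find_multiple_letters is a hypothetical snake_case rendering of the pseudocode's name). Like the pseudocode, it reports letters that occur twice anywhere in the string, not only adjacent ones:

```python
def find_multiple_letters(input_string):
    """Return each letter at the moment it is seen for the second time."""
    occurrences = {}  # letter -> number of times seen so far
    multiple_letters = []
    for letter in input_string:
        occurrences[letter] = occurrences.get(letter, 0) + 1
        if occurrences[letter] == 2:
            multiple_letters.append(letter)
    return multiple_letters


print(find_multiple_letters('ugknbfddgicrmopn'))  # ['d', 'g', 'n']
```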

+2

Lavi Avigdor Dec 14 '15 at 7:48

 >>> l = ['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']
 >>> import re
 >>> newList = [item for item in l if re.search(r"([a-z]{1})\1", item)]
 >>> newList
 ['dd']
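This filters the fixed two-character chunks produced in the question, so a double letter that straddles two chunks is missed. A small sketch of the difference; chunk_pairs is a hypothetical helper written only for this comparison:

```python
import re


def chunk_pairs(s):
    # Split into consecutive two-character chunks, like the list in the question
    return [s[i:i + 2] for i in range(0, len(s), 2)]


s = 'abbc'
chunk_hits = [c for c in chunk_pairs(s) if re.search(r'([a-z])\1', c)]
scan_hits = [m.group(0) for m in re.finditer(r'([a-z])\1', s)]

print(chunk_pairs(s))  # ['ab', 'bc']
print(chunk_hits)      # []
print(scan_hits)       # ['bb']
```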

0

Mayur Koshti Dec 14 '15 at 7:13

Avinash Raj · Accepted Answer · 2015-12-14 07:13

You need to use a capturing group with a backreference in your regex, and define the regex as a raw string.

 >>> re.search(r'([a-z])\1', 'ugknbfddgicrmopn').group()
 'dd'
 >>> [i+i for i in re.findall(r'([a-z])\1', 'abbbbcppq')]
 ['bb', 'bb', 'pp']

or

 >>> [i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]
 ['bb', 'bb', 'pp']

Note that re.findall here returns a list of tuples: the text matched by the first group is the first element, and the text matched by the second group is the second element. In our case the first group is enough, so I used i[0].
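To see the tuples described above, print the raw result of re.findall with the two-group pattern before picking out the first element:

```python
import re

# Group 1 is the whole doubled pair, group 2 is the single letter inside it
matches = re.findall(r'(([a-z])\2)', 'abbbbcppq')

print(matches)                  # [('bb', 'b'), ('bb', 'b'), ('pp', 'p')]
print([m[0] for m in matches])  # ['bb', 'bb', 'pp']
```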
