Find "one letter that appears twice" in the string

I am trying to detect whether a letter appears twice in a row in a string using regex (or maybe there is a better way?). For example, my string is:

ugknbfddgicrmopn 

The output should be:

 dd 

However, I tried something like:

    re.findall('[a-z]{2}', 'ugknbfddgicrmopn')

but in this case it returns:

    ['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']   # the expected output is `['dd']`



I also have a way to get the expected result:

    >>> l = []
    >>> tmp = None
    >>> for i in 'ugknbfddgicrmopn':
    ...     if tmp != i:
    ...         tmp = i
    ...         continue
    ...     l.append(i*2)
    ...
    >>> l
    ['dd']

But it's too complicated...




If it is 'abbbcppq', then only catch:

    abbbcppq
     ^^  ^^

So the output is:

 ['bb', 'pp'] 



Then, if it is 'abbbbcppq', catch bb twice:

    abbbbcppq
     ^^^^ ^^

So the output is:

 ['bb', 'bb', 'pp'] 
+56
python regex
Dec 14 '15 at 7:00
8 answers

You need to use a capturing group with a backreference, and define your regex as a raw string.

    >>> re.search(r'([a-z])\1', 'ugknbfddgicrmopn').group()
    'dd'
    >>> [i+i for i in re.findall(r'([a-z])\1', 'abbbbcppq')]
    ['bb', 'bb', 'pp']

or

    >>> [i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]
    ['bb', 'bb', 'pp']

Note that re.findall here returns a list of tuples, with the text matched by the first group as the first element and the text matched by the second group as the second element. In our case the text matched by the first group is enough, so I used i[0].
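To make the note about groups concrete, here is a small sketch of what re.findall returns for each pattern before any post-processing. The variable names are only illustrative, and the finditer line is an alternative that avoids depending on group numbering.

```python
import re

text = 'abbbbcppq'

# One capturing group: findall returns only the text captured by that group.
one_group = re.findall(r'([a-z])\1', text)

# Two capturing groups: findall returns one tuple per match, (group 1, group 2).
two_groups = re.findall(r'(([a-z])\2)', text)

# finditer yields match objects; group(0) is always the whole match.
whole_matches = [m.group(0) for m in re.finditer(r'([a-z])\1', text)]

print(one_group)      # ['b', 'b', 'p']
print(two_groups)     # [('bb', 'b'), ('bb', 'b'), ('pp', 'p')]
print(whole_matches)  # ['bb', 'bb', 'pp']
```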

+50
Dec 14 '15 at 7:13

As a Pythonic way, you can use the zip function within a list comprehension:

    >>> s = 'abbbcppq'
    >>>
    >>> [i+j for i,j in zip(s,s[1:]) if i==j]
    ['bb', 'bb', 'pp']

If you are dealing with a large string, you can use iter() to convert the string to an iterator and itertools.tee() to create two independent iterators. Calling next() on the second iterator consumes its first element; then call zip() on the two iterators (in Python 2.x use itertools.izip(), which returns an iterator).

    >>> from itertools import tee
    >>> first = iter(s)
    >>> second, first = tee(first)
    >>> next(second)
    'a'
    >>> [i+j for i,j in zip(first,second) if i==j]
    ['bb', 'bb', 'pp']

Benchmark against the regex recipe:

    # ZIP
    ~ $ python -m timeit --setup "s='abbbcppq'" "[i+j for i,j in zip(s,s[1:]) if i==j]"
    1000000 loops, best of 3: 1.56 usec per loop

    # REGEX
    ~ $ python -m timeit --setup "s='abbbcppq';import re" "[i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]"
    100000 loops, best of 3: 3.21 usec per loop
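The tee() recipe can also be wrapped in a small generator so that it works on any iterable, including an empty one. This is only a sketch; the name adjacent_pairs is made up for illustration, and like the zip version it reports overlapping pairs.

```python
from itertools import tee

def adjacent_pairs(iterable):
    """Yield every pair of equal neighbours (overlapping) from a string or string iterator."""
    first, second = tee(iterable)
    # Advance the second iterator by one; the default avoids StopIteration on empty input.
    next(second, None)
    for a, b in zip(first, second):
        if a == b:
            yield a + b

print(list(adjacent_pairs('abbbcppq')))  # ['bb', 'bb', 'pp']
```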



After the last edit mentioned in the comments, if you want to match only one pair of b in strings like "abbbcppq", you can use finditer(), which returns an iterator of match objects, and extract the result using the group() method:

    >>> import re
    >>>
    >>> s = "abbbcppq"
    >>> [item.group(0) for item in re.finditer(r'([a-z])\1',s,re.I)]
    ['bb', 'pp']

Note that re.I is the IGNORECASE flag, which makes the regex match uppercase letters as well.
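A quick sketch of what the flag changes, using a made-up mixed-case string: with re.I both the character class and the backreference are compared case-insensitively, so a pair such as 'aA' counts as a repeat.

```python
import re

s = 'aAbbCcd'  # illustrative input, not from the question

# Without the flag only an exact lowercase repeat matches.
without_flag = [m.group(0) for m in re.finditer(r'([a-z])\1', s)]

# With re.I the class accepts uppercase and \1 ignores case.
with_flag = [m.group(0) for m in re.finditer(r'([a-z])\1', s, re.I)]

print(without_flag)  # ['bb']
print(with_flag)     # ['aA', 'bb', 'Cc']
```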

+32
Dec 14 '15 at 7:03

Using a backreference, it is very simple:

    import re
    p = re.compile(ur'([a-z])\1{1,}')
    re.findall(p, u"ugknbfddgicrmopn")
    # output: [u'd']
    re.findall(p, "abbbcppq")
    # output: ['b', 'p']

For more information, you can refer to a similar question for Perl: Regular expression to match any character repeated more than 10 times
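Because \1{1,} (equivalent to \1+) consumes the whole run of a repeated letter, each run is reported once. If the run length matters, a sketch along these lines, written for Python 3 and with the helper name runs chosen only for illustration, recovers it from the full match:

```python
import re

# ([a-z]) captures a letter, \1+ consumes every further repeat of it.
RUN = re.compile(r'([a-z])\1+')

def runs(text):
    """Return (letter, run_length) for each run of two or more equal letters."""
    return [(m.group(1), len(m.group(0))) for m in RUN.finditer(text)]

print(runs('abbbcppq'))          # [('b', 3), ('p', 2)]
print(runs('ugknbfddgicrmopn'))  # [('d', 2)]
```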

+9
Dec 14 '15 at 7:08

Perhaps you can use a generator to achieve this.

    def adj(s):
        last_c = None
        for c in s:
            if c == last_c:
                yield c * 2
            last_c = c

    s = 'ugknbfddgicrmopn'
    v = [x for x in adj(s)]
    print(v)
    # output: ['dd']
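Note that adj() reports overlapping pairs, so 'abbb' yields 'bb' twice. If the non-overlapping behaviour from the question is wanted ('abbbcppq' gives ['bb', 'pp'], 'abbbbcppq' gives ['bb', 'bb', 'pp']), one possible variant groups the string into runs with itertools.groupby. The function name pairs is hypothetical; this is a sketch, not the answerer's method.

```python
from itertools import groupby

def pairs(s):
    """Yield one doubled letter for every complete, non-overlapping pair in each run."""
    for letter, run in groupby(s):
        length = sum(1 for _ in run)   # size of the current run of equal letters
        for _ in range(length // 2):   # a run of 3 holds one pair, a run of 4 holds two
            yield letter * 2

print(list(pairs('abbbcppq')))   # ['bb', 'pp']
print(list(pairs('abbbbcppq')))  # ['bb', 'bb', 'pp']
```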
+4
Dec 14 '15 at 7:14

This is pretty easy without regular expressions:

    In [4]: [k for k, v in collections.Counter("abracadabra").items() if v==2]
    Out[4]: ['b', 'r']
+4
Dec 14 '15 at 11:04
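    A1 = "abcdededdssffffccfxx"
    print(A1[1])
    for i in range(len(A1) - 1):
        if A1[i + 1] == A1[i]:
            # Skip a pair whose previous character is the same letter;
            # the i == 0 check stops A1[i - 1] from wrapping to the last character.
            if i == 0 or A1[i + 1] != A1[i - 1]:
                print(A1[i] * 2)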
+2
Dec 14 '15 at 7:17

"or maybe there are some better ways"

Since a regex is often misunderstood by the next developer to come across your code (maybe even you), and since simpler != shorter,

What about the following pseudo code:

    function findMultipleLetters(inputString) {
        foreach (letter in inputString) {
            dictionaryOfLettersOccurrance[letter]++;
            if (dictionaryOfLettersOccurrance[letter] == 2) {
                multipleLetters.add(letter);
            }
        }
        return multipleLetters;
    }

    multipleLetters = findMultipleLetters("ugknbfddgicrmopn");
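A direct Python rendering of that pseudo code might look like the sketch below. The snake_case names are my own choice. Note that it counts occurrences anywhere in the string, not only adjacent ones, so it also reports letters such as g and n here.

```python
from collections import defaultdict

def find_multiple_letters(input_string):
    """Return each letter at the moment its second occurrence is seen, in that order."""
    counts = defaultdict(int)   # letter -> number of occurrences so far
    multiple_letters = []
    for letter in input_string:
        counts[letter] += 1
        if counts[letter] == 2:  # record a letter once, on its second occurrence
            multiple_letters.append(letter)
    return multiple_letters

print(find_multiple_letters("ugknbfddgicrmopn"))  # ['d', 'g', 'n']
```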
+2
Dec 14 '15 at 7:48
    >>> l = ['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']
    >>> import re
    >>> newList = [item for item in l if re.search(r"([a-z]{1})\1", item)]
    >>> newList
    ['dd']
0
Dec 14 '15 at 7:13


