Python Beginners: Regex & Phone Numbers

Question


I'm working through a beginner's Python book, and there are two fairly simple things I don't understand. I hoped someone here could help.

In one of the book's examples, regular expressions are used to find phone numbers and email addresses in the clipboard text and display them on the console. The code is as follows:

#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.

import pyperclip, re

# Create phone regex.
phoneRegex = re.compile(r'''(
(\d{3}|\(\d{3}\))?              #[1] area code
(\s|-|\.)?                      #[2] separator
(\d{3})                         #[3] first 3 digits
(\s|-|\.)                       #[4] separator
(\d{4})                         #[5] last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))?  #[6] extension
)''', re.VERBOSE)

# Create email regex.
emailRegex = re.compile(r'''(
[a-zA-Z0-9._%+-]+      # username
@                      # @ symbol
[a-zA-Z0-9.-]+         # domain name
(\.[a-zA-Z]{2,4})      # dot-something
)''', re.VERBOSE)

# Find matches in clipboard text.
text = str(pyperclip.paste())
matches = []

for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)

for groups in emailRegex.findall(text):
    matches.append(groups[0])

# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to Clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found')
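To see exactly what the loops receive, here is a minimal sketch that runs the same two patterns on a hard-coded sample string instead of the clipboard, so pyperclip isn't needed. The sample text and the helper name `extract_contacts` are hypothetical, and the email pattern uses `[a-zA-Z0-9.-]+` for the domain part.

```python
import re

# Same phone pattern as the book; the outer ( ) wraps the whole match.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first 3 digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last 4 digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)

# Email pattern: outer group (whole address) plus the dot-something group.
email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)


def extract_contacts(text):
    """Hypothetical helper: return phone numbers and emails found in text."""
    matches = []
    for groups in phone_regex.findall(text):
        # findall yields one tuple per match holding every group in order,
        # so tuple index i corresponds to group number i + 1.
        phone_num = '-'.join([groups[1], groups[3], groups[5]])
        if groups[8] != '':
            phone_num += ' x' + groups[8]
        matches.append(phone_num)
    for groups in email_regex.findall(text):
        matches.append(groups[0])  # groups[0] is the outer group: the full address
    return matches


sample = 'Call 415-555-1234 ext 99 or mail info@example.com'
print(extract_contacts(sample))  # ['415-555-1234 x99', 'info@example.com']
```

Because both patterns contain more than one group, `findall` returns tuples rather than plain strings, which is why the loops index into `groups`.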

Well, firstly, I really don't understand the phoneRegex object. The book mentions that adding parentheses will create groups in the regular expression.

If that's the case, are the index values in my comments wrong, and should there really be two groups at the index I marked [1]? Or, if they are correct, what are groups [7, 8] that the phone-number matching loop below refers to?

Secondly, why does emailRegex use a mix of lists and tuples, while phoneRegex seems to use only tuples?


python regex

rsylatian 24 '16 19:43


(                               #[1] around whole pattern
(\d{3}|\(\d{3}\))?              #[2] area code
(\s|-|\.)?                      #[3] separator
(\d{3})                         #[4] first 3 digits
(\s|-|\.)                       #[5] separator
(\d{4})                         #[6] last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))?  #[7] extension
    <---------->   <------->
      ^^               ^^
      ||               ||
      [8]              [9]
)

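Note that this is regex syntax, not Python syntax:

[] (character class: matches any one of the characters inside)
() (capturing group: captures the text it matches)

These have nothing to do with Python's list and tuple types.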


rock321987 24 '16 19:48
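A small sketch contrasting the two constructs with `re.findall`; the sample strings are invented for illustration. It also shows that adding a group changes what `findall` returns.

```python
import re

text = '555-1234.5678'

# [-.] is a character class: each match is ONE character, '-' or '.'.
print(re.findall(r'[-.]', text))        # ['-', '.']

# No groups in the pattern: findall returns the whole match each time.
print(re.findall(r'\d[-.]\d', text))    # ['5-1', '4.5']

# One group, (-|\.): findall returns only what that group captured.
print(re.findall(r'\d(-|\.)\d', text))  # ['-', '.']

# Despite the square brackets, [abc] is not a list: it matches a single
# 'a', 'b' or 'c', never the whole string 'abc'.
print(re.fullmatch(r'[abc]', 'b') is not None)    # True
print(re.fullmatch(r'[abc]', 'abc') is not None)  # False
```

With two or more groups, `findall` returns a tuple of all captured groups per match, which is the situation in the phone and email loops.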


For the second part, both patterns use [] and (), but as regex syntax rather than Python lists and tuples. In a regex, \d is the same as [0-9] (for ASCII text); it's just easier to write. In the same vein, they could have written \w, but that stands for [a-zA-Z0-9_], which doesn't include . or -, so for the domain part it's simpler to spell out [a-zA-Z0-9.-].


Jinjubei May 24 '16 at 19:56
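A quick sketch for checking these shorthand classes with `re.fullmatch`; the test strings are made up. One Python 3 detail worth knowing: for `str` patterns, `\d` and `\w` are Unicode-aware by default, and passing `re.ASCII` restricts them to `[0-9]` and `[a-zA-Z0-9_]`.

```python
import re

# \w covers letters, digits and the underscore, but not '.' or '-'.
print(re.fullmatch(r'\w+', 'abc_123') is not None)            # True
print(re.fullmatch(r'\w+', 'mail.example-site') is not None)  # False

# Spelling the class out, as the email regex does, lets '.' and '-' through.
print(re.fullmatch(r'[a-zA-Z0-9.-]+', 'mail.example-site') is not None)  # True

# In Python 3, \d matches any Unicode decimal digit in str patterns;
# with re.ASCII it is restricted to exactly [0-9].
arabic_indic = '\u0663\u0664'  # ARABIC-INDIC DIGIT THREE and FOUR
print(re.fullmatch(r'\d+', arabic_indic) is not None)            # True
print(re.fullmatch(r'\d+', arabic_indic, re.ASCII) is not None)  # False
print(re.fullmatch(r'[0-9]+', arabic_indic) is not None)         # False
```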


sweaver2112 · Accepted Answer · 2016-05-24T19:48:34+0000

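You're missing the outer parentheses around the whole pattern, which form group 1. Numbered correctly, it looks like this: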

(                               #[1] around whole pattern
(\d{3}|\(\d{3}\))?              #[2] area code
(\s|-|\.)?                      #[3] separator
(\d{3})                         #[4] first 3 digits
(\s|-|\.)                       #[5] separator
(\d{4})                         #[6] last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))?  #[7,8,9] extension
)
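The numbering above can be checked from Python directly: a compiled pattern's `groups` attribute gives the number of capturing groups, and `match.group(n)` returns group n. A minimal sketch using a made-up phone number:

```python
import re

# The book's phone pattern, without the per-line comments.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?
    (\s|-|\.)?
    (\d{3})
    (\s|-|\.)
    (\d{4})
    (\s*(ext|x|ext.)\s*(\d{2,5}))?
    )''', re.VERBOSE)

print(phone_regex.groups)  # 9 capturing groups

m = phone_regex.search('415-555-1234 ext 99')
# Group n is the one whose opening parenthesis is the n-th from the left.
for n in range(1, phone_regex.groups + 1):
    print(n, repr(m.group(n)))
# e.g. 1 '415-555-1234 ext 99' (outer group), 7 ' ext 99', 8 'ext', 9 '99'

# findall returns the same groups as a tuple indexed from 0,
# so group n sits at index n - 1: groups[8] is group 9, the extension digits.
groups = phone_regex.findall('415-555-1234 ext 99')[0]
print(groups[8])  # 99
```

This off-by-one between group numbers (starting at 1) and tuple indexes (starting at 0) is why the loop uses `groups[1]`, `groups[3]`, `groups[5]` and `groups[8]`.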

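To make the pattern easier to follow, you can use named groups, written (?P<groupname>pattern) in Python, and non-capturing parentheses (?:pattern), which group without capturing. Then only the parts you care about are captured, and you can refer to them by name instead of by index: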

(?P<areacode>(?:\d{3}|\(\d{3}\))?)
(?P<separator>(?:\s|-|\.)?)
(?P<exchange>\d{3})
(?P<separator2>\s|-|\.)
(?P<lastfour>\d{4})
(?P<extension>(?:\s*(?:ext|x|ext.)\s*(?:\d{2,5}))?)
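A minimal sketch of how the named version could be used; compiling it with `re.VERBOSE` (so the line breaks are ignored) and the sample text are assumptions for illustration. Only the six named groups capture, because every other pair of parentheses is `(?:...)`.

```python
import re

phone_regex = re.compile(r'''
    (?P<areacode>(?:\d{3}|\(\d{3}\))?)
    (?P<separator>(?:\s|-|\.)?)
    (?P<exchange>\d{3})
    (?P<separator2>\s|-|\.)
    (?P<lastfour>\d{4})
    (?P<extension>(?:\s*(?:ext|x|ext.)\s*(?:\d{2,5}))?)
    ''', re.VERBOSE)

print(phone_regex.groups)  # 6: the (?:...) parts are not counted

m = phone_regex.search('Call (415) 555-1234 x 42 today')
# Look groups up by name instead of by position.
print(m.group('areacode'), m.group('exchange'), m.group('lastfour'))  # (415) 555 1234
print(repr(m.group('extension')))  # ' x 42'
print(m.groupdict())  # every named group in one dict
```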
