Python Beginners: Regex & Phone Numbers

I'm working through a beginner's Python book, and there are two fairly simple things I don't understand. I hoped someone here could help.

In one of the book's examples, regular expressions are used to find phone numbers and email addresses on the clipboard and display them on the console. The code is as follows:

#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.

import pyperclip, re

# Create phone regex.
phoneRegex = re.compile(r'''(
(\d{3}|\(\d{3}\))?              #[1] area code
(\s|-|\.)?                      #[2] separator
(\d{3})                         #[3] first 3 digits
(\s|-|\.)                       #[4] separator
(\d{4})                         #[5] last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))?  #[6] extension
)''', re.VERBOSE)

# Create email regex.
emailRegex = re.compile(r'''(
[a-zA-Z0-9._%+-]+      # username
@                      # @ symbol
[a-zA-Z0-9.-]+         # domain name
(\.[a-zA-Z]{2,4})      # dot-something
)''', re.VERBOSE)

# Find matches in clipboard text.
text = str(pyperclip.paste())           
matches = []                             

for groups in phoneRegex.findall(text):  
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)

for groups in emailRegex.findall(text):
    matches.append(groups[0])           

# Copy results to the clipboard.
if len(matches) > 0:                    
    pyperclip.copy('\n'.join(matches))
    print('Copied to Clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
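A minimal sketch (not from the book) of why both loops above index into tuples: re.findall returns plain strings when a pattern has no groups or exactly one group, and one tuple per match when it has two or more. The sample text and variable names are made up for illustration; the email pattern is the book's emailRegex with the domain part written as [a-zA-Z0-9.-]+.

```python
import re

text = 'call 555-1234 or 555-9876, or email alice@example.com'

# No groups: findall returns the matched strings themselves.
no_groups = re.findall(r'\d{3}-\d{4}', text)
print(no_groups)    # ['555-1234', '555-9876']

# Exactly one group: findall returns only that group's text for each match.
one_group = re.findall(r'\d{3}-(\d{4})', text)
print(one_group)    # ['1234', '9876']

# Two or more groups: findall returns one tuple per match, one item per group.
two_groups = re.findall(r'((\d{3})-\d{4})', text)
print(two_groups)   # [('555-1234', '555'), ('555-9876', '555')]

# emailRegex has two groups (the outer one and the dot-something one),
# so each match is a tuple and groups[0] is the whole address.
email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
)''', re.VERBOSE)
emails = email_regex.findall(text)
print(emails)       # [('alice@example.com', '.com')]
```

This is why the book wraps each whole pattern in an extra pair of parentheses: it guarantees the full match is available as groups[0].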

First, I really don't understand the phoneRegex object. The book says that adding parentheses creates groups in the regular expression.

If that is the case, are the index values I wrote in the comments wrong, and should there really be two groups at the position I marked [1]? Or, if they are correct, where do groups 7 and 8 come from, given that the phone-number loop below uses groups[8]?

-, emailRegex , phoneRegex ?


, . , - . , rock321987, , , sweaver2112?




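Each opening parenthesis starts a new group, and groups are numbered from left to right, including the one around the whole pattern: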

(                               #[1] around whole pattern
(\d{3}|\(\d{3}\))?              #[2] area code
(\s|-|\.)?                      #[3] separator
(\d{3})                         #[4] first 3 digits
(\s|-|\.)                       #[5] separator
(\d{4})                         #[6] last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))?  #[7,8,9] extension
)
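A runnable check of this numbering, using the book's pattern and a made-up sample number. The key step is that findall gives each match as a tuple indexed from 0, while regex group numbers start at 1, so groups[i] is group i + 1. That also reconciles the comments in the question: because index 0 is the outer group, the question's labels [1] to [6] happen to match the tuple indices, and the extension's two inner groups add indices 7 and 8.

```python
import re

# The book's phoneRegex, relabelled with regex group numbers (count '(' from the left).
phone_regex = re.compile(r'''(              # group 1: whole pattern
    (\d{3}|\(\d{3}\))?                      # group 2: area code
    (\s|-|\.)?                              # group 3: separator
    (\d{3})                                 # group 4: first 3 digits
    (\s|-|\.)                               # group 5: separator
    (\d{4})                                 # group 6: last 4 digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?          # groups 7, 8, 9: extension
    )''', re.VERBOSE)

print(phone_regex.groups)  # total number of capturing groups: 9

# One findall item is exactly what the book's loop calls `groups`.
groups = phone_regex.findall('Call 415-555-1234 ext 99 today')[0]

# Tuple indices start at 0 but group numbers start at 1: groups[i] is group i + 1.
for i, value in enumerate(groups):
    print(f'groups[{i}] = group {i + 1} = {value!r}')

# So the book's loop picks area code, first 3 digits, last 4 digits, extension digits.
phone_num = '-'.join([groups[1], groups[3], groups[5]])
if groups[8] != '':
    phone_num += ' x' + groups[8]
print(phone_num)  # 415-555-1234 x99
```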

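To make the pattern easier to read, you can use named groups, written (?P<groupname>pattern) in Python, and non-capturing parentheses (?:pattern) for parts you don't need to extract. For example (compiled with re.VERBOSE):

(?P<areacode>(?:\d{3}|\(\d{3}\))?)
(?P<separator>(?:\s|-|\.)?)
(?P<exchange>\d{3})
(?P<separator2>\s|-|\.)
(?P<lastfour>\d{4})
(?P<extension>(?:\s*(?:ext|x|ext.)\s*(?:\d{2,5}))?)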
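A runnable sketch of the named-group idea with Python's re module. It is a variation on the fragment above rather than a transcription: the separators use (?:...) and are not captured at all, and extension captures only the digits. The sample text is made up; finditer and Match.group(name) are standard re APIs.

```python
import re

# Python spells named groups (?P<name>...); (?:...) groups without capturing.
phone_regex = re.compile(r'''
    (?P<areacode>\d{3}|\(\d{3}\))?
    (?:\s|-|\.)?                                   # separator, not captured
    (?P<exchange>\d{3})
    (?:\s|-|\.)                                    # separator, not captured
    (?P<lastfour>\d{4})
    (?:\s*(?:ext|x|ext.)\s*(?P<extension>\d{2,5}))?
    ''', re.VERBOSE)

print(phone_regex.groups)  # 4: only the named groups capture

results = []
for match in phone_regex.finditer('Call 415-555-1234 ext 99 or 555.9876'):
    # Unlike the book's loop, skip a missing area code instead of joining ''.
    parts = [match.group('areacode'), match.group('exchange'), match.group('lastfour')]
    phone_num = '-'.join(part for part in parts if part)
    if match.group('extension'):
        phone_num += ' x' + match.group('extension')
    results.append(phone_num)

print(results)  # ['415-555-1234 x99', '555-9876']
```

With names, the loop no longer depends on counting parentheses, so adding or removing a group elsewhere cannot silently shift the indices.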
(                               #[1] around whole pattern
(\d{3}|\(\d{3}\))?              #[2] area code
(\s|-|\.)?                      #[3] separator
(\d{3})                         #[4] first 3 digits
(\s|-|\.)                       #[5] separator
(\d{4})                         #[6] last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))?  #[7] extension
    <---------->   <------->
      ^^               ^^
      ||               ||
      [8]              [9]
)
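To see the nesting this diagram describes, Match.span(n) returns the start and end offsets of group n in the searched string. A small sketch using the book's pattern written on one line and a made-up sample number:

```python
import re

# The book's phone pattern on a single line (no re.VERBOSE needed).
phone_regex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?(\d{3})(\s|-|\.)(\d{4})(\s*(ext|x|ext.)\s*(\d{2,5}))?)')

text = '415-555-1234 ext 99'
match = phone_regex.search(text)

print(text)
for n in (7, 8, 9):
    start, end = match.span(n)
    # Draw carets under the characters that group n captured.
    print(' ' * start + '^' * (end - start) + f'  group {n}')
```

The carets for groups 8 and 9 fall inside the span of group 7, which is how one outer pair of parentheses plus two inner pairs accounts for three of the nine groups.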

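This isn't Python syntax, it's regex syntax.

[] (a character class: matches any one of the characters inside)

() (a group: treats the enclosed pattern as a unit and captures what it matched)

Don't confuse these with Python's list and tuple. Inside a pattern string, () and [] are interpreted by the regex engine, not by Python.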
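A short sketch of the difference inside a pattern, using re.findall and re.fullmatch on made-up strings:

```python
import re

# [...] is a character class: it matches exactly ONE character from the set.
class_matches = re.findall(r'[ext]', 'next')
print(class_matches)   # ['e', 'x', 't']

# (...) is a group: the enclosed pattern is matched as a unit and captured.
group_matches = re.findall(r'(ext)', 'next')
print(group_matches)   # ['ext']

# A quantifier after [...] repeats "any one of these characters";
# after (...) it repeats the whole sub-pattern.
print(bool(re.fullmatch(r'[ab]+', 'aabb')))   # True
print(bool(re.fullmatch(r'(ab)+', 'aabb')))   # False
print(bool(re.fullmatch(r'(ab)+', 'abab')))   # True
```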


. sweaver2112

For the second part: both regexes use square brackets and parentheses. In regex, \d is shorthand for a digit (for ASCII text, the same as [0-9]); it is just easier to write. In the same vein, you could write \w, which matches letters, digits, and the underscore, but it does not match special characters such as . or -, so for the domain part it is simpler to spell out [a-zA-Z0-9.-].
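A small sketch checking those claims with Python's re module. The domain string and the digit '\u0663' (ARABIC-INDIC DIGIT THREE) are just illustrative inputs; the point about non-ASCII digits applies to str patterns in Python 3.

```python
import re

domain = 'mail.example-site.org'

# \w covers letters, digits and '_' (including non-ASCII letters), but not '.' or '-'.
word_parts = re.findall(r'\w+', domain)
print(word_parts)     # ['mail', 'example', 'site', 'org']

# Spelling the class out lets the domain part include '.' and '-'.
class_parts = re.findall(r'[a-zA-Z0-9.-]+', domain)
print(class_parts)    # ['mail.example-site.org']

# \d in a Python 3 str pattern matches any Unicode decimal digit;
# with re.ASCII it means exactly [0-9].
arabic_three = '\u0663'
print(bool(re.fullmatch(r'\d', arabic_three)))            # True
print(bool(re.fullmatch(r'\d', arabic_three, re.ASCII)))  # False
print(bool(re.fullmatch(r'[0-9]', arabic_three)))         # False
```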
