Names in the form: Ceasar, Julius must be separated by First_name Julius Surname Ceasar.
Names may contain diacritics (à é ..) and ligatures (æ, ø)
This code works fine in Python 3.3
import re def doesmatch(pat, str): try: yup = re.search(pat, str) print('Firstname {0} lastname {1}'.format(yup.group(2), yup.group(1))) except AttributeError: print('no match for {0}'.format(str)) s = 'Révèrberë, Harry' t = 'Åapö, Renée' u = 'C3po, Robby' v = 'Mærsk, Efraïm' w = 'MacDønald, Ron' x = 'Sträßle, Mpopo' pat = r'^([^\d\s]+), ([^\d\s]+)'
Everything except matching u. (there are no matches for the numbers in the names), but I wonder if there is a better way than the non-discrete non-space approach. More important though: I would like to clarify the pattern so that it distinguishes capitals from lowercase letters, but including metropolitan diacritics and ligatures, preferably also using a regular expression. As if ([AZ] [az] +) would match accented and combined characters.
Is it possible?
(what I have looked so far: Dive into python 3 on UTF-8 and Unicode ; This Unicode regex tutorial (which I don't use); I think I don't need a new regex , but I admit that did not read all his documentation)
Rolfbly
source share