Regex - return name and surname

I am looking for the most reliable way to extract the first and last name from a person's full name. The best I have come up with so far is the following regular expression:

$name = preg_replace('~\b(\p{L}+)\b.+\b(\p{L}+)\b~i', '$1 $2', $name); 

The expected output should look something like this:

 William -> William // Regex Fails
 William Henry -> William Henry
 William Henry Gates -> William Gates

I also want to support accents, as in João.

EDIT: I understand that some names will not be identified correctly, but that is not a problem for me: this will be used on a local site where the last word is the last name (though perhaps not the whole last name). All I want is a quick way to say "Dear FIRST_NAME LAST_NAME"... So this whole discussion, while completely valid, is of no use to me.

Can someone help me?

+2
6 answers

As written, your pattern requires a last name, which your first example of course does not have.

Use a non-capturing group, (?:...), with the 0-or-1 quantifier ?, around the middle and last names as a whole, so that they become optional:

 '~\b(\p{L}+)\b (?: .+\b(\p{L}+)\b )?~ix' # x for spacing 

This should allow the first name to be captured whether or not a middle or last name is given.

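 // u: treat pattern and subject as UTF-8 so \p{L} matches accented letters; trim() drops the trailing space left when $2 is empty
 $name = trim(preg_replace('~\b(\p{L}+)\b(?:.+\b(\p{L}+)\b)?~iu', '$1 $2', $name));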
+2

This may not be what you want to hear, but I don't think this problem is a good fit for regex, because names are not regular. I don't think they are even context-free or context-sensitive. If anything, they are unrestricted (I would have to sit down and think it through before saying so for sure), and no regular expression engine can parse an unrestricted grammar.

+7

Instead of regex, it might be easier for you to do something like:

 $parts = explode(" ", $name);
 $first = $parts[0];
 $last = ""; // stays empty for single-word names
 if (count($parts) > 1) {
     $last = $parts[count($parts) - 1];
 }

You might want to first collapse runs of consecutive whitespace into a single space, so that you don't get empty parts, and strip leading and trailing whitespace:

 $name = preg_replace('/[ \t\r\n]+/', ' ', trim($name)); // ereg_replace() was removed in PHP 7
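Putting the two steps together, a minimal sketch could look like the following. The function name `firstAndLast` is made up for illustration, and the code assumes the input is valid UTF-8 and that "first word" and "last word" are all you need, as the question states.

```php
<?php
// Hypothetical helper: returns [first, last]; last is "" for single-word names.
function firstAndLast(string $name): array
{
    // Collapse every run of whitespace to one space, then trim the ends.
    // The u modifier keeps multi-byte UTF-8 names such as "João" intact.
    $name = trim(preg_replace('/\s+/u', ' ', $name) ?? '');
    if ($name === '') {
        return ['', ''];
    }
    $parts = explode(' ', $name);
    $first = $parts[0];
    $last  = count($parts) > 1 ? $parts[count($parts) - 1] : '';
    return [$first, $last];
}

// Build the greeting from the question; rtrim() removes the gap
// that would be left behind when there is no last name.
[$first, $last] = firstAndLast("  William   Henry Gates ");
echo rtrim("Dear $first $last"), "\n"; // prints "Dear William Gates"
```

Returning both parts as an array keeps the "no last name" case explicit, so the caller decides how to format the greeting.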
+6

Depending on how clean your data is, I think you will have a hard time finding a single regular expression that does what you want. What formats do you expect the names to come in? I have had to write similar code, and there can be many variations:
- first last
- last first
- first middle last
- last, first middle

You also have things like suffixes (Jr., Sr., III, etc.), prefixes (Mr., Mrs., etc.), and combined names (for example, John and Mary Smith). As mentioned earlier, you also have to deal with multi-part last names (e.g. Victor de la Hoya).

I found that I had to deal with all of these possibilities before I could reliably pull out the first and last name.
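To give a rough idea of what handling a few of these variations might involve, here is a sketch, not the code I actually wrote. The function name `normalizeName` and the prefix and suffix lists are assumptions made up for illustration, and the lists are nowhere near complete. It handles a leading title, a trailing suffix, and the "last, first middle" form, but not combined names or multi-part last names.

```php
<?php
// Hypothetical helper; the prefix and suffix lists are illustrative only.
function normalizeName(string $name): array
{
    $prefixes = ['mr', 'mrs', 'ms', 'dr'];
    $suffixes = ['jr', 'sr', 'ii', 'iii'];

    // Compare words case-insensitively and without a trailing period.
    $key = function (string $word): string {
        return strtolower(rtrim($word, '.'));
    };

    $name = trim(preg_replace('/\s+/u', ' ', $name) ?? '');

    if (strpos($name, ',') !== false) {
        [$before, $after] = array_map('trim', explode(',', $name, 2));
        // "John Smith, Jr." -> drop the suffix;
        // "Gates, William Henry" -> reorder to "William Henry Gates".
        $name = in_array($key($after), $suffixes, true)
            ? $before
            : trim("$after $before");
    }

    $parts = $name === '' ? [] : explode(' ', $name);

    // Drop a leading title and a trailing generational suffix.
    if (count($parts) > 1 && in_array($key($parts[0]), $prefixes, true)) {
        array_shift($parts);
    }
    if (count($parts) > 1 && in_array($key(end($parts)), $suffixes, true)) {
        array_pop($parts);
    }

    $first = $parts[0] ?? '';
    $last  = count($parts) > 1 ? end($parts) : '';
    return [$first, $last];
}

[$first, $last] = normalizeName('Gates, William Henry');
echo "$first $last\n"; // prints "William Gates"
```

Note that a plain "last first" ordering without a comma cannot be told apart from "first last" by code like this; that needs knowledge about the data source.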

+2

If you define the first and last name as the text before the first space and the text after the last space, just split the string on spaces and take the first and last elements of the array.

However, depending on the context / scope of what you are doing, you may need to reevaluate things - not all names around the world will fit this pattern.

+1

I think your best option is to simply treat everything after the first name as the last name, i.e.:

William Henry Gates
Name: William
Surname: Henry Gates

It's the safest mechanism, since not everyone will enter their middle name. You cannot just extract William, ignore Henry, and extract Gates, since for all you know Henry is part of the last name.
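One way to do this, as a minimal sketch, is to pass a limit of 2 to `explode()` so the string is split only at the first space. The function name `splitAfterFirstWord` is made up for illustration, and valid UTF-8 input is assumed.

```php
<?php
// Hypothetical helper: the first word is the name, the remainder is the surname.
function splitAfterFirstWord(string $name): array
{
    // Normalize whitespace first so the split point is a single space.
    $name = trim(preg_replace('/\s+/u', ' ', $name) ?? '');
    // Limit 2: split at the first space only, so the rest stays together.
    $parts = explode(' ', $name, 2);
    return [$parts[0], $parts[1] ?? ''];
}

[$first, $last] = splitAfterFirstWord('William Henry Gates');
echo "Name: $first\nSurname: $last\n";
// prints:
// Name: William
// Surname: Henry Gates
```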

+1
