"last name, first name" & # 8594; "last name" in serialized lines

I have a vector of strings, each containing one or more names in the format last name, first name, separated by commas. For example:

names <- c('Beaufoy, Simon, Boyle, Danny','Nolan, Christopher','Blumberg, Stuart, Cholodenko, Lisa','Seidler, David','Sorkin, Aaron') 

What is the easiest way to convert all of these names to strings in first name last name format?

3 answers

If you can be sure that no person's name contains a comma, this should work:

    mynames <- c('Beaufoy, Simon, Boyle, Danny', 'Nolan, Christopher',
                 'Blumberg, Stuart, Cholodenko, Lisa', 'Seidler, David',
                 'Sorkin, Aaron', 'Hoover, J. Edgar')
    # split every string into its comma-separated pieces
    mynames2 <- strsplit(mynames, ", ")
    # even positions hold first names, odd positions hold last names
    unlist(lapply(mynames2, function(x)
      paste(x[1:length(x) %% 2 == 0], x[1:length(x) %% 2 != 0])))
    # [1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
    # [4] "Stuart Blumberg"   "Lisa Cholodenko"   "David Seidler"
    # [7] "Aaron Sorkin"      "J. Edgar Hoover"

I added J. Edgar Hoover there for good measure.

If you want names that appear together in one input string to stay together, add collapse = ", " to the paste() call:

    unlist(lapply(mynames2, function(x)
      paste(x[1:length(x) %% 2 == 0], x[1:length(x) %% 2 != 0], collapse = ", ")))
    # [1] "Simon Beaufoy, Danny Boyle"       "Christopher Nolan"
    # [3] "Stuart Blumberg, Lisa Cholodenko" "David Seidler"
    # [5] "Aaron Sorkin"                     "J. Edgar Hoover"
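As a minimal sketch, the same split-and-pair idea could be wrapped in a small function that also complains when a string has an odd number of pieces, which usually means a name is missing its partner. The name flip_names and its sep argument are hypothetical and not part of the answer above.

```r
# Hypothetical helper (not from the original answer): flips "Last, First"
# pairs and keeps the names from each input string together.
flip_names <- function(x, sep = ", ") {
  pieces <- strsplit(x, sep, fixed = TRUE)
  vapply(pieces, function(p) {
    # An odd number of pieces means some name lacks its partner.
    if (length(p) %% 2 != 0) {
      stop("unpaired name in: ", paste(p, collapse = sep))
    }
    last  <- p[c(TRUE, FALSE)]  # odd positions: last names
    first <- p[c(FALSE, TRUE)]  # even positions: first names
    paste(first, last, collapse = sep)
  }, character(1))
}

flipped <- flip_names(c("Beaufoy, Simon, Boyle, Danny", "Hoover, J. Edgar"))
print(flipped)
```

The logical masks c(TRUE, FALSE) and c(FALSE, TRUE) are recycled along the vector, which selects the same positions as the modulo test used above.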

(1) Keep the same names in each element. This can be done with a single gsub (assuming no name itself contains a comma):

    > gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", names)
    [1] "Simon Beaufoy, Danny Boyle"       "Christopher Nolan"
    [3] "Stuart Blumberg, Lisa Cholodenko" "David Seidler"
    [5] "Aaron Sorkin"
    > gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", "Hoover, J. Edgar")
    [1] "J. Edgar Hoover"

(2) Separate into one name per element. If you want each first name last name pair in its own element, use (a) scan:

    scan(text = out, sep = ",", what = "", strip.white = TRUE)

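where out is the result of the gsub above (strip.white = TRUE drops the space left after each comma), or, to get the result directly, use (b) strapply from the gsubfn package: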

    > library(gsubfn)
    > strapply(names, "([^, ][^,]*), ([^,]+)", x + y ~ paste(y, x), simplify = c)
    [1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
    [4] "Stuart Blumberg"   "Lisa Cholodenko"   "David Seidler"
    [7] "Aaron Sorkin"
    > strapply("Hoover, Edgar J.", "([^, ][^,]*), ([^,]+)", x + y ~ paste(y, x),
    +   simplify = c)
    [1] "Edgar J. Hoover"

Note that all the examples above used the same regular expression for matching.
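To see what that shared regular expression actually captures, here is a small base-R sketch. The variable names pattern and x are only for illustration. The first group, ([^, ][^,]*), is a last name that may not start with a comma or a space; the second group, ([^,]+), is the first name, running up to the next comma.

```r
pattern <- "([^, ][^,]*), ([^,]+)"
x <- "Beaufoy, Simon, Boyle, Danny"

# Each match is one complete "Last, First" pair.
pairs <- regmatches(x, gregexpr(pattern, x))[[1]]
print(pairs)
# [1] "Beaufoy, Simon" "Boyle, Danny"

# Swapping the two captured groups flips each pair in place.
print(gsub(pattern, "\\2 \\1", x))
# [1] "Simon Beaufoy, Danny Boyle"
```

Because the first group cannot start with a comma or a space, the ", " between two pairs is never swallowed into a match, which is why the separator between people survives the substitution.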

UPDATE: Removed the comma separating the first and last name.

UPDATE: Added code to put each first name last name pair into a separate element, in case that is the preferred output format.


I prefer @AnandaMahto's answer, but for fun, here is another method using scan, split and rapply.

    names <- c(names, 'Chambers, John, Ihaka, Ross, Gentleman, Robert')
    # extract names
    snames <- lapply(names, function(x)
      scan(text=x, what='', sep=',', strip.white=TRUE, quiet=TRUE))
    # break up names
    snames <- lapply(snames, function(x) split(x, rep(seq(length(x) %/% 2), each=2)))
    # collapse together, reversed
    rapply(snames, function(x) paste(x[2:1], collapse=' '))
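To make the intermediate steps easier to follow, the sketch below runs the same scan and split calls on a single string. The variable names x, parts, pairs and flipped are only for illustration, and vapply with unname is swapped in for rapply so that the result is a plain unnamed character vector.

```r
x <- "Chambers, John, Ihaka, Ross, Gentleman, Robert"

# scan() splits on commas and strips the surrounding whitespace.
parts <- scan(text = x, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)

# The grouping vector 1, 1, 2, 2, 3, 3 puts each last/first pair in one group.
pairs <- split(parts, rep(seq(length(parts) %/% 2), each = 2))

# Reverse each pair (first name, then last name) and glue it together.
flipped <- unname(vapply(pairs, function(p) paste(p[2:1], collapse = " "),
                         character(1)))
print(flipped)
# [1] "John Chambers"    "Ross Ihaka"       "Robert Gentleman"
```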