I have a data frame like the one shown below. This sample follows a consistent pattern, but the full data set is less homogeneous:
locationid address
1073744023 525 East 68th Street, New York, NY 10065, USA
1073744022 270 Park Avenue, New York, NY 10017, USA
1073744025 Rockefeller Center, 50 Rockefeller Plaza, New York, NY 10020, USA
1073744024 1251 Avenue of the Americas, New York, NY 10020, USA
1073744021 1301 Avenue of the Americas, New York, NY 10019, USA
1073744026 44 West 45th Street, New York, NY 10036, USA
I need to extract the city and country name from each address. I tried the following:
1) strsplit
This gives me a list, but I don't know how to access the last or third-from-last item of each element.
2) Regular expressions
With stringr, I can find the country easily:
str_sub(str_extract(address, "\\d{5},\\s.*"),8,11)
But for the city, I got stuck here:
str_sub(str_extract(address, ",\\s.+,\\s.+\\d{5}"),3,comma_pos)
I cannot determine comma_pos, since that leads me back to the same problem. I believe there is a more effective way to solve this using either of the above methods.
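One direction that might work with the strsplit approach is to count fields from the end of each split vector rather than from the start. The following is only a rough sketch in base R: the data frame name `df` and the helper `nth_from_end` are names made up for illustration, and it assumes the country is always the last comma-separated field and the city the third from last, which may not hold for the less regular rows.

```r
# Two sample rows copied from the data above; `df` is an assumed name.
df <- data.frame(
  locationid = c(1073744023, 1073744025),
  address = c(
    "525 East 68th Street, New York, NY 10065, USA",
    "Rockefeller Center, 50 Rockefeller Plaza, New York, NY 10020, USA"
  ),
  stringsAsFactors = FALSE
)

# Split each address on a comma followed by optional whitespace.
# The result is a list with one character vector per row.
parts <- strsplit(df$address, ",\\s*")

# Hypothetical helper: return the n-th field counting from the end,
# or NA when the address has fewer than n fields.
nth_from_end <- function(x, n) {
  if (length(x) < n) return(NA_character_)
  x[length(x) - n + 1]
}

# Assumption: country is the last field, city is the third from last.
df$country <- vapply(parts, nth_from_end, character(1), n = 1)
df$city    <- vapply(parts, nth_from_end, character(1), n = 3)
```

Counting from the end means the extra leading field in the Rockefeller Center row does not shift the result, but rows that lack a state/ZIP field or a country would still need separate handling.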