Subsetting dates that are in an incorrect format in R

I have personal data in which participants entered their date of birth in a variety of formats:

 ID <- c(101, 102, 103, 104, 105, 106, 107)
 dob <- c("20/04/2001", "29/10/2000", "September 1 2012", "15/11/00",
          "20.01.1999", "April 20th 1999", "04/08/01")
 df <- data.frame(ID, dob)

Before doing any analysis, I need to be able to subset the rows whose date is not in the correct format (i.e. dd/mm/yyyy), and then manually correct each cell.

I tried using:

 df$dob <- strptime(dob, "%d/%m/%Y") 

... to flag which of my dates were in the correct format, but I just get NA for the dates that were entered incorrectly, which does not help if I want to correct them later (using the ID as a reference).

Does anyone have any ideas that can help me?

date r date-formatting format-conversion
2 answers

Check out the lubridate package.

 library(lubridate)
 parse_date_time(dob, c("dmy", "Bdy"))
 # [1] "2001-04-20 UTC" "2000-10-29 UTC" "2012-09-01 UTC" "0000-11-15 UTC" "1999-01-20 UTC"
 # [6] "1999-04-20 UTC" "0001-08-04 UTC"
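parse_date_time() tries each of the given orders in turn. If you would rather see exactly which format was used for which entry, here is a base-R sketch of a similar idea (not lubridate's implementation). The rules table and the parse_by_rules() helper are hypothetical names. Each format is only applied to entries whose whole text matches a strict pattern, so "15/11/00" is read with %y rather than ending up in year 0000 as in the output above. Month-name entries are deliberately left NA for manual review.

```r
# Hypothetical rule table: a strict pattern and the format used to read it
rules <- data.frame(
  pattern = c("^[0-9]{2}/[0-9]{2}/[0-9]{4}$",    # dd/mm/yyyy
              "^[0-9]{2}/[0-9]{2}/[0-9]{2}$",    # dd/mm/yy
              "^[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}$"), # dd.mm.yyyy
  format  = c("%d/%m/%Y", "%d/%m/%y", "%d.%m.%Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a Date vector, NA where no rule matched
parse_by_rules <- function(x, rules) {
  out <- as.Date(rep(NA_character_, length(x)), format = "%Y-%m-%d")
  for (i in seq_len(nrow(rules))) {
    # Only parse entries that match this pattern and are still unparsed
    idx <- is.na(out) & grepl(rules$pattern[i], x)
    out[idx] <- as.Date(x[idx], format = rules$format[i])
  }
  out
}

dob <- c("20/04/2001", "29/10/2000", "September 1 2012", "15/11/00",
         "20.01.1999", "April 20th 1999", "04/08/01")
candidate <- parse_by_rules(dob, rules)
print(candidate)
```

One caveat with %y: on input, R maps 00 to 68 to 2000 to 2068 and 69 to 99 to 1969 to 1999, so a birth year written as 50 would come out as 2050. Those candidates still need a manual check.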

Disclaimer: I'm not sure if I fully understood your question.

In the snippet below, dob2 will contain either a date or NA, depending on whether as.Date() can read dob as dd/mm/yyyy (note from the output that two-digit years such as 15/11/00 are still read, as year 0000). You should be able to filter on is.na(dob2) to get the invalid entries. Please note that 04/03/2013 can be interpreted as either 4 March or April 3, but it looks like you are assuming day first (4 March), so I went with that.

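 library(data.table)
 ID <- c(101, 102, 103, 104, 105, 106, 107)
 dob <- c("20/04/2001", "29/10/2000", "September 1 2012", "15/11/00",
          "20.01.1999", "April 20th 1999", "04/08/01")
 df <- data.table(ID, dob)
 # dob2 holds the parsed Date, or NA where as.Date() cannot read dob as %d/%m/%Y
 df[, dob2 := as.Date(dob, "%d/%m/%Y")]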

EDIT: added output. By the way, you could also select the invalid rows directly with something like df[is.na(as.Date(dob, "%d/%m/%Y"))].

     ID              dob       dob2
 1: 101       20/04/2001 2001-04-20
 2: 102       29/10/2000 2000-10-29
 3: 103 September 1 2012       <NA>
 4: 104         15/11/00 0000-11-15
 5: 105       20.01.1999       <NA>
 6: 106  April 20th 1999       <NA>
 7: 107         04/08/01 0001-08-04
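As the output shows, as.Date() with %Y reads 15/11/00 and 04/08/01 as years 0000 and 0001 instead of returning NA, so filtering on is.na(dob2) alone misses them. A minimal base-R sketch that also requires the text to look like dd/mm/yyyy, and keeps the original column untouched so each ID can still be corrected by hand (the dob_parsed and dob_ok column names are illustrative):

```r
ID <- c(101, 102, 103, 104, 105, 106, 107)
dob <- c("20/04/2001", "29/10/2000", "September 1 2012", "15/11/00",
         "20.01.1999", "April 20th 1999", "04/08/01")
df <- data.frame(ID, dob, stringsAsFactors = FALSE)

# Parse into a NEW column so the raw text in df$dob is never overwritten
df$dob_parsed <- as.Date(df$dob, format = "%d/%m/%Y")

# %Y also accepts "00", so additionally require exactly
# two digits / two digits / four digits
looks_ok <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", df$dob)
df$dob_ok <- looks_ok & !is.na(df$dob_parsed)

# IDs and raw entries that need manual correction
bad <- df[!df$dob_ok, c("ID", "dob")]
print(bad)
```

The pattern requires two-digit days and months; if entries such as 1/9/2012 should count as valid, relax it to ^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$.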
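For the manual-correction step the question describes, one option is to keep the fixes in a small lookup table keyed by ID and apply them in one go, again writing to a new column so the raw entries survive. This is a base-R sketch; the corrections table and its values are hypothetical readings of the raw entries (for example, 04/08/01 is assumed to mean 4 August 2001), not values confirmed by the participants.

```r
ID <- c(101, 102, 103, 104, 105, 106, 107)
dob <- c("20/04/2001", "29/10/2000", "September 1 2012", "15/11/00",
         "20.01.1999", "April 20th 1999", "04/08/01")
df <- data.frame(ID, dob, stringsAsFactors = FALSE)

# Hypothetical manual corrections keyed by participant ID
# (assumed readings of the raw entries, for illustration only)
corrections <- data.frame(
  ID  = c(103, 104, 105, 106, 107),
  dob = c("01/09/2012", "15/11/2000", "20/01/1999", "20/04/1999", "04/08/2001"),
  stringsAsFactors = FALSE
)

# Start from the raw text, then overwrite only the rows that have a correction
df$dob_clean <- df$dob
hit <- match(df$ID, corrections$ID)
has_fix <- !is.na(hit)
df$dob_clean[has_fix] <- corrections$dob[hit[has_fix]]

# Every cleaned value should now parse as dd/mm/yyyy
df$dob_date <- as.Date(df$dob_clean, format = "%d/%m/%Y")
print(df)
```

Because the lookup is matched by ID rather than by row position, it still works if the rows are reordered or filtered first.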
