How can you read in a text file in which each entry is a paragraph, and each new line denotes a separate field? The complication is that some entries have 4 lines and some have 6. @DWin answered my question for the case where the number of fields differs by 1, but it all fell apart when the difference was two. You can find his answer here.
So here is my latest simulated starting text:
TheInstitute 5467
telephone line 4125526987 x 4567
datetime 2011110516 12:56
blay blay blah who knows what, but anyway it may have a comma

TheInstitute 5467
telephone line 4125526987 x 4567
datetime 2011110516 12:58
blay blay blah who knows what

TheInstitute 5467
telephone line 412552999 x 4999
bump phone line 4125527777
bump pony pony oops 4125527777
datetime 2011110516 12:59
blay blay blah who knows what

TheInstitute 5467
telephone line 4125526987 x 4567
bump phone line 4125527777
bump pony pony oops 4125527777
datetime 2011110516 13:51
blay blay blah who knows what, but anyway it may have a comma

TheInstitute 5467
telephone line 4125526987 x 4567
datetime 2011110516 14:56
blay blay blah who knows what
Here is what the output currently looks like. It is one step away from what I need. I have put the dput text representation of the R data.frame below. You will see that everything is in the data frame, but in the 6-line records the field values are shifted by two columns, because those records have two additional fields.
structure(list(
  institution = structure(c(1L, 1L, 1L, 1L, 1L),
    .Label = "TheInstitute 5467", class = "factor"),
  telephoneline = structure(c(1L, 1L, 2L, 1L, 1L),
    .Label = c("telephone line 4125526987 x 4567",
               "telephone line 412552999 x 4999"), class = "factor"),
  date.or.bump = structure(c(2L, 3L, 1L, 1L, 4L),
    .Label = c("bump phone line 4125527777", "datetime 2011110516 12:56",
               "datetime 2011110516 12:58", "datetime 2011110516 14:56"),
    class = "factor"),
  field4 = structure(c(2L, 1L, 3L, 3L, 1L),
    .Label = c("blay blay blah who knows what",
               "blay blay blah who knows what, but anyway it may have a comma",
               "bump pony pony oops 4125527777"), class = "factor"),
  field5 = structure(c(1L, 1L, 2L, 3L, 1L),
    .Label = c("", "datetime 2011110516 12:59", "datetime 2011110516 13:51"),
    class = "factor"),
  field6 = structure(c(1L, 1L, 2L, 3L, 1L),
    .Label = c("", "blay blay blah who knows what",
               "blay blay blah who knows what, but anyway it may have a comma"),
    class = "factor")),
  .Names = c("institution", "telephoneline", "date.or.bump",
             "field4", "field5", "field6"),
  class = "data.frame", row.names = c(NA, -5L))
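To make the shift concrete, here is a minimal sketch of how a frame like the one above can arise, and of one possible way to line the columns up. It assumes the records are separated by blank lines, uses a shortened two-record copy of the sample held in a character vector (as `readLines()` would return it), and assumes the two extra fields always sit in positions 3 and 4, as they do in the sample. The `align` helper and the column names in `aligned` are hypothetical, chosen only for illustration.

```r
# Shortened copy of the sample: one 4-line record and one 6-line record,
# separated by a blank line (assumed record separator).
txt <- c(
  "TheInstitute 5467",
  "telephone line 4125526987 x 4567",
  "datetime 2011110516 12:56",
  "blay blay blah who knows what, but anyway it may have a comma",
  "",
  "TheInstitute 5467",
  "telephone line 412552999 x 4999",
  "bump phone line 4125527777",
  "bump pony pony oops 4125527777",
  "datetime 2011110516 12:59",
  "blay blay blah who knows what"
)

# Give each paragraph an id: the counter increases at every blank line.
blank   <- txt == ""
rec_id  <- cumsum(blank)[!blank]
records <- split(txt[!blank], rec_id)

# Padding short records on the right reproduces the shifted layout.
n_fields <- max(lengths(records))
padded   <- lapply(records, function(x) c(x, rep("", n_fields - length(x))))
df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df) <- c("institution", "telephoneline", "date.or.bump",
               "field4", "field5", "field6")

# Hypothetical helper: pad 4-line records in the middle instead, assuming
# the two extra fields always occupy positions 3 and 4.
align <- function(x) if (length(x) == 4) c(x[1:2], "", "", x[3:4]) else x
aligned <- as.data.frame(do.call(rbind, lapply(records, align)),
                         stringsAsFactors = FALSE)
names(aligned) <- c("institution", "telephoneline", "bump1", "bump2",
                    "datetime", "notes")
```

In `df` the `date.or.bump` column mixes datetime and bump values, which is the shift described above; in `aligned` every datetime lands in the same column. Records with any other number of lines would pass through `align` unchanged and break the `rbind`, so this only covers the 4-or-6 case.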
PS: I believe one posts a data frame here using dput; or can one save the .Rdata file here?