Reading in specific pattern-matching lines from a file

Question

I have a tab-delimited file that contains several tables, each headed by a title line, for example "Azuay\n", "Bolivar\n", "Cotopaxi\n", etc., and the tables are separated from each other by two line breaks. Within R, how can I read in this file and select only the table (i.e., the specified lines) corresponding to, for example, Bolivar, ignoring the table below it called Cotopaxi and the table above it corresponding to Azuay?

NB. I would prefer not to modify the table outside of R.

The data is as follows. The file is tab-delimited.

 Azuay
 region begin       stop
 1A     2017761     148749885
 1A     148863885   150111299
 1A     150329391   150346152
 1A     150432847   247191037


 Bolivar
 region begin           stop 
 2A     2785            242068364
 2A     736640          198339289


 Cotopaxi
 region begin           stop 
 4A     2282            9951846
 4A     11672561        11906166

r data processing

Kaleb Apr 29 '12 at 17:53

1 answer

flodel · Accepted Answer · 2012-04-29T19:27:20+0000

This seems to do the job:

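read.entry.table <- function(file, entry) {

   lines <- readLines(file)

   # Locate the heading line; it must match exactly once.
   table.entry <- lines == entry
   if (sum(table.entry) != 1) stop(paste(entry, "not found exactly once"))

   # Blank lines separate the tables; add a sentinel past the last line
   # so that the final table also has an end.
   empty.lines <- which(lines == "")
   empty.lines <- c(empty.lines, length(lines) + 1L)

   # The table runs from the line after the heading to the line
   # before the next blank line.
   table.start <- which(table.entry) + 1L
   table.end   <- empty.lines[which(empty.lines > table.start)[1]] - 1L

   # Parse only those lines; the first one holds the column names.
   return(read.table(text = lines[seq(from = table.start, to = table.end)],
                     header = TRUE))
}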

read.entry.table("test.txt", "Bolivar")
#   region  begin      stop
# 1     2A   2785 242068364
# 2     2A 736640 198339289

read.entry.table("test.txt", "Cotopaxi")
#   region    begin     stop
# 1     4A     2282  9951846
# 2     4A 11672561 11906166
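
If several tables are needed from the same file, a variation on the same idea is to split the file at its blank lines once and keep every table in a named list. The following is only a rough sketch under a few assumptions: the helper name read_all_tables is made up for illustration, the file is assumed to look exactly like the sample in the question (heading line, header row, data rows, blank lines between tables), and the sample is written to a temporary file so the example runs without test.txt.

```r
# Sample data from the question, tab-delimited, written to a temporary file.
sample_lines <- c(
  "Azuay",
  "region\tbegin\tstop",
  "1A\t2017761\t148749885",
  "1A\t148863885\t150111299",
  "1A\t150329391\t150346152",
  "1A\t150432847\t247191037",
  "",
  "",
  "Bolivar",
  "region\tbegin\tstop",
  "2A\t2785\t242068364",
  "2A\t736640\t198339289",
  "",
  "",
  "Cotopaxi",
  "region\tbegin\tstop",
  "4A\t2282\t9951846",
  "4A\t11672561\t11906166"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Hypothetical helper (not part of the answer above): read every table
# in the file and return them as a list named by table heading.
read_all_tables <- function(file) {
  lines <- readLines(file)
  is_blank <- trimws(lines) == ""

  # Every blank line bumps the counter, so each run of consecutive
  # non-blank lines shares one block id.
  block_id <- cumsum(is_blank)[!is_blank]
  blocks <- split(lines[!is_blank], block_id)

  tables <- lapply(blocks, function(b) {
    # b[1] is the heading; the remaining lines are header row plus data.
    read.table(text = b[-1], header = TRUE, sep = "\t",
               stringsAsFactors = FALSE)
  })
  names(tables) <- vapply(blocks, function(b) trimws(b[1]),
                          character(1), USE.NAMES = FALSE)
  tables
}

tables <- read_all_tables(path)
bolivar <- tables[["Bolivar"]]
```

With this layout a single table is picked by name, as in tables[["Bolivar"]], and the file is only read once. Unlike read.entry.table, the sketch does not check for duplicate or missing headings, so that check would need to be added for real data.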
