CSV file with multiple time series

I imported a csv file with lots of columns and sections of data.

v <- read.csv2("200109.csv", header=TRUE, sep=",", skip=6, na.strings=c(""))

The file layout looks something like this:

 Dataset1
 time, data, .....
 0 0 0 <NA> 0 0
 Dataset2
 time, data, .....
 00:00 0 0 <NA> 0 0

(The headers of the different datasets are exactly the same.)

Now I can plot the first dataset with:

 plot(as.numeric(as.character(v$Calls.served.by.agent[1:30])), type="l") 

I am curious whether there is a better way to:

  • Get all numbers read as numbers without having to convert.

  • Refer to the various datasets in the file in some meaningful way.

Any clues would be appreciated. Thanks.


Status update:

I have not yet found a good solution in R, so I started writing a Lua script that splits each individual time series into a separate file. I am leaving the question open for now because I am curious how well R will handle all of these files. I get 8 files a day.

r time-series
1 answer

Personally, I would write a script in some scripting language to separate the different data sets before the file is read into R, and perhaps do some of the necessary data transformations there too.

If you want to do the splitting in R, look at readLines and scan. read.csv2 is too high-level for this, since it is designed to read a single data frame. You can either write the different data sets to separate files or, if you are ambitious, construct file-like R objects (connections) that read.csv2 accepts and that read from the correct parts of the underlying large file.
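A minimal sketch of the splitting-in-R route, under some loud assumptions: the real file layout is not shown in full, so the sample lines, the "^Dataset" marker pattern, and the column names time and calls are all hypothetical. It reads everything with readLines, finds where each data set starts, and hands each block to read.csv through its text argument instead of writing temporary files.

```r
# Hypothetical stand-in for the real file; the layout is an assumption.
# In practice this would be something like: lines <- readLines("200109.csv")
lines <- c(
  "Dataset1",
  "time,calls",
  "00:00,0",
  "00:30,<NA>",
  "Dataset2",
  "time,calls",
  "00:00,3",
  "00:30,4"
)

# Assumed marker: every data set begins with a line starting with "Dataset".
starts <- grep("^Dataset", lines)
ends   <- c(starts[-1] - 1, length(lines))

# Parse each block separately: skip the name line, keep header and rows.
datasets <- lapply(seq_along(starts), function(i) {
  block <- lines[(starts[i] + 1):ends[i]]
  read.csv(text = block, header = TRUE, na.strings = c("", "<NA>"))
})

# Name the list elements after the marker lines so they can be referred to by name.
names(datasets) <- lines[starts]

str(datasets$Dataset1)
```

With a named list like this, a single series could be plotted with something like plot(datasets$Dataset1$calls, type="l"), without any as.numeric(as.character(...)) conversion.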

Once you have split the data sets into separate files, use read.csv2 on those (or whichever variant of read.table fits best; if the fields are fixed-width rather than delimiter-separated, see read.fwf). If <NA> indicates "not available" in your file, be sure to include it in na.strings. If you do not, R assumes that the field contains non-numeric data, but with the correct na.strings the field is converted to numbers automatically.

It looks as though one of your fields may contain timestamps such as 00:00, so you need to use colClasses and specify a class to which your timestamp format can be converted. If the built-in Date class does not work, define your own timestamp class and an as.timestamp function that performs the conversion.
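As a rough illustration of the na.strings and colClasses advice, the sketch below parses a small made-up table. The class name hhmm, the column names, and the choice to store times as minutes since midnight are assumptions for the example, not something taken from the question. It relies on the documented behaviour that read.table converts columns of a non-built-in class with methods::as, so the conversion is registered with setAs rather than with a standalone as.timestamp function.

```r
library(methods)

# Hypothetical class used only as a conversion target for colClasses.
setClass("hhmm")

# Conversion from the raw character field to minutes since midnight.
# NA input (from na.strings) stays NA.
setAs("character", "hhmm", function(from) {
  parts <- strsplit(from, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
})

# Made-up data for one already-split data set.
csv_text <- c(
  "time,calls",
  "00:00,0",
  "00:30,<NA>",
  "01:00,2"
)

d <- read.csv(text = csv_text,
              na.strings = c("", "<NA>"),      # treat "<NA>" as missing
              colClasses = c("hhmm", "numeric"))

str(d)
```

Because "<NA>" is listed in na.strings, the calls column comes back numeric with a missing value rather than as text that needs converting afterwards.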
