I am confused about how column types get assigned when the same file is read with different functions. Some columns that are integer or numeric in the file come back as character.
If I read the file with read.csv(), it guesses each column's type and converts it automatically. I could not get the same behaviour from fread() (data.table) or read.table(). Sample data (first six rows, via dput()):
structure(list(V1 = c("1", "2", "3", "4", "5", "6"), ID = c("109",
"110", "111", "112", "113", "114"), SignalIntensity = c(7.58043495940162,
11.2698560261255, 8.60063586764357, 9.54355755391806, 10.1812351379984,
8.11689493952339), SNR = c(1.34218273720186, 9.75097840763912,
1.80485348504829, 3.20137685049428, 4.64599368338536, 1.42263609838542
)), .Names = c("V1", "ID", "SignalIntensity", "SNR"), row.names = c(NA,
6L), class = "data.frame")
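A minimal base-R sketch, assuming it is acceptable to repair the types after reading: read.table() and read.csv() read every column as character and then convert each one with utils::type.convert(), so the same function can be applied to whatever came back as character. Here it is run on the sample rows above; the names df and chr_cols are only illustrative.

```r
# Sample rows from the question (dput output)
df <- structure(list(V1 = c("1", "2", "3", "4", "5", "6"),
                     ID = c("109", "110", "111", "112", "113", "114"),
                     SignalIntensity = c(7.58043495940162, 11.2698560261255, 8.60063586764357,
                                         9.54355755391806, 10.1812351379984, 8.11689493952339),
                     SNR = c(1.34218273720186, 9.75097840763912, 1.80485348504829,
                             3.20137685049428, 4.64599368338536, 1.42263609838542)),
                row.names = c(NA, 6L), class = "data.frame")

# Find the columns that were left as character
chr_cols <- vapply(df, is.character, logical(1))

# Apply the same type guessing read.csv() relies on; as.is = TRUE keeps
# genuinely textual columns as character instead of turning them into factors
df[chr_cols] <- lapply(df[chr_cols], type.convert, as.is = TRUE)

str(df)
```

Only character columns are passed to type.convert(), so the numeric columns are left untouched.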
When I read the file with read.csv():
str(df)
'data.frame': 20469 obs. of 4 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ ID : int 109 110 111 112 113 114 116 117 118 119 ...
$ SignalIntensity: num 6.18 10.17 7.29 8.9 9.59 ...
$ SNR : num 0.845 4.384 1.073 2.319 3.713 ...
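For reference, the raw rows visible in the read.table() output below show an empty first header field and quoted row numbers and IDs, which looks like the layout write.csv() produces. This sketch rebuilds a few of those rows in a temporary file (the file is an assumption reconstructed from that output, not the original data) and shows read.csv() stripping the quotes, naming the unnamed first column X, and converting both quoted columns to integer.

```r
# Rows copied from the raw read.table() output in the question
lines <- c('"","ID","SignalIntensity","SNR"',
           '"1","109",6.18230893141024,0.845357691456258',
           '"2","110",10.1727771385494,4.38370775906105',
           '"3","111",7.29227469267823,1.07257511609212')
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# read.csv() removes the quotes, then runs type.convert() on every column;
# the empty header field is turned into the syntactic name "X"
df_csv <- read.csv(path)
str(df_csv)
```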
The same file read with fread():
'data.frame': 20469 obs. of 4 variables:
$ V1 : chr "1" "2" "3" "4" ...
$ ID : chr "109" "110" "111" "112" ...
$ SignalIntensity: num 6.18 10.17 7.29 8.9 9.59 ...
$ SNR : num 0.845 4.384 1.073 2.319 3.713 ...
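One workaround with data.table, sketched under the assumption that a post-read conversion is acceptable: find the character columns and convert them in place with := and the same type.convert() call. The input is a few rows copied from the read.table() output below; passing fread() a string that contains a newline makes it parse the string as data (the verbose log shows the opposite case, "Input contains no \n. Taking this to be a filename to open"). Whether the quoted columns come back as character may depend on the data.table version, so the if() guard simply skips the conversion when no column is character. The names dt and chr_cols are illustrative.

```r
library(data.table)

# Rows copied from the raw read.table() output in the question
lines <- c('"","ID","SignalIntensity","SNR"',
           '"1","109",6.18230893141024,0.845357691456258',
           '"2","110",10.1727771385494,4.38370775906105',
           '"3","111",7.29227469267823,1.07257511609212')

# A single string containing "\n" is read as the data itself, not as a file name
dt <- fread(paste(lines, collapse = "\n"))

# Convert whatever fread() left as character, by reference
chr_cols <- names(dt)[vapply(dt, is.character, logical(1))]
if (length(chr_cols) > 0) {
  dt[, (chr_cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = chr_cols]
}

str(dt)
```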
And read with read.table():
'data.frame': 20470 obs. of 2 variables:
$ V1: int NA 1 2 3 4 5 6 7 8 9 ...
$ V2: chr ",\"ID\",\"SignalIntensity\",\"SNR\"" ",\"109\",6.18230893141024,0.845357691456258" ",\"110\",10.1727771385494,4.38370775906105" ",\"111\",7.29227469267823,1.07257511609212" ...
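The read.table() result above has only two columns because read.table() defaults to header = FALSE and sep = "" (any whitespace), so the commas are never treated as separators. A minimal sketch, again on a temporary file reconstructed from those raw rows: supplying header = TRUE and sep = "," gives read.csv()-style typing, since read.csv() is a wrapper around read.table() with those settings as defaults.

```r
# Rebuild a few rows of the question's file (reconstructed from the output above)
lines <- c('"","ID","SignalIntensity","SNR"',
           '"1","109",6.18230893141024,0.845357691456258',
           '"2","110",10.1727771385494,4.38370775906105',
           '"3","111",7.29227469267823,1.07257511609212')
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Tell read.table() that the first line is a header and that fields are comma-separated;
# each column is then converted with type.convert(), exactly as read.csv() does
df_tab <- read.table(path, header = TRUE, sep = ",")
str(df_tab)
```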
How can I avoid the extra work of fixing the column types by hand when the reader does not detect them? Is there any automatic type conversion other than read.csv()?
Edit: output of fread(...., verbose = TRUE):
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000949 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 4 columns
First row with 4 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 20470
Subtracted 1 for last eol and any trailing empty lines, leaving 20469 data rows
Type codes ( first 5 rows): 4433
Type codes (+ middle 5 rows): 4433
Type codes (+ last 5 rows): 4433
Type codes: 4433 (after applying colClasses and integer64)
Type codes: 4433 (after applying drop or select (if supplied)
Allocating 4 column slots (4 - 0 dropped)
0.001s ( 2%) Memory map (rerun may be quicker)
0.000s ( 1%) sep and header detection
0.004s ( 12%) Count rows (wc -l)
0.001s ( 2%) Column type detection (first, middle and last 5 rows)
0.000s ( 0%) Allocation of 20469x4 result (xMB) in RAM
0.025s ( 82%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.000s ( 0%) Changing na.strings to NA
0.030s Total