The fastest way to import CSV files into MATLAB

I wrote a script that saves its output to a CSV file for later reference, but the second script, which imports the data, takes an inconveniently long time to read it back.

The data is in the following format:

    Item1,val1,val2,val3
    Item2,val4,val5,val6,val7
    Item3,val8,val9

where the headers are in the leftmost column and the data values occupy the rest of the row. One of the main difficulties is that the arrays of data values can have a different length for each item. I would save the data as a structure, but I need to be able to edit it outside the MATLAB environment, since I sometimes have to delete rows with bad data on a computer that does not have MATLAB installed. So the first part of my question is: should the data be saved in a different format?

Second part of the question: I tried importdata, csvread, and dlmread, but I'm not sure which is best, or whether there is a better solution. Right now I am using my own script with a loop and fgetl, which is terribly slow for large files. Any suggestions?

    function [data,headers]=csvreader(filename); %V1_1
    fid=fopen(filename,'r');
    data={};
    headers={};
    count=1;
    while 1
        textline=fgetl(fid);
        if ~ischar(textline), break, end
        nextchar=textline(1);
        idx=1;
        while nextchar~=','
            headers{count}(idx)=textline(1);
            idx=idx+1;
            textline(1)=[];
            nextchar=textline(1);
        end
        textline(1)=[];
        data{count}=str2num(textline);
        count=count+1;
    end
    fclose(fid);

(I know this is probably horribly written code. I'm an engineer, not a programmer, so please don't yell at me. That said, any suggestions for improvement would be welcome.)

+6
file-io matlab csv data-import
4 answers

It would probably make reading the data easier if you could pad the file with NaN values when your first script creates it:

    Item1,1,2,3,NaN
    Item2,4,5,6,7
    Item3,8,9,NaN,NaN

or you can even just print empty fields:

    Item1,1,2,3,
    Item2,4,5,6,7
    Item3,8,9,,

Of course, to pad the file correctly you need to know in advance the maximum number of values across all items. With either of the above formats, you can use one of the standard file-reading functions, for example TEXTSCAN:

    >> fid = fopen('uneven_data.txt','rt');
    >> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
    >> fclose(fid);
    >> C{1}

    ans =

        'Item1'
        'Item2'
        'Item3'

    >> C{2}

    ans =

         1     2     3   NaN  %# TEXTSCAN sets empty fields to NaN anyway
         4     5     6     7
         8     9   NaN   NaN

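For the writing side, the padding described above could be done roughly as follows. This is only a sketch: the variables `headers` and `data` and their contents are hypothetical stand-ins for whatever the first script produces, and the file name simply reuses `uneven_data.txt` from the TEXTSCAN example.

```matlab
% Hypothetical example data: item names and ragged numeric rows.
headers = {'Item1','Item2','Item3'};
data    = {[1 2 3], [4 5 6 7], [8 9]};

% Find the longest row so every line can be padded to the same width.
maxLen = max(cellfun(@numel, data));

fid = fopen('uneven_data.txt','wt');
for k = 1:numel(data)
    row = nan(1, maxLen);             % start with an all-NaN row
    row(1:numel(data{k})) = data{k};  % copy the real values over the padding
    fprintf(fid, '%s', headers{k});   % leftmost column: the item name
    fprintf(fid, ',%.15g', row);      % format is reused for each element; NaN prints as "NaN"
    fprintf(fid, '\n');
end
fclose(fid);
```

Because every line then has the same number of fields, a fixed format string such as `'%s %f %f %f %f'` can read the file back in one call.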
+10

Instead of parsing each line of text one character at a time, you can use strtok to split the line, like this:

    stringParts = {};
    tline = fgetl(fid);
    if ~ischar(tline), break, end
    i=1;
    while 1
        [stringParts{i},r]=strtok(tline,',');
        tline=r;
        i=i+1;
        if isempty(r), break; end
    end

    % store the header
    headers{count} = stringParts{1};

    % convert the data into numbers
    for j=2:length(stringParts)
        data{count}(j-1) = str2double(stringParts{j});
    end
    count=count+1;

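A possible variation on the same idea, if the file keeps its ragged (unpadded) layout: call strtok only once per line to peel off the header, then let sscanf convert the rest of the line in a single call. This is a sketch that has not been benchmarked; the file name `ragged_data.csv` and its contents are made up for illustration, and it assumes every value after the header is numeric.

```matlab
% Write a small hypothetical ragged file so the sketch is self-contained.
filename = 'ragged_data.csv';
fid = fopen(filename,'wt');
fprintf(fid, 'Item1,1,2,3\nItem2,4,5,6,7\nItem3,8,9\n');
fclose(fid);

% Read the whole file at once and split it into lines.
txt   = fileread(filename);
lines = regexp(txt, '\r?\n', 'split');
lines = lines(~cellfun(@isempty, lines));   % drop the empty trailing line

n       = numel(lines);
headers = cell(1, n);
data    = cell(1, n);
for k = 1:n
    % strtok returns the header and the remainder, which starts with ','.
    [headers{k}, rest] = strtok(lines{k}, ',');
    % The format ',%f' is reused until the remainder is exhausted.
    data{k} = sscanf(rest, ',%f').';
end
```

Each `data{k}` is a numeric row vector of whatever length that line had, so no padding is required.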
+3

I had the same problem reading CSV data in MATLAB and was surprised by how little support there is for it, but then I found the Import Data tool. I'm on R2015b.

In the top row of the Home tab, click Import Data and select the file you want to read. The application window will appear as follows:

Screenshot of the import data tool

In the "Import Selection" section you can choose "Generate Function", which gives you quite a lot of options, including how empty cells are filled and the format of the output data structure. Plus, it is written by MathWorks, so it probably uses the fastest way to read CSV files. It was almost instant for my file.

+1

Q1) If you know the maximum number of columns, you can fill the blank entries with NaN. Also, if all the values are numeric, do you really need the "Item #" column? If so, you could store just the "#", so that all the data is numeric.

Q2) The fastest way to read numeric data from a file without MEX files is csvread. I try to avoid using strings in CSV files, but when they are necessary, I use the csv2cell function:

http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell
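As an illustration of the all-numeric layout suggested in Q1, the sketch below writes a small made-up file in which the first column is the item number and short rows are padded with NaN, then reads it with csvread. The file name `numeric_data.csv` and its contents are assumptions for the example only.

```matlab
% Hypothetical numeric-only file: item number first, then NaN-padded values.
fid = fopen('numeric_data.csv','wt');
fprintf(fid, '1,1,2,3,NaN\n');
fprintf(fid, '2,4,5,6,7\n');
fprintf(fid, '3,8,9,NaN,NaN\n');
fclose(fid);

% csvread returns a single numeric matrix for the whole file.
M = csvread('numeric_data.csv');

itemNumbers = M(:, 1);      % the "#" column
values      = M(:, 2:end);  % one row of values per item, NaN where padded
```

Writing NaN explicitly matters here: csvread fills empty fields with zero, which would be indistinguishable from a real zero value.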

0
