I would like to read a (rather large) log file into a MATLAB cell array of strings in one step. I used the usual:
    s = {};
    fid = fopen('test.txt');
    tline = fgetl(fid);
    while ischar(tline)
        s = [s; tline];
        tline = fgetl(fid);
    end
but it is just too slow. I found that
    fid = fopen('test.txt');
    x = fread(fid, '*char');
is much faster, but I get an n-by-1 char matrix, x. I could try converting x to a cell array of strings, but then I end up in character-encoding hell: the line separator seems to be \n\r, or 10 and 56 in ASCII (I looked at the end of the first line), but these two characters often do not follow each other and sometimes even appear on their own.
Is there an easy, quick way to read an ASCII file into a cell array of strings in one step, or to convert x to a cell array of strings?
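For the second half of the question, the conversion of x could be sketched as below. This is a minimal sketch, not a definitive answer: it assumes the file is plain ASCII and that a line can end in CR LF, a lone LF, or a lone CR. The sample file name and its contents are made up for illustration.

```matlab
% Create a small sample file with mixed line endings (illustration only).
fname = fullfile(tempdir, 'fread_split_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'first\r\n  second\nthird\r');
fclose(fid);

% Read the whole file in one call; the transpose gives a 1-by-n char row.
fid = fopen(fname, 'r');
x = fread(fid, '*char')';
fclose(fid);

% Split on CR LF, lone LF, or lone CR (alternatives are tried in that order).
s = regexp(x, '\r\n|\n|\r', 'split')';

% A trailing line terminator leaves one empty cell at the end; drop it.
if ~isempty(s) && isempty(s{end})
    s(end) = [];
end
```

Listing '\r\n' first in the pattern keeps a CR LF pair from being counted as two separate line breaks. Leading spaces survive because nothing is trimmed.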
Reading via fgetl:
    Code                          Calls    Total Time   % Time
    tline = lower(fgetl(fid));    903113   14.907 s     61.2%
Reading via fread:
    >> tic; for i=1:length(files), fid = fopen(files(i).name); x = fread(fid,'*char*1'); fclose(fid); end; toc
    Elapsed time is 0.208614 seconds.
I tested preallocation, and it does not help :(
    files = dir('.');
    tic
    for i = 1:length(files)
        if files(i).isdir || isempty(strfind(files(i).name, '.log'))
            continue;
        end
        %# preassign s to some large cell array
        sizS = 50000;
        s = cell(sizS, 1);
        lineCt = 1;
        fid = fopen(files(i).name);
        tline = fgetl(fid);
        while ischar(tline)
            s{lineCt} = tline;
            lineCt = lineCt + 1;
            %# grow s if necessary
            if lineCt > sizS
                s = [s; cell(sizS, 1)];
                sizS = sizS + sizS;
            end
            tline = fgetl(fid);
        end
        %# remove empty entries in s
        s(lineCt:end) = [];
    end
    toc
    Elapsed time is 12.741492 seconds.
About 10 times faster than the original:
s = textscan(fid, '%s', 'Delimiter', '\n', 'whitespace', '', 'bufsize', files(i).bytes);
I needed to set 'whitespace' to '' to preserve the leading spaces (which I need for parsing), and 'bufsize' to the file size (the default of 4000 threw a buffer overflow error).
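For context, the textscan call could sit in a complete read like the one sketched below. The sample file name and contents are hypothetical. The 'bufsize' option is left out here on the assumption that the MATLAB release in use does not need it; if a buffer overflow error appears, add it back as in the line above.

```matlab
% Create a small sample log with leading spaces (illustration only).
fname = fullfile(tempdir, 'textscan_lines_demo.log');
fid = fopen(fname, 'w');
fprintf(fid, 'alpha\n  beta\ngamma\n');
fclose(fid);

% Read every line in a single textscan call.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
% An empty 'Whitespace' setting stops textscan from stripping leading spaces.
c = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);

% textscan returns a 1-by-1 cell; its content is the column of lines.
s = c{1};
```

Note that textscan wraps its result in an outer cell, so the lines themselves are in c{1}, not in c.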