Reading an entire text file into a MATLAB variable at once

I would like to read a (rather large) log file into a MATLAB cell array of lines in one step. I used the usual approach:

    s = {};
    fid = fopen('test.txt');
    tline = fgetl(fid);
    while ischar(tline)
        s = [s; tline];
        tline = fgetl(fid);
    end

but it is just slow. I found that

    fid = fopen('test.txt');
    x = fread(fid, '*char');

is faster, but it gives me an n-by-1 char matrix x. I could try converting x into a cell array of lines, but then I ran into character-encoding trouble: the line separator seems to be \n\r, or 10 and 13 in ASCII (I looked at the end of the first line), but the two characters often do not follow each other, and sometimes one even appears on its own.
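A minimal sketch of the kind of conversion I have in mind, assuming the mixed \r/\n separators described above (the pattern is only a guess):

    fid = fopen('test.txt');
    x = fread(fid, '*char')';                 % whole file as a 1-by-n char row vector
    fclose(fid);
    s = regexp(x, '\r\n|\n|\r', 'split')';    % one line per cell, tolerating mixed endings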

Is there an easy, quick way to read an ASCII file into a cell array of lines in one step, or to convert x into such a cell array?

Reading via fgetl:

    Code                          Calls     Total Time    % Time
    tline = lower(fgetl(fid));    903113    14.907 s      61.2%

Reading through fread:

    >> tic; for i=1:length(files), fid = fopen(files(i).name); x = fread(fid,'*char*1'); fclose(fid); end; toc
    Elapsed time is 0.208614 seconds.

I tested preallocation and this does not help :(

    files = dir('.');
    tic
    for i = 1:length(files),
        if files(i).isdir || isempty(strfind(files(i).name, '.log')), continue; end
        %# preassign s to some large cell array
        sizS = 50000;
        s = cell(sizS, 1);
        lineCt = 1;
        fid = fopen(files(i).name);
        tline = fgetl(fid);
        while ischar(tline)
            s{lineCt} = tline;
            lineCt = lineCt + 1;
            %# grow s if necessary
            if lineCt > sizS
                s = [s; cell(sizS, 1)];
                sizS = sizS + sizS;
            end
            tline = fgetl(fid);
        end
        %# remove empty entries in s
        s(lineCt:end) = [];
    end
    toc

The elapsed time is 12.741492 seconds.

About 10 times faster than the original:

 s = textscan(fid, '%s', 'Delimiter', '\n', 'whitespace', '', 'bufsize', files(i).bytes); 

I needed to set 'whitespace' to '' to preserve the leading spaces (which I need for parsing), and 'bufsize' to the file size (the default of 4000 threw a buffer-overflow error).

+4
5 answers

The main reason your first example is slow is that s grows at each iteration. That means re-creating the array, copying the old lines, and appending the new one every time, which adds a lot of overhead.

To speed things up, you can preassign s:

    %# preassign s to some large cell array
    s = cell(10000, 1);
    sizS = 10000;
    lineCt = 1;
    fid = fopen('test.txt');
    tline = fgetl(fid);
    while ischar(tline)
        s{lineCt} = tline;
        lineCt = lineCt + 1;
        %# grow s if necessary
        if lineCt > sizS
            s = [s; cell(10000, 1)];
            sizS = sizS + 10000;
        end
        tline = fgetl(fid);
    end
    %# remove empty entries in s
    s(lineCt:end) = [];

Here is a small example of what preallocation can do for you:

    >> tic, for i=1:100000, c{i}=i; end, toc
    Elapsed time is 10.513190 seconds.
    >> d = cell(100000,1);
    >> tic, for i=1:100000, d{i}=i; end, toc
    Elapsed time is 0.046177 seconds.
    >>

EDIT

As an alternative to fgetl, you can use TEXTSCAN:

    fid = fopen('test.txt');
    s = textscan(fid, '%s', 'Delimiter', '\n');
    s = s{1};

This reads the lines of test.txt into the cell array s, one string per cell, all in one go.
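If, as the question's edit notes, the leading spaces need to be preserved, a variant sketch is to pass an empty 'Whitespace' option as well:

    fid = fopen('test.txt');
    s = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');   % keep leading spaces
    s = s{1};
    fclose(fid);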

+6

I use urlread for this, for example:

    filename = 'test.txt';
    urlname = ['file:///' fullfile(pwd, filename)];
    try
        str = urlread(urlname);
    catch err
        disp(err.message)
    end

The variable str then contains the whole file as one long string (ready for regular-expression processing).
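As a usage sketch, str can then be split into lines, for example with the same kind of pattern as in the regexp answer below (the mixed-line-ending pattern is an assumption):

    lines = regexp(str, '\r\n|\n|\r', 'split');   % cell array with one line per cell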

+5

Use fgetl instead of fread. For more information, go here.

+1

    s = regexp(fileread('test.txt'), '(\r\n|\n|\r)', 'split');

The 'split' example in the MATLAB regexp documentation covers exactly this case.
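One usage note, as a sketch: if the file ends with a newline, the split leaves a trailing empty string, which you may want to drop:

    s = regexp(fileread('test.txt'), '(\r\n|\n|\r)', 'split');
    % a final newline produces one empty trailing element; remove it if present
    if ~isempty(s) && isempty(s{end})
        s(end) = [];
    end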

+1

The following is based on what Jonas suggested above, which I really like. As it stands, though, it gives a cell array s, not a single string.

With one more line of code, we can get a single string variable, as shown below:

    % original code, thanks to Jonas
    fid = fopen('test.txt');
    s = textscan(fid, '%s', 'Delimiter', '\n');
    s = s{1};
    % the additional line that turns s into a single string
    s = cell2mat(reshape(s, 1, []));

I found this helpful for preparing text for jsondecode(text). :)
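A minimal end-to-end sketch of that use case, assuming test.txt actually contains valid JSON; strjoin(s, '') would be an equivalent way of building the single string:

    % read test.txt into one string and parse it as JSON
    fid = fopen('test.txt');
    c = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
    fclose(fid);
    s = cell2mat(reshape(c{1}, 1, []));   % one long string
    data = jsondecode(s);                 % assumes the file holds valid JSON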

0
