Reading an entire text file into a MATLAB variable at once

I would like to read a (rather large) log file into a MATLAB cell array of lines in one step. I used the usual approach:

    s = {};
    fid = fopen('test.txt');
    tline = fgetl(fid);
    while ischar(tline)
        s = [s; tline];
        tline = fgetl(fid);
    end

but it is just slow. I found that

    fid = fopen('test.txt');
    x = fread(fid, '*char');

is faster, but it gives me an n-by-1 char matrix x. I could try converting x into a cell array of lines, but then I ran into character-encoding trouble: the line separator seems to be \n\r, or 10 and 13 in ASCII (I looked at the end of the first line), but the two characters often do not follow each other, and sometimes one even appears on its own.
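A minimal sketch of the kind of conversion I have in mind, assuming the mixed \r/\n separators described above (the pattern is only a guess):

    fid = fopen('test.txt');
    x = fread(fid, '*char')';                 % whole file as a 1-by-n char row vector
    fclose(fid);
    s = regexp(x, '\r\n|\n|\r', 'split')';    % one line per cell, tolerating mixed endings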

Is there an easy, quick way to read an ASCII file into a cell array of lines in one step, or to convert x into such a cell array?

Reading via fgetl:

    Code                          Calls     Total Time    % Time
    tline = lower(fgetl(fid));    903113    14.907 s      61.2%

Reading through fread:

    >> tic; for i=1:length(files), fid = fopen(files(i).name); x = fread(fid,'*char*1'); fclose(fid); end; toc
    Elapsed time is 0.208614 seconds.

I tested preallocation and this does not help :(

    files = dir('.');
    tic
    for i = 1:length(files),
        if files(i).isdir || isempty(strfind(files(i).name, '.log')), continue; end
        %# preassign s to some large cell array
        sizS = 50000;
        s = cell(sizS, 1);
        lineCt = 1;
        fid = fopen(files(i).name);
        tline = fgetl(fid);
        while ischar(tline)
            s{lineCt} = tline;
            lineCt = lineCt + 1;
            %# grow s if necessary
            if lineCt > sizS
                s = [s; cell(sizS, 1)];
                sizS = sizS + sizS;
            end
            tline = fgetl(fid);
        end
        %# remove empty entries in s
        s(lineCt:end) = [];
    end
    toc

The elapsed time is 12.741492 seconds.

About 10 times faster than the original:

 s = textscan(fid, '%s', 'Delimiter', '\n', 'whitespace', '', 'bufsize', files(i).bytes); 

I needed to set 'whitespace' to '' to preserve the leading spaces (which I need for parsing), and 'bufsize' to the file size (the default of 4000 threw a buffer-overflow error).

+4
5 answers

The main reason your first example is slow is that s grows at each iteration. That means re-creating the array, copying the old lines, and appending the new one every time, which adds a lot of overhead.

To speed things up, you can preassign s:

    %# preassign s to some large cell array
    s = cell(10000, 1);
    sizS = 10000;
    lineCt = 1;
    fid = fopen('test.txt');
    tline = fgetl(fid);
    while ischar(tline)
        s{lineCt} = tline;
        lineCt = lineCt + 1;
        %# grow s if necessary
        if lineCt > sizS
            s = [s; cell(10000, 1)];
            sizS = sizS + 10000;
        end
        tline = fgetl(fid);
    end
    %# remove empty entries in s
    s(lineCt:end) = [];

Here is a small example of what preallocation can do for you:

    >> tic, for i=1:100000, c{i}=i; end, toc
    Elapsed time is 10.513190 seconds.
    >> d = cell(100000,1);
    >> tic, for i=1:100000, d{i}=i; end, toc
    Elapsed time is 0.046177 seconds.
    >>

EDIT

As an alternative to fgetl, you can use TEXTSCAN:

    fid = fopen('test.txt');
    s = textscan(fid, '%s', 'Delimiter', '\n');
    s = s{1};

This reads the lines of test.txt into the cell array s, one string per cell, all in one go.
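If, as the question's edit notes, the leading spaces need to be preserved, a variant sketch is to pass an empty 'Whitespace' option as well:

    fid = fopen('test.txt');
    s = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');   % keep leading spaces
    s = s{1};
    fclose(fid);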

+6

I use urlread for this, for example:

    filename = 'test.txt';
    urlname = ['file:///' fullfile(pwd, filename)];
    try
        str = urlread(urlname);
    catch err
        disp(err.message)
    end

The variable str then contains the whole file as one long string (ready for regular-expression processing).
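As a usage sketch, str can then be split into lines, for example with the same kind of pattern as in the regexp answer below (the mixed-line-ending pattern is an assumption):

    lines = regexp(str, '\r\n|\n|\r', 'split');   % cell array with one line per cell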

+5

Use fgetl instead of fread. For more information, go here.

+1

    s = regexp(fileread('test.txt'), '(\r\n|\n|\r)', 'split');

The 'split' example in the MATLAB regexp documentation covers exactly this case.
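One usage note, as a sketch: if the file ends with a newline, the split leaves a trailing empty string, which you may want to drop:

    s = regexp(fileread('test.txt'), '(\r\n|\n|\r)', 'split');
    % a final newline produces one empty trailing element; remove it if present
    if ~isempty(s) && isempty(s{end})
        s(end) = [];
    end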

+1

The following is based on what Jonas suggested above, which I really like. As it stands, though, it gives a cell array s, not a single string.

With one more line of code, we can get a single string variable, as shown below:

    % original code, thanks to Jonas
    fid = fopen('test.txt');
    s = textscan(fid, '%s', 'Delimiter', '\n');
    s = s{1};
    % the additional line that turns s into a single string
    s = cell2mat(reshape(s, 1, []));

I found this helpful for preparing text for jsondecode(text). :)
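A minimal end-to-end sketch of that use case, assuming test.txt actually contains valid JSON; strjoin(s, '') would be an equivalent way of building the single string:

    % read test.txt into one string and parse it as JSON
    fid = fopen('test.txt');
    c = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
    fclose(fid);
    s = cell2mat(reshape(c{1}, 1, []));   % one long string
    data = jsondecode(s);                 % assumes the file holds valid JSON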

0
