Unexpected io:fread behavior in Erlang

Question

This is an Erlang question.

I have run into some unexpected behavior from io:fread.

I was wondering whether someone could check if there is something wrong with the way I use io:fread, or whether there is a bug in io:fread.

I have a text file that contains a "triangle of numbers" as follows:

    59
    73 41
    52 40 09
    26 53 06 34
    10 51 87 86 81
    61 95 66 57 25 68
    90 81 80 38 92 67 73
    30 28 51 76 81 18 75 44
    ...

There is a single space between each pair of numbers, and each line ends with a carriage-return/newline pair.

I am using the following Erlang program to read this file into a list.

    -module(euler67).
    -author('Cayle Spandon').

    -export([solve/0]).

    solve() ->
        {ok, File} = file:open("triangle.txt", [read]),
        Data = read_file(File),
        ok = file:close(File),
        Data.

    read_file(File) ->
        read_file(File, []).

    %% Read one integer per call until end of file.
    read_file(File, Data) ->
        case io:fread(File, "", "~d") of
            {ok, [N]} ->
                read_file(File, [N | Data]);
            eof ->
                lists:reverse(Data)
        end.

The output of this program:

    (erlide@cayle-spandons-computer.local)30> euler67:solve().
    [59,73,41,52,40,9,26,53,6,3410,51,87,86,8161,95,66,57,25,
     6890,81,80,38,92,67,7330,28,51,76,81|...]

Please note that the last number of the fourth line (34) and the first number of the fifth line (10) were combined into a single number 3410.

When I dump the text file using "od", there is nothing special about those lines; they end with cr-nl, just like every other line:

    > od -ta triangle.txt
    0000000 5 9 cr nl 7 3 sp 4 1 cr nl 5 2 sp 4 0
    0000020 sp 0 9 cr nl 2 6 sp 5 3 sp 0 6 sp 3 4
    0000040 cr nl 1 0 sp 5 1 sp 8 7 sp 8 6 sp 8 1
    0000060 cr nl 6 1 sp 9 5 sp 6 6 sp 5 7 sp 2 5
    0000100 sp 6 8 cr nl 9 0 sp 8 1 sp 8 0 sp 3 8
    0000120 sp 9 2 sp 6 7 sp 7 3 cr nl 3 0 sp 2 8
    0000140 sp 5 1 sp 7 6 sp 8 1 sp 1 8 sp 7 5 sp
    0000160 4 4 cr nl 8 4 sp 1 4 sp 9 5 sp 8 7 sp

One interesting observation is that some of the numbers for which the problem occurs are on the 16-byte boundary in the text file (but not all, for example 6890).

+6

erlang

Cayle Spandon Jan 23 '09 at 15:30

3 answers

Besides the fact that this seems to be a bug in one of the Erlang libraries, I think you could (very) easily work around the problem.

Given that your file is line-oriented, I think the best practice is to process it one line at a time as well.

Consider the following construction. It works on stock Erlang and reads the file one line at a time with io:get_line, so io:fread is not involved at all. Note that lazylines/1 returns the complete list of lines, so the file's lines are all in memory by the time they are folded. The module contains an example of a function applied to each line: turning a string of textual representations of integers into a list of integers.

    -module(liner).
    -author("Harro Verkouter").
    -export([liner/2, integerize/0, lazyfile/1]).

    % Applies a function to all lines of the file
    % before reducing (foldl).
    liner(File, Fun) ->
        lists:foldl(fun(X, Acc) -> Acc++Fun(X) end, [], lazyfile(File)).

    % Reads the lines of a file, one io:get_line call per line
    lazyfile(File) ->
        {ok, Fd} = file:open(File, [read]),
        lazylines(Fd).

    % This one does the actual reading ;)
    lazylines(Fd) ->
        case io:get_line(Fd, "") of
            eof -> file:close(Fd), [];
            {error, Reason} ->
                file:close(Fd), exit(Reason);
            L ->
                [L|lazylines(Fd)]
        end.

    % Take a line of space separated integers (string) and transform
    % them into a list of integers. "\r" is a separator too, because
    % the file in the question ends its lines with cr-nl.
    integerize() ->
        fun(X) ->
            lists:map(fun(Y) -> list_to_integer(Y) end,
                      string:tokens(X, " \r\n"))
        end.

Example usage:

    Eshell V5.6.5  (abort with ^G)
    1> c(liner).
    {ok,liner}
    2> liner:liner("triangle.txt", liner:integerize()).
    [59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
     68,90,81,80,38,92,67,73,30|...]

And as a bonus, you can easily fold over the lines of any line-oriented file :)

    6> lists:foldl( fun(X, Acc) ->
    6>                  io:format("~.2w: ~s", [Acc,X]), Acc+1
    6>              end,
    6>              1,
    6>              liner:lazyfile("triangle.txt")).
    1: 59
    2: 73 41
    3: 52 40 09
    4: 26 53 06 34
    5: 10 51 87 86 81
    6: 61 95 66 57 25 68
    7: 90 81 80 38 92 67 73
    8: 30 28 51 76 81 18 75 44
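Erlang evaluates eagerly, so a function that returns a list of lines has built the whole list before a fold ever sees it. If memory matters for very large files, one option is to fold while reading. The following is only a sketch under that assumption and is not part of the module above: the names linefold, fold_lines/3 and sum_file/1 are hypothetical.

```erlang
-module(linefold).
-export([fold_lines/3, sum_file/1]).

%% Fold Fun over the lines of the file at Path.
%% Fun receives (Line, Acc) and returns the new accumulator.
%% The file is closed even if Fun or the read fails.
fold_lines(Fun, Acc0, Path) ->
    {ok, Fd} = file:open(Path, [read]),
    try
        loop(Fd, Fun, Acc0)
    after
        file:close(Fd)
    end.

%% Tail-recursive loop: only the current line and the accumulator
%% are kept; earlier lines become garbage once Fun has returned.
loop(Fd, Fun, Acc) ->
    case io:get_line(Fd, "") of
        eof             -> Acc;
        {error, Reason} -> exit(Reason);
        Line            -> loop(Fd, Fun, Fun(Line, Acc))
    end.

%% Example: sum every integer in a file of space-separated numbers.
%% "\r" is treated as a separator so cr-nl line endings are accepted.
sum_file(Path) ->
    fold_lines(fun(Line, Sum) ->
                       Sum + lists:sum([list_to_integer(T)
                                        || T <- string:tokens(Line, " \r\n")])
               end, 0, Path).
```

With this shape, linefold:fold_lines(Fun, Acc0, "triangle.txt") plays the role of lists:foldl(Fun, Acc0, liner:lazyfile("triangle.txt")), without the intermediate list of lines.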

Cheers, h.

+1

haavee Mar 27 '09 at 10:21

I noticed that there are several places where two numbers get merged, and they all seem to be at line boundaries, from the end of the fourth line onwards.

I found that if you add a space character at the beginning of each line, starting from the fifth, that is:

    59
    73 41
    52 40 09
    26 53 06 34
     10 51 87 86 81
     61 95 66 57 25 68
     90 81 80 38 92 67 73
     30 28 51 76 81 18 75 44
    ...

Numbers are processed correctly:

    39> euler67:solve().
    [59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
     68,90,81,80,38,92,67,73,30|...]

It also works if you add a space at the beginning of the first four lines.

This is more of a workaround than a real solution, but it works. I would like to figure out how to set up the format string for io:fread so that we would not need to do this.

UPDATE Here is a workaround that will not force you to modify the file. It assumes that all numbers are two digits (<100):

    read_file(File, Data) ->
        case io:fread(File, "", "~d") of
            {ok, [N]} ->
                if
                    %% Three or more digits: two numbers were merged.
                    %% (>= so that "01" followed by "00" is caught too.)
                    N >= 100 ->
                        First = N div 100,
                        Second = N rem 100,
                        %% Data is built in reverse, so Second goes in front.
                        read_file(File, [Second, First | Data]);
                    true ->
                        read_file(File, [N | Data])
                end;
            eof ->
                lists:reverse(Data)
        end.

Basically, the code catches any number that is the concatenation of two numbers across a newline and splits it into its two parts.

Again, this is a kludge that implies a possible bug in io:fread, but it should do the job.

UPDATE AGAIN The above will only work for two-digit inputs, but since the example packs all the numbers (even those < 10) into a two-digit format, it will work for this example.
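The splitting arithmetic can also be sketched as a separate post-processing step over the list that io:fread produced. This assumes, as stated, that every number in the file is written with exactly two digits; the module and function names (split_merged, split/1, fix/1) are made up for illustration.

```erlang
-module(split_merged).
-export([split/1, fix/1]).

%% A value of 100 or more cannot be a single two-digit number, so it
%% is taken to be two numbers that were merged: 3410 -> [34, 10].
split(N) when N >= 100 -> [N div 100, N rem 100];
split(N)               -> [N].

%% Apply split/1 to every element, keeping the original order.
fix(Numbers) ->
    lists:flatmap(fun split/1, Numbers).
```

For example, split_merged:fix([6, 3410, 51]) evaluates to [6,34,10,51]. The approach cannot detect a merge whose first half is "00" (for instance "00" followed by "05" reads as 5), so it remains a kludge.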

0

Vector Jan 28 '09 at 15:47

womble · Accepted Answer · 2009-01-29T00:22:10+0000

I'm going to go with this being a bug in Erlang, and a weird one at that. Changing the format string to "~2s" gives equally weird results:

    ["59","73","4","15","2","40","0","92","6","53","0","6","34",
     "10","5","1","87","8","6","81","61","9","5","66","5","7",
     "25","6",
     [...]|...]

So it appears that it counts a newline character as a regular character for the purposes of the field width, but not when it comes to producing the output. Loopy as all hell.

A week of Erlang programming, and I'm already delving into the source. That might be a new record for me...

EDIT

A bit more investigation has confirmed that this is a bug. Calling one of the internal functions that fread uses:

    > io_lib_fread:fread([], "12 13\n14 15 16\n17 18 19 20\n", "~d").
    {done,{ok,"\f"}," 1314 15 16\n17 18 19 20\n"}

Basically, if there are multiple values to be read followed by a newline, the first newline gets eaten in the "still to be read" part of the string. Other testing suggests that if you prepend a space it is fine, and if you lead the string with a newline it asks for more.
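For comparison, the values a correct read of that test string should produce can be computed without going through io_lib_fread at all. This is just a sketch: the module name rows and the function parse/1 are hypothetical, and string:tokens/2 is used because it appears elsewhere on this page (newer Erlang/OTP releases mark it obsolete).

```erlang
-module(rows).
-export([parse/1]).

%% Split the text into lines on cr/nl first, then split each line
%% on spaces, so a line terminator can never join two numbers.
parse(Text) ->
    [[list_to_integer(Tok) || Tok <- string:tokens(Line, " ")]
     || Line <- string:tokens(Text, "\r\n")].
```

rows:parse("12 13\n14 15 16\n17 18 19 20\n") evaluates to a list equal to [[12,13],[14,15,16],[17,18,19,20]]. The shell may display short lists of small integers as strings, which is also why the {ok,"\f"} in the output above is simply {ok,[12]}.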

I'm going to get to the bottom of this, gosh-darn-it... (grin) There is not that much code to go through, and not much of it deals specifically with newlines, so it should not take too long to narrow it down and fix it.

EDIT ^ 2

HA HA! Got it.

Here's the patch to stdlib that you want (remember to recompile and drop the new beam file over the top of the old one):

    --- ../erlang/erlang-12.b.3-dfsg/lib/stdlib/src/io_lib_fread.erl
    +++ ./io_lib_fread.erl
    @@ -35,9 +35,9 @@
         fread_collect(MoreChars, [], Rest, RestFormat, N, Inputs).
     
     fread_collect([$\r|More], Stack, Rest, RestFormat, N, Inputs) ->
    -    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
    +    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\r|More]);
     fread_collect([$\n|More], Stack, Rest, RestFormat, N, Inputs) ->
    -    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
    +    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\n|More]);
     fread_collect([C|More], Stack, Rest, RestFormat, N, Inputs) ->
         fread_collect(More, [C|Stack], Rest, RestFormat, N, Inputs);
     fread_collect([], Stack, Rest, RestFormat, N, Inputs) ->
    @@ -55,8 +55,8 @@
                     eof ->
                         fread(RestFormat,eof,N,Inputs,eof);
                     _ ->
    -                    %% Don't forget to count the newline.
    -                    {more,{More,RestFormat,N+1,Inputs}}
    +                    %% Don't forget to strip and count the newline.
    +                    {more,{tl(More),RestFormat,N+1,Inputs}}
                 end;
             Other ->                                %An error has occurred
                 {done,Other,More}

Now to submit my patch to erlang-patches, and reap the resulting fame and glory...
