Reading a large file into a list of string lines in OCaml

I am basically trying to read a large file (about 10G) into a list of lines. The file contains a sequence of integers, something like this:

0x123456 0x123123 0x123123 ..... 

I used the function below, which is what I normally use to read files in my code base, but in this case it turns out to be slow (about 12 minutes):

    let lines_from_file (filename : string) : string list =
      let lines = ref [] in
      let chan = open_in filename in
      try
        while true do
          lines := input_line chan :: !lines
        done; []
      with End_of_file ->
        close_in chan;
        List.rev !lines;;

I think I need to read the whole file into memory and then split it into lines (I am using a server with 128G of RAM, so memory should not be a problem). But after searching the documentation, I still cannot tell whether OCaml provides such a facility.

So here are my questions:

  • Given my situation, how can I quickly read a file into a list of strings?

  • What about using a stream? I would have to adapt the code that consumes the list, though, and that may take some time.

+6
3 answers

First of all, you should think about whether you really need all of the data in memory at once. Maybe it is better to process the file one line at a time?
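As a minimal sketch of the line-at-a-time approach: the helper below folds a function over the lines of a file, so only the current line and the accumulator are live at any moment. The name `fold_lines` is hypothetical (it is not a standard library function), and the sketch assumes OCaml 4.08 or later for `Fun.protect`.

```ocaml
(* Fold [f] over the lines of [filename] without keeping them all in memory.
   [fold_lines] is a hypothetical helper name, not part of the stdlib. *)
let fold_lines (f : 'a -> string -> 'a) (init : 'a) (filename : string) : 'a =
  let chan = open_in filename in
  let rec loop acc =
    (* The [exception] case keeps the recursive call in tail position. *)
    match input_line chan with
    | line -> loop (f acc line)
    | exception End_of_file -> acc
  in
  (* Close the channel whether [loop] returns or raises. *)
  Fun.protect ~finally:(fun () -> close_in chan) (fun () -> loop init)
```

For example, `fold_lines (fun n _ -> n + 1) 0 "data.txt"` would count the lines of a file named `data.txt` (a placeholder name) without ever building a list.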

If you really want to have everything in memory at once, you can use Bigarray's map_file function to map the file as an array of characters, and then work on that array directly.
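A rough sketch of the memory-mapping idea follows. It assumes a recent OCaml where the mapping function lives in the unix library as `Unix.map_file` (older releases exposed it as `Bigarray.Genarray.map_file`), so the program has to be linked against unix. The names `map_chars` and `count_newlines` are made up for this illustration.

```ocaml
(* Requires the unix library, e.g. ocamlfind ocamlopt -package unix ... *)

(* Map the whole file read-only as a one-dimensional array of chars.
   [map_chars] is a hypothetical helper name. *)
let map_chars (filename : string)
  : (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t =
  let fd = Unix.openfile filename [ Unix.O_RDONLY ] 0 in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      (* shared = false gives a private mapping; [| -1 |] asks for the
         dimension to be inferred from the file size. *)
      Unix.map_file fd Bigarray.char Bigarray.c_layout false [| -1 |]
      |> Bigarray.array1_of_genarray)

(* Example consumer: count the newline characters in the mapped file. *)
let count_newlines (filename : string) : int =
  let data = map_chars filename in
  let n = ref 0 in
  for i = 0 to Bigarray.Array1.dim data - 1 do
    if Bigarray.Array1.get data i = '\n' then incr n
  done;
  !n
```

The mapped array is paged in by the operating system on demand, so nothing is copied into the OCaml heap up front; splitting into lines then amounts to scanning for `'\n'` offsets.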

In addition, as far as I can see, the file contains numbers. It might be better to allocate an array (or, even better, a bigarray), process each line in order, and store the integers in that (large) array instead of keeping the strings.
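One possible way to do this, sketched under the assumption that the file holds exactly one integer literal per line (such as `0x123456`): read the file twice, first to count the lines and then to parse each one into a preallocated bigarray. The name `ints_of_file` is hypothetical, and the second pass costs extra I/O in exchange for a single allocation.

```ocaml
(* Parse a file with one integer per line into a Bigarray of native ints.
   Assumes every line is a valid literal for [int_of_string]; a malformed
   or empty line raises [Failure]. [ints_of_file] is a hypothetical name. *)
let ints_of_file (filename : string)
  : (int, Bigarray.int_elt, Bigarray.c_layout) Bigarray.Array1.t =
  let chan = open_in filename in
  Fun.protect
    ~finally:(fun () -> close_in chan)
    (fun () ->
      (* First pass: count the lines so the array is allocated only once. *)
      let count = ref 0 in
      (try
         while true do
           ignore (input_line chan);
           incr count
         done
       with End_of_file -> ());
      let arr = Bigarray.Array1.create Bigarray.int Bigarray.c_layout !count in
      (* Second pass: rewind and parse each line into its slot. *)
      seek_in chan 0;
      for i = 0 to !count - 1 do
        Bigarray.Array1.set arr i (int_of_string (String.trim (input_line chan)))
      done;
      arr)
```

Bigarray data lives outside the OCaml heap, so the garbage collector does not have to scan it, which matters when the array holds hundreds of millions of entries.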

+7

I often use the following two functions to read the lines of a file. Note that lines_from_files is tail recursive (through its helper lines_from_files_aux), so it does not grow the stack on long files.

    let read_line i = try Some (input_line i) with End_of_file -> None

    let lines_from_files filename =
      let rec lines_from_files_aux i acc =
        match (read_line i) with
        | None -> List.rev acc
        | Some s -> lines_from_files_aux i (s :: acc)
      in
      lines_from_files_aux (open_in filename) []

    let () =
      lines_from_files "foo" |> List.iter (Printf.printf "lines = %s\n")

+2

This should work:

    let rec ints_from_file fdesc =
      try
        let l = input_line fdesc in
        let l' = int_of_string l in
        l' :: ints_from_file fdesc
      with
      | _ -> []

This solution converts the strings to integers as they are read, which should be a little more efficient in terms of memory; I assume the conversion would have to be done eventually anyway.

In addition, since the function is recursive, the channel must be opened outside the function and passed in as fdesc.
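One caveat with the function above: the recursive call sits inside a `try` and under `::`, so it is not a tail call, and on a multi-gigabyte input it may exhaust the stack. The catch-all `with _` also silently stops at the first malformed line. A possible accumulator-based variant is sketched below; the names `ints_from_channel` and the wrapper that opens the file are illustrative, and parse errors are deliberately left to propagate as `Failure`.

```ocaml
(* Tail-recursive variant: accumulate in reverse, then reverse once.
   Only [End_of_file] ends the loop; a bad line raises [Failure]. *)
let ints_from_channel (chan : in_channel) : int list =
  let rec loop acc =
    match input_line chan with
    | line -> loop (int_of_string line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Convenience wrapper that opens the file and always closes it. *)
let ints_from_file (filename : string) : int list =
  let chan = open_in filename in
  Fun.protect
    ~finally:(fun () -> close_in chan)
    (fun () -> ints_from_channel chan)
```

Because `int_of_string` accepts the `0x` prefix, lines such as `0x123456` are parsed as hexadecimal without extra work.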

0
