This question is old and was written against an older version of Julia, so it is worth checking whether the problem still occurs. I recently tested this on Julia 0.5, and the code above works correctly on 5 * 10^6 lines of 600 characters each. The whole operation takes about 5 GB of peak memory on my laptop.
    julia> t=[randstring(600) for i=1:5*10^6];
    julia> writecsv("/Users/aviks/tmp/long.csv", t)
    julia> t=readstring("/Users/aviks/tmp/long.csv");
    julia> length(t)
    3005000000
    julia> @time t = replace(t, r"( 1:1\.0+)|(( 1:1\.0+)|(([1-6]:)|((\|user )|(\|))))", "");
    43.599660 seconds (137 allocations: 3.358 GB, 0.85% gc time)
(PS: note that readall is now deprecated in favor of readstring.)
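For anyone repeating the check on Julia 1.x, here is a scaled-down sketch of the same session. It rests on a few assumptions that are not part of the original test: `randstring` is loaded from the `Random` standard library, `read(path, String)` stands in for `readstring`, `replace` takes a `pattern => replacement` pair, and the file is written with a plain `write` instead of `writecsv`. The temporary path, the line count, and the `sample` string are made up for illustration.

```julia
using Random   # randstring lives in the Random stdlib on Julia 1.x

# Scaled-down input: 1000 lines of 600 characters (the session above used 5 * 10^6 lines).
lines = [randstring(600) for _ in 1:1000]

# Hypothetical scratch file, not the path used in the session above.
path = tempname()
write(path, join(lines, "\n") * "\n")

# read(path, String) stands in for readstring / readall.
t = read(path, String)

# Same regex as in the session above.
pattern = r"( 1:1\.0+)|(( 1:1\.0+)|(([1-6]:)|((\|user )|(\|))))"

# On Julia 1.x the replacement is passed as a Pair rather than a third argument.
@time cleaned = replace(t, pattern => "")

# Hypothetical line that does contain the targeted tokens, to see the regex act.
sample = "1 1:1.0 |user 3:x"
println(replace(sample, pattern => ""))
```

Note that `randstring` draws from letters and digits only by default, so the regex never matches in the random data. The timing therefore reflects the cost of scanning the string, not of performing replacements; real input with many matches may behave differently.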
aviks