replace() does not execute for large strings

I have the following code:

    cd(joinpath(homedir(),"Desktop"))
    using HDF5
    using JLD

    # read contents of a file
    t = readall("sourceFile")

    # remove unnecessary characters
    t = replace(t, r"( 1:1\.0+)|(( 1:1\.0+)|(([1-6]:)|((\|user )|(\|))))", "")

    # convert string into Float64 array (approximately ~140 columns)
    data = readdlm(IOBuffer(t), ' ', char(10))

    # save array on the hard drive
    save("data.jld", "data", data)

This works great when I test it with a source file that has 10^4 or fewer lines. However, when sourceFile has about 5 * 10^6 lines, it fails at the line t = replace(t, r"( 1:1\.0+)|(( 1:1\.0+)|(([1-6]:)|((\|user )|(\|))))", "") with the following message:

errormsg

1 answer

This question is old and was based on an old version of Julia, but it is still useful to check whether the problem persists in a newer version. I recently tested this on the latest 0.5 release of Julia, and the code above works correctly with 5 * 10^6 lines of 600 characters each. The whole operation takes about 5 GB of peak memory on my laptop.

    julia> t=[randstring(600) for i=1:5*10^6];

    julia> writecsv("/Users/aviks/tmp/long.csv", t)

    julia> t=readstring("/Users/aviks/tmp/long.csv");

    julia> length(t)
    3005000000

    julia> @time t = replace(t, r"( 1:1\.0+)|(( 1:1\.0+)|(([1-6]:)|((\|user )|(\|))))", "");
     43.599660 seconds (137 allocations: 3.358 GB, 0.85% gc time)

(PS: note that readall is now deprecated in favor of readstring.)
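If the roughly 5 GB peak is too much for a given machine, one possible alternative is to apply the same regex one line at a time instead of to the whole file as a single string. The following is only a sketch under stated assumptions: it uses Julia 1.x syntax (replace with a pair, eachline on a file name) rather than the 0.5 syntax shown above, and the names MARKERS, strip_markers and clean_file are hypothetical helpers, not part of any library. It also assumes that no match ever spans a line break, which holds for this regex because none of its alternatives can match a newline.

```julia
# Sketch only: Julia 1.x syntax; MARKERS, strip_markers and clean_file
# are hypothetical names introduced for illustration.

# Same pattern as in the question, copied verbatim.
const MARKERS = r"( 1:1\.0+)|(( 1:1\.0+)|(([1-6]:)|((\|user )|(\|))))"

# Remove the unwanted markers from a single line.
strip_markers(line::AbstractString) = replace(line, MARKERS => "")

# Copy `src` to `dst` line by line, so only one line is held in memory
# at a time instead of the whole multi-gigabyte string.
function clean_file(src::AbstractString, dst::AbstractString)
    open(dst, "w") do out
        for line in eachline(src)   # eachline drops the trailing newline
            println(out, strip_markers(line))
        end
    end
    return dst
end
```

The cleaned file could then presumably be parsed directly from disk, for example with readdlm from the DelimitedFiles standard library, rather than through an IOBuffer over the full string.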
