At my workplace we process files with more than a million lines each. Even though the server has more than 10 GB of RAM, with 8 GB allocated to the JVM, the server sometimes hangs for long stretches and starves other tasks.
I profiled the code and found that while reading the files, memory usage frequently climbs by gigabytes (from 1 GB to 3 GB) and then suddenly drops back to normal. These repeated spikes and drops seem to be what freezes my server, and they are of course caused by garbage collection.
Which API should be used to read files to improve performance?
Now I am using BufferedReader(new FileReader(...)) to read these CSV files.
Process: how I read a file:
- I read the file line by line.
- Each line has several columns, which I parse according to their types (the cost column as a double, the visit column as an int, the keyword column as a String, and so on).
- I put the rows that qualify (visit > 0) into a HashMap and clear that map at the end of the task. A sketch of this approach is shown after this list.
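For reference, here is a minimal sketch of what that reading code looks like. The exact column order, the comma delimiter, and the `Row` record class are my assumptions; the structure (BufferedReader over FileReader, line-by-line parsing, keeping only rows with visit > 0 in a HashMap keyed by keyword) follows the description above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CsvLoader {

    // Hypothetical record type for one CSV row.
    static class Row {
        final double cost;
        final int visit;
        Row(double cost, int visit) {
            this.cost = cost;
            this.visit = visit;
        }
    }

    // Reads one file line by line, parses the columns, and keeps
    // only rows with visit > 0, keyed by keyword.
    static Map<String, Row> readFile(String path) throws IOException {
        Map<String, Row> rows = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumed layout: keyword,cost,visit (comma-separated)
                String[] cols = line.split(",");
                String keyword = cols[0];
                double cost = Double.parseDouble(cols[1]);
                int visit = Integer.parseInt(cols[2]);
                if (visit > 0) {
                    rows.put(keyword, new Row(cost, visit));
                }
            }
        }
        return rows;
    }
}
```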
Update
I do this reading for 30 or 31 files (one month of data) and keep the results in the map. Later, this map is used to compute some metrics across different tables, so both reading the files and keeping this data are necessary. I have since switched the HashMap part to BerkeleyDB, but the problem while reading the files is the same or even worse.
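The monthly pass looks roughly like the sketch below. It builds on the `CsvLoader` sketch above; the file naming scheme, the fixed loop over 31 days, and the merge-by-`putAll` policy are all assumptions (and in the current code the accumulation goes into BerkeleyDB rather than an in-memory map).

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MonthlyLoad {

    // Hypothetical driver: read every daily file for the month and
    // accumulate the qualifying rows into one map that is later used
    // to compute the metrics, then cleared.
    public static void main(String[] args) throws IOException {
        Map<String, CsvLoader.Row> monthData = new HashMap<>();
        for (int day = 1; day <= 31; day++) { // 30 or 31 files, depending on the month
            String path = String.format("/data/day-%02d.csv", day); // assumed naming
            monthData.putAll(CsvLoader.readFile(path));
        }
        // ... compute metrics from monthData, then release it ...
        monthData.clear();
    }
}
```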