Parse a 20 gigabyte input file in an ArrayList

I need to sort a 20 GB file (which consists of random numbers) in ascending order, but I donโ€™t understand what technique I should use. I tried using ArrayList in my Java program, but there is not enough memory in it. Increasing the heap size didnโ€™t work either, I think 20 GB is too big. Can someone guide me, how should I continue?

+6
source share
2 answers

You must use an external sorting algorithm, do not try to put it into memory.

http://en.wikipedia.org/wiki/External_sorting

If you think this is too complicated, try the following:

  • include H2 database in your project
  • create a new database on the disk (will be created automatically on the first connection)
  • create a simple table in which numbers will be saved.
  • read data by numbers and insert them into the database (do not forget to fix 1000 numbers or so).
  • select numbers with ORDER BY clause :)
  • use JDBC resultSet to get results on the fly and write them to the output file
Database

H2 is simple, works fine with Java, and can be built into your JAR (does not require any installation or configuration).

+9
source

For this you do not need special tools. This is the case of a textbook for external merge sorting, in which you read in parts of a large file at a time (say 100M), sort them and write the sorted results to an external file. Read in another part, sort it, spit it out, until there is nothing to sort. Then you need to read the sorted pieces, the smaller part at a time (say 10 M) and sort them in memory. The hard point is to combine these sorted bits in the right direction. As already mentioned, read the Wikipedia external sorting page. Also, here is an implementation in Java that does this sort of external merge.

+4
source

All Articles