You're overthinking the problem. All of this can be done very simply with a single MySQL table, which stores the data on disk instead of keeping everything in memory. That much data was never meant to be handled efficiently by a stand-alone application.
CREATE TABLE TONS_OF_STRINGS ( unique_string varchar(255) NOT NULL, UNIQUE (unique_string) )
Just loop over the values (assuming a comma-separated list here) and try to insert each token. Every insert that fails is a duplicate.
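One thing to watch when looping over the tokens: if the file also contains line breaks or spaces around the commas, a plain "," delimiter leaves them attached to the tokens, so "abc" and "abc\n" would count as different strings. Here is a minimal sketch of a more forgiving tokenizer. The class name TokenLoop and the delimiter pattern are my own assumptions, not something your file format necessarily requires, and the tokens are collected into a list only to make the result easy to inspect; in the real loop each token would go straight to the insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenLoop {
    // Splits on commas and line breaks, swallowing whitespace around them.
    // Empty tokens (e.g. from a leading delimiter) are skipped.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scan = new Scanner(input)) {
            scan.useDelimiter("\\s*[,\\r\\n]+\\s*");
            while (scan.hasNext()) {
                String token = scan.next().trim();
                if (!token.isEmpty()) {
                    out.add(token);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Small illustrative input; a real run would read the big file instead.
        for (String token : tokens("a, b,c\nd,a\n")) {
            System.out.println(token);
        }
    }
}
```

For the giant file you would pass the FileReader to the Scanner instead of a String and keep the same delimiter.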
import java.io.FileReader;
import java.sql.*;
import java.util.Scanner;

public class FindDuplicates {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/database", "username", "password");
        // Prepare the insert once and reuse it for every token.
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO TONS_OF_STRINGS (unique_string) VALUES (?)");
        Scanner scan = new Scanner(new FileReader("SomeGiantFile.csv"));
        scan.useDelimiter(",");
        while (scan.hasNext()) {
            String token = scan.next();
            try {
                ps.setString(1, token);
                ps.executeUpdate();
            } catch (SQLException e) {
                // The UNIQUE constraint rejected the insert.
                System.out.println("Found duplicate: " + token);
            }
        }
        scan.close();
        ps.close();
        con.close();
        System.out.println("Well that was easy, I'm all done!");
    }
}
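Note that the loop above reports every SQLException as a duplicate, so a dropped connection or a typo in the table name would be reported as one too. As a rough sketch, you could check the vendor error code before deciding. MySQL reports a unique-key violation as error 1062 (ER_DUP_ENTRY); the class and method names below are made up for illustration, and I'm assuming your driver passes that code through getErrorCode().

```java
import java.sql.SQLException;

class DuplicateCheck {
    // MySQL's error code for a unique-key violation (ER_DUP_ENTRY).
    static final int MYSQL_DUP_ENTRY = 1062;

    // Returns true only when the failure was caused by the UNIQUE constraint.
    static boolean isDuplicate(SQLException e) {
        return e.getErrorCode() == MYSQL_DUP_ENTRY;
    }

    public static void main(String[] args) {
        // Hand-built exceptions standing in for what the driver would throw.
        SQLException dup = new SQLException("Duplicate entry", "23000", 1062);
        SQLException other = new SQLException("Connection lost", "08S01", 0);
        System.out.println(isDuplicate(dup));
        System.out.println(isDuplicate(other));
    }
}
```

In the catch block you would then print "Found duplicate" only when isDuplicate(e) is true and rethrow anything else, so real failures do not get silently counted as duplicates.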
Remember to empty the table when you are done (for example with TRUNCATE TABLE TONS_OF_STRINGS), since that is a lot of data to leave lying around.
rwyland