What data structure should be used to store a large number of row identifiers?

Let me explain the problem:

I have:
One big DB table populated with millions of records (each record may contain n columns).

Concept:

I want to show two lists in the web interface (for example, "Available" and "Selected"). When a user moves an entry from one list to the other, I need to store a unique identifier of that record in some yet-unknown data structure named "selected" on my server, and when the user finally clicks the "Send" button I will pass this list on to another application.

Sorting and filtering are performed in the database; the data is then loaded back into Java in chunks, each record is checked to see whether it is selected, and it is added to the list that will be displayed in the web interface:

for (Entry currentEntry : entries) {
    if (selected.contains(currentEntry.ID)) {
        selectedList.add(currentEntry);
    } else {
        availableList.add(currentEntry);
    }
}

The selectedList and availableList will each contain only a few hundred records (those displayed to the user, at most about 100-200 records per page), so a plain List of my record type is good enough and preserves my sort order.

Problem:
The "selected" structure must hold many thousands of identifiers (sometimes this can reach millions).

Necessity:
I need fast lookup to check whether an id exists (struct.contains(id)), so I will almost certainly use a hash-based structure. I also need a structure that uses minimal memory.

Not necessary:
Good removal performance is not required. No sorting is required.
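To make the lookup requirement concrete, here is a minimal sketch of what I mean by struct.contains(id), using a plain HashSet of boxed ids (the class name and id values are placeholders, not my real data):

```java
import java.util.HashSet;
import java.util.Set;

public class SelectedIds {
    public static void main(String[] args) {
        // The "selected" structure: membership test is the only hot operation.
        Set<Long> selected = new HashSet<>();
        selected.add(42L);
        selected.add(7L);

        System.out.println(selected.contains(42L)); // true  -- average O(1)
        System.out.println(selected.contains(99L)); // false
    }
}
```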

+4
5 answers

After much testing, I realized that all the hash structures (HashSet, LinkedHashMap, etc.) perform roughly the same.

I began to run into memory-overflow problems on my test system once I went past about 200,000 items (this depends on the hardware, of course).

I will probably move to using a database table to store the selected identifiers and retrieve the data directly from the database using joins (I would use the DB for sorting and filtering in any case).

Thanks @DariusX. for the winning suggestion, and everyone else for their help.

+1

Maybe a Set that gives you quick access, such as a HashSet.

+1

You can use a TreeSet; the javadoc says it "provides guaranteed log(n) time cost for the basic operations (add, remove and contains)". If you need to associate something with your identifier, use a HashMap instead.

+1

1. Since you need to hold thousands of identifiers, a HashMap is one option. It has very fast access when the key is known, and fast insertion.

2. As a rule, neither TreeMap nor HashMap is synchronized, but Hashtable is. Also, Hashtable does not allow null keys or values; HashMap, on the other hand, allows one null key.

3. You could also switch to a TreeMap, because a TreeMap lets you retrieve elements in a user-defined sort order. Keep in mind, though, that a TreeMap is slower than a HashMap.

Edit: after reading a few articles, I also came across this:

Seriously, though, you'd better stay away from Hashtable altogether. For single-threaded applications you don't want the extra synchronization overhead. For highly concurrent applications, the paranoid synchronization can lead to starvation, deadlocks, or unnecessary garbage-collection pauses. As Tim Howland pointed out, you can use a ConcurrentHashMap instead.

So I would go with ConcurrentHashMap.
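Since the use case is a set of ids rather than key-value pairs, a thread-safe set view backed by a ConcurrentHashMap can be obtained via ConcurrentHashMap.newKeySet() (Java 8+). A small sketch with placeholder ids:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSelected {
    public static void main(String[] args) {
        // Thread-safe Set backed by a ConcurrentHashMap; no external locking needed.
        Set<Long> selected = ConcurrentHashMap.newKeySet();
        selected.add(1L);
        selected.add(2L);

        System.out.println(selected.contains(1L)); // true
        System.out.println(selected.contains(3L)); // false
    }
}
```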

0

A HashSet should provide quick, for the most part constant-time access, although if possible you could run a sample test to check whether the collision rate becomes too high given millions of entries and the nature of your data set.

This, of course, will not meet your minimal-memory requirement. What footprint do you expect from placing millions of records in Java memory? If it is very large (say 1000 MB), you may need to consider a distributed cache, or an indexing approach instead.
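To put a rough number on the footprint, here is a back-of-the-envelope estimate. The per-entry figures are assumptions for a typical 64-bit JVM with compressed oops (boxed Long object, internal HashMap.Node, bucket-array slot), not measured values, so treat the result as an order of magnitude only:

```java
public class MemoryEstimate {
    public static void main(String[] args) {
        // Assumed per-entry cost in a HashSet<Long>:
        //   ~16 bytes boxed Long + ~32 bytes HashMap.Node + ~8 bytes table slot
        long bytesPerId = 16 + 32 + 8;
        long ids = 1_000_000L;
        System.out.println((bytesPerId * ids) / (1024 * 1024) + " MB");
    }
}
```

So a million boxed ids in a HashSet is on the order of tens of megabytes, well short of the gigabyte range, but far from the theoretical 8 MB that a million raw longs would occupy.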

0
