If I understand correctly, you cannot just select non-zero rows, because for each row index (or PK value), your data structure will need to be able to report not only the value, but also whether it is there at all. So, assuming 0, if you didn't find it in your data structure, it might not be a good idea.
Just to make sure - exactly how many lines are we talking about here? Millions? A million integers would occupy only 4 MB of RAM as an array. Not really. I think it should be at least 100'000'000 rows.
Basically, I would suggest a sorted array of integer pairs to store non-zero values. The first element in each pair will be the PK value, and this is what the array will sort. The second element will be the value. You can choose a database that, of course, returns only these nonzero values. Since the array will be sorted, you can use binary search to find your values.
If there are no "holes" in the PK values, then the only thing you need, besides this, is the minimum and maximum PK values โโso that you can determine if the given index belongs to your data set.
If unused PK values โโare used between used ones, you will need a different mechanism to determine which PK values โโare valid. Perhaps a bitmask or other array of real (or invalid, whichever is less) PK values.
If you choose the bit mask method, there is another idea. Use two bits for each PK value. The first bit indicates whether the PK value is valid or not. The second bit indicates whether it is equal to zero or not. Store all nonzero values โโin another array. This, however, will have the disadvantage that you will not know which element of the array corresponds to which bitmask. You would have to count all the way from the start to find out. This can be mitigated by some indices. Let's say for every 1000 entries in the value array you store another integer that tells you where this entry is in the bitmask.
Vilx- source share