Map List: Effective Implementations

I have code that creates and uses a collection, for example:

List<Map<String, Object>> tableData;

This list of cards is filled with n cards, each of which represents one row in the database. Each line is represented as a map between the field name and the object corresponding to the field (the type in this case does not matter). Some of the fields may be missing. The number of fields, m is always much less than the number of rows (n β‰ˆ 10000 and times; m). I need to reuse the same collection several times to read all the lines, so I can't just use some lazy iterator.

Is there an effective data structure for storing it? Guava provides the collection Table, but does not seem to comply with the bill. I am thinking of creating an interface, for example:

interface TableData{
  int size();
  Map<String, Object> get(int i);
  // ... (interators, etc.)
}

And then create an implementation that uses one Map<String,List<Object>>so that I only create m-lists instead of n-cards and only create maps on the fly when it was needed, but I was wondering if there is a general data structure.

thanks

+4
source share
4 answers

I conducted several tests (not convincing, but very indicative) to establish the memory size of various implementations List<Map<String, Object>>. The base is Java ArrayList<>with elements that are instances of Guava ImmutableMap.

The implementations I compared are as follows:

  • Based implementation Map<String,List<Object>>using HashMapand ArrayLists;
  • List<Object[]> ArrayList;
  • Guava HashBasedTable<Integer,String,Object>;
  • Guava ArrayTable<Integer,String,Object>;

n , m " " k, , . l, Apache Commons RandomStringUtils.

. n = 200000, m = 50, l = 10 k (1,0, 7,5, 0,5), :

    | k = 1.0  | k = 0.75 | k = 0.5  |
----------------------------------------
1.  |     71 % |     71 % |     71 % |
2.  |     71 % |     72 % |     73 % |
3.  |    111 % |    107 % |    109 % |
4.  |     71 % |     73 % |     76 % |

n 20000 .

. , , 70% . -, , , Guava ArrayTable , , . , 1.

+2

, .

, 50% , List<Object[]> :

class TableDataImpl implements TableData {
    private List<Object[]> data;
    private Map<String, Integer> columnNameToIndexMap;

    public Map<String, Object> get(int i) {
        return new ArrayMap(data.get(i));
    }

    private class ArrayMap implements Map<String, Object> {

        private Object[] row;

        ArrayMap(Object[] row) {
            this.row = row;
        }

        public Object get(String key) {
            Integer index = columnNameToIndexMap.get(key);
            if (index==null) return null;
            return row[index];
       }

       // all the other Map stuff... a lot of code!
    }
}

, , .

, , 95% , : BitSet (long[]) , . , , (32 64 ) Object[].

, , .

, , columnNameToIndexMap .

+3

, , , ( ). , , , .

, . , :

public class TableRowPool {

    private static final int INITIAL_CAPACITY = 10000;

    private Queue<Map<String, Object>> mapObjects;

    public TableRowPool() {
        mapObjects = new LinkedList<Map<String, Object>>();
        for(int i = 0; i < INITIAL_CAPACITY; i++) {
            mapObjects.add(new HashMap<String, Object>());
        }
    }

    public Map<String, Object> getTableRowObject() {
        if(mapObjects.size() == 0) {
            mapObjects.add(new HashMap<String, Object>());
        }
        return mapObjects.remove();
    }

    public void returnTableRowObject(Map<String, Object> obj) {
        mapObjects.add(obj);
    }

}

LinkedList , . , , . , , .

, - :

//Load data
while((row = getResultSetRow()) != null) {
    Map<String, Object> rowObj = tableRowPool.getTableRowObject();
    //Fill in data
    myRows.add(rowObj);
}

//... Do all your business logic ...

//Cleanup
for(Map<String, Object> rowObj : myRows) {
    tableRowPool.returnTableRowObject(rowObj);
}
myRows = null;
0

, , OOM, , , , SIMD parallelism - Map-Reduce. , . , , , .

But if you still want to stick with your current approach, why can't you normalize the data so that you can represent fields that are missing: "Null". So, when you read the data and create a map, why not just add β€œnull” for the missing fields? That way you should at least not have key data values ​​like hashmap and you can just liseList<List<Object>>

0
source

All Articles