How to add data to an existing LMDB?

I have about 1 million images to put into this dataset, appended 10,000 at a time.

I suspect my map_size is wrong; I took the value from this article.

I used this line to create the environment:

 env = lmdb.open(Path+'mylmdb', map_size=int(1e12))

I use this call every 10,000 samples to write the data to the database, where X and Y are placeholders for the data to be stored in the LMDB.

 env = create(env, X[:counter,:,:,:], Y, counter)

 def create(env, X, Y, N):
     with env.begin(write=True) as txn:
         # txn is a Transaction object
         for i in range(N):
             datum = caffe.proto.caffe_pb2.Datum()
             datum.channels = X.shape[1]
             datum.height = X.shape[2]
             datum.width = X.shape[3]
             datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
             datum.label = int(Y[i])
             str_id = '{:08}'.format(i)
             # The encode is only essential in Python 3
             txn.put(str_id.encode('ascii'), datum.SerializeToString())
     #pdb.set_trace()
     return env

How can I modify this code so that new data is appended to this LMDB rather than replacing what is already there? The current method overwrites the entries stored under the same keys. I check the number of entries after each write using env.stat().

1 answer

Let me expand on my comment above.

All entries in LMDB are stored under unique keys, and your database already contains keys for i = 0, 1, 2, ... Writing to a key that already exists replaces its value, so you need keys that are not yet in use for each i. The easiest way to do this is to find the largest key in the existing database and continue counting from it.

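Assuming the existing keys are sequential, the number of entries can serve as the starting point (with keys starting at 0 this is one past the largest key, which is harmless because the new keys are still unique),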

 max_key = env.stat()["entries"] 

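Otherwise, a more thorough approach is to iterate over all the keys. Cursors are opened from a transaction, and keys come back as bytes, so they have to be converted before comparing.

 max_key = 0
 with env.begin() as txn:
     for key, value in txn.cursor():
         max_key = max(max_key, int(key.decode('ascii')))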
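Scanning every key gets slow with a million entries. A possible shortcut, sketched below, relies on LMDB keeping keys in sorted byte order: as long as every key is a zero-padded, fixed-width number like the '{:08}' keys in the question, the last key in the database is also the largest. The helper name last_index is hypothetical, and it assumes the default (unnamed) database holds only such keys.

```python
def last_index(env):
    """Return the largest integer key in env, or -1 if the database is empty.

    Hypothetical helper: assumes all keys are zero-padded ASCII integers
    of the same width, so byte order matches numeric order.
    """
    with env.begin() as txn:           # read-only transaction
        cursor = txn.cursor()
        if not cursor.last():          # last() returns False when there are no entries
            return -1
        return int(cursor.key().decode('ascii'))
```

With this, the next free index is last_index(env) + 1, which is 0 for an empty database.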

Finally, just replace line 7 of your for loop,

 str_id = '{:08}'.format(i) 

by

 str_id = '{:08}'.format(max_key + 1 + i) 

to add to an existing database.
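Putting the pieces together, an append step might look like the sketch below. It is not the code from the question: the function name append_batch and its parameters are hypothetical, and it takes already serialized records (for example the output of datum.SerializeToString()) so that it does not depend on Caffe. Passing overwrite=False to txn.put makes an accidental key collision raise an error instead of silently replacing data.

```python
def append_batch(env, values, start):
    """Store each item of `values` (bytes) under keys start, start+1, ...

    Hypothetical helper. Returns the next free index so the caller can
    pass it as `start` for the following batch.
    """
    with env.begin(write=True) as txn:
        for offset, value in enumerate(values):
            key = '{:08}'.format(start + offset).encode('ascii')
            # With overwrite=False, put() returns False if the key already exists
            if not txn.put(key, value, overwrite=False):
                raise KeyError('key already present: %r' % key)
    return start + len(values)
```

A caller would keep the returned index between batches, for instance next_index = append_batch(env, batch, next_index), starting from 0 for a new database or from one past the largest existing key otherwise. If a collision raises, the exception leaves the with block and the whole batch is rolled back.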
