Creating Training Datasets with Caffe

I tried creating training datasets with both HDF5 and LMDB. However, creating the LMDB is very slow, even slower than HDF5, and I am only writing ~20,000 images.
Am I doing something terribly wrong? Is there something I don't know about?
This is my code for creating LMDB:
import lmdb
import caffe

DB_KEY_FORMAT = "{:0>10d}"

db = lmdb.open(path, map_size=int(1e12))
curr_idx = 0
commit_size = 1000
# Commit in batches of `commit_size` images to amortize transaction overhead.
for curr_commit_idx in range(0, num_data, commit_size):
    with db.begin(write=True) as in_txn:
        for i in range(curr_commit_idx, min(curr_commit_idx + commit_size, num_data)):
            d, l = data[i], labels[i]
            # Serialize the array and its label into a Caffe Datum protobuf.
            im_dat = caffe.io.array_to_datum(d.astype(float), label=int(l))
            key = DB_KEY_FORMAT.format(curr_idx)
            in_txn.put(key.encode('ascii'), im_dat.SerializeToString())
            curr_idx += 1
db.close()
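One note on the key scheme (unrelated to the slowness): LMDB keeps records sorted by key bytes, so zero-padding the index makes lexicographic order match numeric order. A minimal stand-alone check of that property:

```python
# Zero-padded keys sort lexicographically in the same order as their
# numeric indices, which is what LMDB's sorted key space requires.
DB_KEY_FORMAT = "{:0>10d}"

keys = [DB_KEY_FORMAT.format(i) for i in (0, 9, 10, 123, 19999)]
print(keys[0])              # 0000000000
print(keys == sorted(keys)) # True: lexicographic order == numeric order
```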
As you can see, I open a transaction for every 1000 images, because I thought committing once per image would add overhead, but batching does not seem to affect the performance much.
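For illustration, the commit batching above can be factored into a small helper (`commit_ranges` is a hypothetical name, not part of the original code) that yields the [start, end) index range each write transaction covers, including the final partial batch:

```python
def commit_ranges(num_data, commit_size):
    """Yield (start, end) index ranges, one per write transaction."""
    for start in range(0, num_data, commit_size):
        yield start, min(start + commit_size, num_data)

# e.g. 2500 items in batches of 1000 -> three transactions,
# the last covering only the 500 leftover items
print(list(commit_ranges(2500, 1000)))  # [(0, 1000), (1000, 2000), (2000, 2500)]
```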