Is Python DBM very fast?

I thought that Python's own DBM should be faster than NoSQL databases like Tokyo Cabinet, MongoDB, etc., since Python DBM has fewer features and options (i.e. it is a simpler system). I tested with a very simple write/read example:

    #!/usr/bin/python
    import time
    import anydbm

    t = time.time()
    count = 0
    while count < 1000:
        db = anydbm.open("dbm2", "c")
        db["1"] = "something"
        db.close()

        db = anydbm.open("dbm2", "r")
        print "dict['Name']: ", db['1']
        print "%.3f" % (time.time() - t)
        db.close()
        count = count + 1

Read / write: 1.3 s
Read: 0.3 s
Write: 1.0 s

The equivalent numbers for MongoDB are at least 5 times faster. Is this really representative of Python DBM performance?

1 answer

Python does not have a single built-in DBM implementation. Instead, it wraps a range of third-party DBM-style libraries through modules such as anydbm, bsddb (Berkeley DB), and gdbm (GNU dbm).
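To see this dispatch in action: in Python 3 the anydbm and whichdb modules were merged into the dbm package, and dbm.whichdb reports which backend actually created a given file. A minimal sketch (the file name example_db is arbitrary):

```python
# Python 3 sketch; anydbm/whichdb became the dbm package in Python 3.
# dbm.open is a front-end that picks one of several backends.
import dbm

with dbm.open("example_db", "c") as db:   # "c": create the file if missing
    db["key"] = "value"                    # str keys/values are encoded to bytes

# whichdb reports which backend created the file, e.g. "dbm.gnu",
# "dbm.ndbm", or the pure-Python fallback "dbm.dumb"
print(dbm.whichdb("example_db"))
```

Which backend you get depends on what libraries were available when your Python was built, so two machines can produce mutually unreadable DBM files.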

Python's dict implementation is very fast for key/value storage, but it is not persistent. If you need high-performance run-time key lookups, a plain dict may serve you better; you can manage persistence yourself with something like cPickle or shelve. If startup time (and, if you modify the data, shutdown time) matters more to you than access speed at run time, then something like DBM would be the better choice.
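That dict-plus-persistence pattern can be sketched like this, using shelve (module names are the Python 3 spellings, and the file name cache_shelf is arbitrary): write the dict out once at shutdown, read it back once at startup, and do all run-time lookups against the plain dict with no per-access disk I/O.

```python
# Sketch: fast in-memory dict at run time, shelve only for persistence.
import shelve

# persist a plain dict once, e.g. at shutdown ...
data = {"name": "something", "count": 42}
with shelve.open("cache_shelf") as shelf:
    shelf.update(data)

# ... and load it back into a dict at startup; every lookup after this
# is an ordinary in-memory dict lookup
with shelve.open("cache_shelf") as shelf:
    restored = dict(shelf)

print(restored["name"], restored["count"])
```

The trade-off is exactly the one described above: startup and shutdown pay the full (de)serialisation cost, in exchange for dict-speed access in between.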

In your benchmark you have included both the dbm open calls and the key lookup inside the main loop. Opening a DBM just to store one value, closing it, and re-opening it before looking the value up is a rather unrealistic use case, and the slow performance you are seeing is typical of managing a persistent data store in that manner (it is quite inefficient).
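The cost of that pattern is easy to measure directly. A rough Python 3 sketch (file name bench_db and iteration count are arbitrary) contrasting the question's reopen-per-operation pattern with opening the file once:

```python
# Rough timing sketch (Python 3 dbm): reopen-per-write vs. open-once.
import dbm
import time

N = 200  # kept small so the slow pattern finishes quickly

t = time.time()
for i in range(N):
    db = dbm.open("bench_db", "c")   # reopen the file on every iteration
    db[str(i)] = "x"
    db.close()
per_op_reopen = (time.time() - t) / N

t = time.time()
db = dbm.open("bench_db", "c")       # open once, outside the loop
for i in range(N):
    db[str(i)] = "x"
db.close()
per_op_single = (time.time() - t) / N

print("reopen each time: %.6f s/op, open once: %.6f s/op"
      % (per_op_reopen, per_op_single))
```

On a typical setup the open-once loop is dramatically cheaper per operation, since each open/close pair pays file-system and index-maintenance costs that the benchmark in the question attributes to DBM itself.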

Depending on your requirements, if you need fast lookups and don't care too much about startup time, DBM may well be a solution, but to benchmark it fairly you should only include the writes and reads inside the loop. Something like this:

    import anydbm
    from random import random
    import time

    # open the DBM outside of the timed loops
    db = anydbm.open("dbm2", "c")

    max_records = 100000

    # only time the read and write operations
    t = time.time()

    # create some records
    for i in range(max_records):
        db[str(i)] = 'x'

    # do some random reads
    for i in range(max_records):
        x = db[str(int(random() * max_records))]

    time_taken = time.time() - t
    print "Took %0.3f seconds, %0.5f microseconds / record" % \
        (time_taken, (time_taken * 1000000) / max_records)

    db.close()
