I am new to databases, and I have a problem I cannot figure out. Sorry in advance if this is too long; I am trying to summarize all my efforts so that you know exactly what I have done so far. My application has some logic followed by three database queries. The first query checks whether a value exists, the second checks whether another (related) value exists, and the third adds that related value if it does not exist. Think of it as querying for the number 2 and, if it exists, checking for 3 and adding it if necessary. I run this loop a very large number of times (I am looking at overall query throughput, though I suspect the program is more read-heavy than write-heavy). Previously I used just a hash table in my program, but when I added multiple processes I had synchronization problems, so I decided to use a database so that several cores could work on the data at the same time.
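For illustration, the loop body described above can be sketched with a plain in-memory dict of sets, roughly the kind of single-process hash table mentioned. This is only a sketch: the function name `check_and_add` and the `table` layout are hypothetical, and `s`, `c`, `x`, `i` are the same variables that appear in the code in the update further down.

```python
from collections import defaultdict


def check_and_add(table, s, c, x, i):
    """Add i + 1 under key s if i is stored under key s - c * x.

    Returns True only when a new value was actually added.
    """
    prev_key = s - c * x
    # Query 1: does the related (key, value) pair exist?
    if i not in table.get(prev_key, ()):
        return False
    # Query 2: is the target pair already present?
    if (i + 1) in table.get(s, ()):
        return False
    # Query 3: add the missing pair.
    table[s].add(i + 1)
    return True


# Hypothetical sample data: key 2 already holds the value 0.
table = defaultdict(set)
table[2].add(0)
added = check_and_add(table, s=5, c=1, x=3, i=0)
```

This works within one process, but several processes cannot share the dict, which is the synchronization problem that led to trying a database.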
At first I tried MySQL with the MEMORY storage engine (everything fits in memory), created a composite primary key to replicate the dictionary I had in my program, indexed it, and disabled locking, but I could only get 11,000 queries per second.
Then I tried Redis (I heard it is similar to memcached) and created the same key/value dict as before (the actual model is in this question: "Can I make two columns unique to each other? Or use a composite primary key in redis?"). I also removed all the fsync settings so that it hopefully never hits the hard drive, but I still only get about 30,000 queries per second. I looked at system-level improvements (I use Linux), running the program from a RAM drive, and so on, but the results were similar.
I have a setup script and tried running this on EC2 with a high-CPU instance, but the result is similar (throughput does not go up much for either solution). I am kind of at my wit's end, but I don't want to give up, because I have read people on Stack Overflow talking about getting 100,000+ queries per second on a standalone machine. I believe my data model is very simple (two INT columns, or I could make it a single string combining both INTs, but that did not seem to slow things down), and once the data has been created (and queried by another process) I have no need for persistence, which is why I try not to write to the hard drive. What tuning am I missing that lets developers get this kind of performance? Is special configuration required beyond creating the table? Or is a distributed database the only way to get this performance? I know the problem is with the database, because when I comment out the database part in the middle, my Python application reaches 100% on every core (although it does not write anything). That makes me think that waiting on the database (for reads, I suspect) is what slows it down. I have plenty of free disk and memory, so I wonder why they are not being used; I have about 50% CPU and 80% free memory during these runs, so I have no idea what is happening.
I have MySQL, Redis, and HBase installed. I hope there is something I can do to make one of these solutions run as fast as I would like, but if not, I am fine with any other solution (this is really just a temporary hash table that distributed processes can query).
What can I do?
Thanks!
Update: as requested in the comments, here is some code (it runs after the application-specific logic, which seems to be working fine):
    cursor.execute(""" SELECT value1 FROM data_table WHERE key1='%s' AND value1='%s' """ % (s - c * x, i))
    if cursor.rowcount == 1:
        cursor.execute(""" SELECT value1 FROM data_table WHERE key1='%s' AND value1='%s' """ % (s, i+1))
        if cursor.rowcount == 0:
            cursor.execute(""" INSERT INTO data_table (key1, value1) VALUES ('%s', '%s')""" % (s, i+1))
            conn.commit()
Above is the code with 3 queries on MySQL. I also tried doing it as one big query (but it was actually slower):
    cursor.execute("""
        INSERT INTO data_table (key1, value1)
        SELECT '%s', '%s'
        FROM dual
        WHERE (SELECT COUNT(*) FROM data_table
               WHERE key1='%s' AND value1='%s') = 1
          AND NOT EXISTS (SELECT * FROM data_table
                          WHERE key1='%s' AND value1='%s')
        """ % (s, i+1, s - c * x, i, s, i+1))
Here is the table design in mysql:
    cursor.execute("DROP TABLE IF EXISTS data_table")
    cursor.execute("""
        CREATE TABLE data_table(
            key1 INT SIGNED NOT NULL,
            value1 INT SIGNED NOT NULL,
            PRIMARY KEY (key1, value1)
        ) ENGINE=MEMORY
        """)
    cursor.execute("CREATE INDEX ValueIndex ON data_table (key1, value1)")
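For reference, the three-query sequence can also be written with bound parameters rather than values formatted into the SQL string. The following is a minimal sketch, not the code that was benchmarked: it assumes the same two-column table, uses the standard-library `sqlite3` module purely as a stand-in so that it runs without a server (MySQL drivers typically use `%s` placeholders instead of `?`), and the function name `check_and_insert` and the sample row are hypothetical.

```python
import sqlite3


def check_and_insert(conn, s, c, x, i):
    """Run the check / check / insert sequence with bound parameters."""
    cur = conn.cursor()
    # Query 1: the related pair must exist.
    cur.execute(
        "SELECT 1 FROM data_table WHERE key1 = ? AND value1 = ?",
        (s - c * x, i),
    )
    if cur.fetchone() is None:
        return False
    # Query 2: skip the insert if the target pair is already stored.
    cur.execute(
        "SELECT 1 FROM data_table WHERE key1 = ? AND value1 = ?",
        (s, i + 1),
    )
    if cur.fetchone() is not None:
        return False
    # Query 3: insert the missing pair.
    cur.execute(
        "INSERT INTO data_table (key1, value1) VALUES (?, ?)",
        (s, i + 1),
    )
    conn.commit()
    return True


# In-memory stand-in for the MEMORY table (assumption: same columns and key).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table ("
    " key1 INTEGER NOT NULL,"
    " value1 INTEGER NOT NULL,"
    " PRIMARY KEY (key1, value1))"
)
conn.execute("INSERT INTO data_table (key1, value1) VALUES (?, ?)", (2, 0))
inserted = check_and_insert(conn, s=5, c=1, x=3, i=0)
```

The sketch checks `fetchone()` instead of `cursor.rowcount`, because `sqlite3` does not report a row count for SELECT statements.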
In Redis it is similar to the 3-query structure (since that was the fastest I could get on MySQL), except that I don't need to check whether the value already exists; I just overwrite it to save a query:
    if r_server.sismember(s - c * x, i):
        r_server.sadd(s, i + 1)
My data structure for Redis is in the related question (basically it is one list per key, 3 => 1 2 3, whereas MySQL has 3 rows to represent 3 = 1, 3 = 2, 3 = 3).
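To make the difference between the two layouts concrete, here is a small sketch that groups MySQL-style (key1, value1) rows into one collection per key, which is the shape the `sismember` / `sadd` calls work on. The helper name `rows_to_sets` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def rows_to_sets(rows):
    """Group (key, value) rows into a mapping of key -> set of values."""
    grouped = defaultdict(set)
    for key, value in rows:
        grouped[key].add(value)
    return dict(grouped)


# Three MySQL-style rows: 3 = 1, 3 = 2, 3 = 3.
mysql_rows = [(3, 1), (3, 2), (3, 3)]
redis_like = rows_to_sets(mysql_rows)
# redis_like == {3: {1, 2, 3}}
```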
Hope this helps, any other questions please let me know.