Sorry for the long post!
I have a database containing ~30 tables (InnoDB engine). Only two of these tables, "transaction" and "shift", are quite large: the first has about 1.5 million rows, and the second about 23 thousand rows. Everything currently works fine, and I have no problem with the present database size.
However, we will soon have a similar database (same data types, same design, ...) but much larger: for example, its "transaction" table will hold about 1 billion records (about 2.3 million transactions per day), and we are thinking about how we should handle that amount of data in MySQL, for both reads and writes. I have read a lot of related posts to find out whether MySQL (more specifically, the InnoDB engine) can work well with billions of records, but I still have some questions.
What I have understood so far about improving performance for very large tables:
- For InnoDB tables (my case), increasing innodb_buffer_pool_size (for example, up to 80% of RAM). I also found some other MySQL performance-tuning settings here in the Percona blog (see the sketch just after this list).
- Having proper indexes on the table (checking queries with EXPLAIN)
- Partitioning the table
- MySQL sharding or clustering
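
To make the first point concrete, here is a minimal sketch of the kind of setting I mean; the 24 GB value is just a placeholder for ~80% of a hypothetical 32 GB server, not a recommendation:

    -- Assuming MySQL 5.7+, where the buffer pool can be resized online;
    -- on older versions the same value would go in my.cnf instead.
    SET GLOBAL innodb_buffer_pool_size = 25769803776;  -- 24 GB, placeholder

    -- Verify the setting took effect:
    SELECT @@innodb_buffer_pool_size;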
Here are my questions / confusions:
On partitioning, I have some doubts about whether we should use it or not. On the one hand, many people suggest it to improve performance when a table is very large. On the other hand, I have read many posts saying that it does not improve query performance and does not speed up query execution (for example, here and here). In addition, I read in the MySQL Reference Manual that InnoDB foreign keys and MySQL partitioning are not compatible (and we have foreign keys).
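
Just to illustrate what I mean by partitioning (a sketch only, nothing we have tried): since start_fuel_time is a Unix timestamp stored as an int, the transaction table could in principle be range-partitioned by it. Note that MySQL requires every unique key, including the primary key, to contain the partitioning column, so the primary key would have to change first, and any foreign keys would have to be dropped:

    -- Hypothetical sketch; the year boundaries are placeholders.
    -- Step 1: make the partitioning column part of the primary key.
    ALTER TABLE transaction
        DROP PRIMARY KEY,
        ADD PRIMARY KEY (id, start_fuel_time);

    -- Step 2: range-partition on the Unix-timestamp column.
    ALTER TABLE transaction
        PARTITION BY RANGE (start_fuel_time) (
            PARTITION p2016 VALUES LESS THAN (1483228800),  -- < 2017-01-01 UTC
            PARTITION p2017 VALUES LESS THAN (1514764800),  -- < 2018-01-01 UTC
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        );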
As for indexes, they work well right now, but as I understand it, for very large tables indexes become more restrictive (as Kevin Bedell said in his answer here). Also, indexes speed up reads but slow down writes (insert/update). So, for the new, similar project where we must first insert/load all the data, should we load everything first and then create the indexes, to speed up the insertion?
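
For example, this is the kind of load-then-index flow I have in mind (the file path is a placeholder; the index names are the real ones from our table):

    -- Sketch: bulk-load into a table that has only its primary key,
    -- then build the secondary indexes in a single pass.
    LOAD DATA INFILE '/tmp/transactions.csv' INTO TABLE transaction
        FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

    ALTER TABLE transaction
        ADD INDEX start_fuel_time_idx (start_fuel_time),
        ADD INDEX fuel_terminal_idx (fuel_terminal_id),
        ADD INDEX gas_station_id (gas_station_id);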
If we cannot use partitioning for our large "transaction" table, what is an alternative to improve performance, apart from MySQL variable settings such as innodb_buffer_pool_size? Should we use MySQL Cluster? (We also have many joins.)
EDIT
Here is the SHOW CREATE TABLE statement for our largest table, "transaction":
    CREATE TABLE `transaction` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `terminal_transaction_id` int(11) NOT NULL,
      `fuel_terminal_id` int(11) NOT NULL,
      `fuel_terminal_serial` int(11) NOT NULL,
      `xboard_id` int(11) NOT NULL,
      `gas_station_id` int(11) NOT NULL,
      `operator_id` text NOT NULL,
      `shift_id` int(11) NOT NULL,
      `xboard_total_counter` int(11) NOT NULL,
      `fuel_type` int(11) NOT NULL,
      `start_fuel_time` int(11) NOT NULL,
      `end_fuel_time` int(11) DEFAULT NULL,
      `preset_amount` int(11) NOT NULL,
      `actual_amount` int(11) DEFAULT NULL,
      `fuel_cost` int(11) DEFAULT NULL,
      `payment_cost` int(11) DEFAULT NULL,
      `purchase_type` int(11) NOT NULL,
      `payment_ref_id` text,
      `unit_fuel_price` int(11) NOT NULL,
      `fuel_status_id` int(11) DEFAULT NULL,
      `fuel_mode_id` int(11) NOT NULL,
      `payment_result` int(11) NOT NULL,
      `card_pan` text,
      `state` int(11) DEFAULT NULL,
      `totalizer` int(11) NOT NULL DEFAULT '0',
      `shift_start_time` int(11) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `terminal_transaction_id` (`terminal_transaction_id`,`fuel_terminal_id`,`start_fuel_time`) USING BTREE,
      KEY `start_fuel_time_idx` (`start_fuel_time`),
      KEY `fuel_terminal_idx` (`fuel_terminal_id`),
      KEY `xboard_idx` (`xboard_id`),
      KEY `gas_station_id` (`gas_station_id`) USING BTREE,
      KEY `purchase_type` (`purchase_type`) USING BTREE,
      KEY `shift_start_time` (`shift_start_time`) USING BTREE,
      KEY `fuel_type` (`fuel_type`) USING BTREE
    ) ENGINE=InnoDB AUTO_INCREMENT=1665335 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT
Thank you for your time,