I have a MySQL table with:
- ~25 million rows (currently)
- 3 indexes
- ~3 million new rows added by the crawler every day
- I'm not planning too far ahead yet, but the final size of the DB might be on the order of ~ CONST * 1e9 rows
- currently about 9.5 GB
- InnoDB, and the table is being read at the same time as new rows are inserted
The data itself is text of ~100 characters plus several metadata fields about it. The indexes are on a unique identifier, the author name, and a record identifier.
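To make this concrete, here is a rough sketch of the schema; the table name, column names, and types below are placeholders, not my real definitions:

```sql
-- Rough approximation of the table described above; all names and
-- types are placeholders.
CREATE TABLE crawl_items (
    item_uid     VARCHAR(64)  NOT NULL,   -- the unique identifier
    author_name  VARCHAR(255) NOT NULL,   -- author of the record
    record_id    BIGINT       NOT NULL,   -- the record identifier
    body         VARCHAR(255) NOT NULL,   -- the ~100-character text
    fetched_at   DATETIME     NOT NULL,   -- example metadata field
    PRIMARY KEY (item_uid),
    KEY idx_author (author_name),
    KEY idx_record (record_id)
) ENGINE=InnoDB;
```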
So far everything has gone smoothly, but now the server is struggling to keep up with inserting the new data (~10 seconds for each insert batch, which adds ~3k rows). I am trying to find ways to overcome this problem. What I'm thinking:
- Maintaining the indexes during insertion is expensive. I could skip that while inserting and rebuild the indexes only after every X batches (see the first sketch after this list).
- Splitting the data into several tables (the second sketch after this list shows one way to read this, using MySQL partitioning).
- Crawling into a small staging table and moving the data into the large DB every X minutes/days (the third sketch after this list).
- Moving to another database. I'm not familiar enough with NoSQL; would it help me solve these problems? Is it a big effort to switch to it?
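To make option 1 concrete, this is roughly what I have in mind, using the placeholder names from the schema sketch above (as far as I know, ALTER TABLE ... DISABLE KEYS only affects MyISAM, so on InnoDB it would mean dropping and re-creating the secondary indexes):

```sql
-- Option 1 sketch (placeholder names): drop the secondary indexes,
-- run the bulk inserts, then rebuild the indexes. Rebuilding an index
-- over tens of millions of rows is itself expensive, so this only
-- pays off for large batches.
ALTER TABLE crawl_items DROP INDEX idx_author, DROP INDEX idx_record;

-- ... run the crawler's batch of INSERTs here ...

ALTER TABLE crawl_items
    ADD INDEX idx_author (author_name),
    ADD INDEX idx_record (record_id);
```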
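For option 2, one way I could read "splitting into tables" is MySQL's native range partitioning instead of hand-made per-period tables; the partition key and the ranges below are just an example:

```sql
-- Option 2 sketch (placeholder names and example ranges): partition
-- the big table by crawl date so daily inserts mostly touch the newest
-- partition. In MySQL, every unique key (including the primary key)
-- must contain the partitioning column, hence the composite key.
CREATE TABLE crawl_items_by_day (
    item_uid     VARCHAR(64)  NOT NULL,
    author_name  VARCHAR(255) NOT NULL,
    record_id    BIGINT       NOT NULL,
    body         VARCHAR(255) NOT NULL,
    fetched_day  DATE         NOT NULL,
    PRIMARY KEY (item_uid, fetched_day),
    KEY idx_author (author_name),
    KEY idx_record (record_id)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(fetched_day)) (
    PARTITION p2019_01 VALUES LESS THAN (TO_DAYS('2019-02-01')),
    PARTITION p2019_02 VALUES LESS THAN (TO_DAYS('2019-03-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
```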
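And for option 3, a minimal sketch of the staging-table idea, again with placeholder names; the cutoff variable is there so rows that arrive while the copy is running are not lost:

```sql
-- Option 3 sketch (placeholder names): the crawler inserts into a
-- small, lightly indexed staging table; a periodic job moves the rows
-- into the big table and then removes them from staging.
CREATE TABLE crawl_staging LIKE crawl_items;
ALTER TABLE crawl_staging DROP INDEX idx_author, DROP INDEX idx_record;

-- Periodic job, every X minutes/days:
SET @cutoff = (SELECT MAX(record_id) FROM crawl_staging);
INSERT INTO crawl_items
    SELECT * FROM crawl_staging WHERE record_id <= @cutoff;
DELETE FROM crawl_staging WHERE record_id <= @cutoff;
```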
Each option has its own sub-questions and trade-offs, but I think I should first settle on a direction. Which way should I go, and why? Is there another road I should be thinking about?
BTW, there is also the option of not storing all the data, only the parts I actually display, but that would make it impossible to make any functional changes to how the data is processed before it is displayed.
Noam