How to optimize this MySQL table?

This is for an upcoming project. I have two tables: the first keeps track of the photos, and the second tracks each photo's rank.

Photos:
+-------+-----------+------------------+
| id    | photo     | current_rank     |
+-------+-----------+------------------+
| 1     | apple     | 5                |
| 2     | orange    | 9                |
+-------+-----------+------------------+

The rank of a photograph is constantly changing, and this is the table that tracks it:

Ranks:
+-------+-----------+----------+-------------+
| id    | photo_id  | ranks    | timestamp   |
+-------+-----------+----------+-------------+
| 1     | 1         | 8        | *           |
| 2     | 2         | 2        | *           |
| 3     | 1         | 3        | *           |
| 4     | 1         | 7        | *           |
| 5     | 1         | 5        | *           |
| 6     | 2         | 9        | *           |
+-------+-----------+----------+-------------+
* = current timestamp

Every rank is tracked for reporting/analysis purposes. [Edit] Users will have access to the statistics on demand.

I talked with someone who has experience in this area, and he told me that maintaining ranks like the above is the way to go. But I'm not sure yet.

The problem is data redundancy. There will be tens of thousands of photos. For recent photos the rank changes on an hourly basis (sometimes many times within a few minutes), but less often for older photos. At this rate, the table will hold millions of records within a few months. And since I have no experience with large databases, this makes me a little nervous.

I thought about this:

Ranks:
+-------+-----------+--------------------+
| id    | photo_id  | ranks              |
+-------+-----------+--------------------+
| 1     | 1         | 8:*,3:*,7:*,5:*    |
| 2     | 2         | 2:*,9:*            |
+-------+-----------+--------------------+
* = current timestamp

This means some extra PHP code to split the rank/time pairs (and sort them), but that looks fine to me.

Is this the right way to optimize the table for performance? What would you recommend?

+6
optimization database php mysql
9 answers

The first one. Period.

In fact, you would lose far more with the second design. A timestamp stored in an INT column takes only 4 bytes.

The same timestamp stored as a string takes 10 bytes.
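To make the size difference concrete, here is a minimal sketch in Python (not MySQL itself). The timestamp value is a made-up example, and the 4-byte packing only mimics what a 4-byte integer column would hold.

```python
import struct

ts = 1_700_000_000  # hypothetical Unix timestamp in seconds (assumption)

as_int = struct.pack("<I", ts)      # packed as a 4-byte unsigned integer
as_text = str(ts).encode("ascii")   # the same value written as decimal digits

print(len(as_int))   # 4
print(len(as_text))  # 10
```

In the packed-string design each entry also carries the rank, a colon and a comma, so the real per-entry cost would be a few bytes higher still.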

+7

I would stick with your first approach. With the second, a lot of data ends up stored in a single row, and the row keeps growing as the photo collects more ranks. Imagine a photo that receives thousands and thousands of ranks.

The first approach is also easier to maintain, e.g. if you want to delete a rank.

+2

Your first design is the correct one for a relational database. The redundancy in the key columns is preferable because it gives you much more flexibility in how you validate and query the rankings. You can do sorts, counts, averages, etc. in SQL, without having to write any PHP code to split your string six ways from Sunday.

It sounds like you might prefer a non-SQL database such as CouchDB or MongoDB. Those would let you store a semi-structured list of rankings right in the photo's record and then query the rankings efficiently. The caveat is that you don't really know the rankings are in the right format, as you do with SQL.
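As a rough illustration of doing the math in SQL, the sketch below loads the six sample rows from the question into an in-memory SQLite database (a stand-in for MySQL, chosen only so the example runs anywhere) and computes per-photo statistics in one query. The `ts` column name and the timestamp values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE ranks ("
    " id INTEGER PRIMARY KEY, photo_id INTEGER, ranks INTEGER, ts INTEGER)"
)

# (photo_id, ranks) pairs from the question; timestamps are made up.
sample = [(1, 8), (2, 2), (1, 3), (1, 7), (1, 5), (2, 9)]
conn.executemany(
    "INSERT INTO ranks (photo_id, ranks, ts) VALUES (?, ?, ?)",
    [(photo_id, rank, 1000 + i) for i, (photo_id, rank) in enumerate(sample)],
)

# Count, average, best and worst rank per photo, computed by the database.
stats = conn.execute(
    "SELECT photo_id, COUNT(*), AVG(ranks), MIN(ranks), MAX(ranks) "
    "FROM ranks GROUP BY photo_id ORDER BY photo_id"
).fetchall()

for row in stats:
    print(row)
# (1, 4, 5.75, 3, 8)
# (2, 2, 5.5, 2, 9)
```

With the packed-string design, the same numbers would require fetching the whole string and splitting it in application code first.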

+2

I would think the database "hit" of normalization (querying the ranks table over and over) is nicely avoided by "caching" the latest rank in current_rank. It doesn't really matter that the ranks table grows enormously if it is seldom queried (analysis/reporting, as you said) and never updated, with rows only ever inserted at the end: even a very modest machine would have no problem with millions of rows in that table.

Your alternative would require a lot of updates in different places on disk, which can degrade performance.

Of course, if you need all the old data, and always by photo_id, you could schedule a job that moves it into another table, rankings_old, possibly with photo_id, year, month and the rankings (including timestamps), once the month is over. Retrieving old data stays easy, and no updates are needed in either rankings_old or the rankings table, only inserts at the end of the table.

And take this from me: millions of records in a pure logging table should be absolutely no problem.
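A minimal sketch of such a scheduled move, again using in-memory SQLite as a stand-in for MySQL. It copies rows unchanged rather than regrouping them by year and month, and the helper name `archive_before`, the `ts` column and the cutoff value are all assumptions made for illustration.

```python
import sqlite3

def archive_before(conn, cutoff_ts):
    """Move rank rows older than cutoff_ts into rankings_old; return rows moved."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "INSERT INTO rankings_old (photo_id, ranks, ts) "
            "SELECT photo_id, ranks, ts FROM ranks WHERE ts < ?",
            (cutoff_ts,),
        )
        moved = conn.execute(
            "DELETE FROM ranks WHERE ts < ?", (cutoff_ts,)
        ).rowcount
    return moved

conn = sqlite3.connect(":memory:")
for table in ("ranks", "rankings_old"):
    conn.execute(
        f"CREATE TABLE {table} ("
        " id INTEGER PRIMARY KEY, photo_id INTEGER, ranks INTEGER, ts INTEGER)"
    )
conn.executemany(
    "INSERT INTO ranks (photo_id, ranks, ts) VALUES (?, ?, ?)",
    [(1, 8, 100), (2, 2, 200), (1, 3, 300), (1, 7, 400)],  # made-up timestamps
)

moved = archive_before(conn, 300)
live = conn.execute("SELECT COUNT(*) FROM ranks").fetchone()[0]
old = conn.execute("SELECT COUNT(*) FROM rankings_old").fetchone()[0]
print(moved, live, old)  # 2 2 2
```

Doing the copy and the delete in one transaction means a failure halfway through cannot leave rows duplicated or lost.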

+1

Normalized data or non-normalized data. You will find thousands of articles about this. :)

It really depends on your needs.

If you want to design your database with performance alone in mind (speed, RAM consumption, ...), you should trust only the numbers. To get them, profile your queries against the expected volume of data (you can generate it with a script you write). To profile your queries, learn how to read the output of the following two statements:

  • EXPLAIN extended...
  • SHOW STATUS

Then work out what to change to improve the numbers (MySQL settings, data structure, hardware, etc.).
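The "generate data with a script and measure" step might look roughly like the sketch below. It uses in-memory SQLite instead of MySQL so that it runs without a server, and the row counts, the `ts` column and the `timed` helper are assumptions; on a real MySQL instance you would load the same synthetic rows and inspect the query with EXPLAIN there.

```python
import random
import sqlite3
import time

N_PHOTOS = 1_000   # hypothetical sizes; scale them to your expected volume
N_ROWS = 50_000

random.seed(0)  # reproducible synthetic data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranks ("
    " id INTEGER PRIMARY KEY, photo_id INTEGER, ranks INTEGER, ts INTEGER)"
)
rows = (
    (random.randint(1, N_PHOTOS), random.randint(1, 100), 1_700_000_000 + i)
    for i in range(N_ROWS)
)
conn.executemany("INSERT INTO ranks (photo_id, ranks, ts) VALUES (?, ?, ?)", rows)
conn.commit()

def timed(sql, params=()):
    """Run a query, fetch all rows, and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    result = conn.execute(sql, params).fetchall()
    return result, time.perf_counter() - start

history, seconds = timed(
    "SELECT ranks, ts FROM ranks WHERE photo_id = ? ORDER BY ts", (42,)
)
total = conn.execute("SELECT COUNT(*) FROM ranks").fetchone()[0]
print(f"{total} rows; history query returned {len(history)} rows in {seconds:.4f}s")
```

Timings from a toy run like this say little on their own; the point is to repeat the measurement after each change and compare.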

As a starter, I really recommend these two great articles:

If you want to design for the academic beauty of normalization, just follow the books and the general recommendations. :)

+1

Of the two options, like everyone who answered before me, I'd say it should be option 1.

What you should really be concerned about is the bottlenecks in the application itself. Will users refer to historical data often, or is it shown only to a few select users? If everyone gets to see historical rank data, option 1 is good enough. If you won't refer to the historical ranks often, you could create a third "archive" table and, before updating the ranks, copy the rows of the original ranks table to the archive table. This keeps the row count minimal in the main table being queried.

Remember, if you are updating the rows and there are tens of thousands of them, it may be more efficient to fetch the results into your code (PHP/Python/etc.), truncate the table, and insert the results back, rather than updating row by row, as that would be a potential bottleneck.

You might also want to take a look at sharding (horizontal partitioning) - http://en.wikipedia.org/wiki/Shard_%28database_architecture%29

And never forget to index well.
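For the "history of one photo" query, a composite index on (photo_id, timestamp) is one plausible choice. The sketch below shows the effect using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the index name `idx_ranks_photo_ts`, the `ts` column and the `plan` helper are assumptions, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE ranks ("
    " id INTEGER PRIMARY KEY, photo_id INTEGER, ranks INTEGER, ts INTEGER)"
)

QUERY = "SELECT ranks, ts FROM ranks WHERE photo_id = ? ORDER BY ts"

def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[-1] for row in rows)  # last column holds the detail text

before = plan(QUERY, (1,))  # no index yet: full table scan plus a sort
conn.execute("CREATE INDEX idx_ranks_photo_ts ON ranks (photo_id, ts)")
after = plan(QUERY, (1,))   # the planner can now search the index

print("before:", before)
print("after: ", after)
```

Because the index is ordered by photo_id and then by timestamp, the same index can serve both the filter and the ORDER BY.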

Hope this helped.

+1

You indicated that the rank is associated only with the image, in which case all you need is table 1, and you keep updating the rank in real time. Table 2 just stores unnecessary data. The downside of this approach is that a user cannot change their vote.

0

You said the second table is for analysis/statistics, so it isn't really something you need to store in the database. My suggestion is to get rid of the second table and use a logging tool to record the rank changes.

0

Your second design is very dangerous if you have 1 million votes per photo. Can PHP handle this?

With the first design, you can do all the math at the database level, which will return a small result set to you.

0
