MySQL - how to optimize a query for counting votes

Question


I'm after a few opinions on the best way to achieve the following:

I would like to store products in my MySQL database that users can vote for (each vote counts as +1). I also want to see how many times in total each user has voted.

In my simple view, the following table structure would be ideal:

table: product
+----+----------+
| id | product  |
+----+----------+
| 1  | bananas  |
| 2  | apples   |
| .. | ..       |

table: user
+----+----------+
| id | username |
+----+----------+
| 1  | matthew  |
| 2  | mark     |
| .. | ..       |

table: user_product_vote
+----+------------+---------+
| id | product_id | user_id |
+----+------------+---------+
| 1  | 1          | 2       |
| 2  | 2          | 2       |
| .. | ..         | ..      |

That way I can COUNT the rows in user_product_vote for each product or user.

For example, when I want to look up bananas and display its number of votes on a web page, I could run the following query:

 SELECT p.product AS product, COUNT(v.id) AS votes FROM product p LEFT JOIN user_product_vote v ON p.id = v.product_id WHERE p.id = 1 GROUP BY p.id, p.product;
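A minimal sketch of the schema and query above, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (the SQL is the same in both). The helper names make_db and product_votes, and the extra product 'pears', are hypothetical sample data. It shows why the query pairs LEFT JOIN with COUNT(v.id): a product with no votes still comes back with a count of 0.

```python
import sqlite3


def make_db():
    """Build the question's three tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, product TEXT NOT NULL);
        CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
        CREATE TABLE user_product_vote (
            id INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES product(id),
            user_id INTEGER NOT NULL REFERENCES user(id)
        );
        -- Hypothetical sample data; 'pears' has no votes on purpose.
        INSERT INTO product (id, product) VALUES (1, 'bananas'), (2, 'apples'), (3, 'pears');
        INSERT INTO user (id, username) VALUES (1, 'matthew'), (2, 'mark');
        INSERT INTO user_product_vote (product_id, user_id) VALUES (1, 2), (2, 2), (1, 1);
    """)
    return conn


def product_votes(conn, product_id):
    """Return (product, votes) for one product, or None if it does not exist."""
    # LEFT JOIN keeps a product with no votes; COUNT(v.id) ignores the NULL match.
    return conn.execute(
        """SELECT p.product AS product, COUNT(v.id) AS votes
           FROM product p
           LEFT JOIN user_product_vote v ON p.id = v.product_id
           WHERE p.id = ?
           GROUP BY p.id, p.product""",
        (product_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = make_db()
    print(product_votes(conn, 1))  # ('bananas', 2)
    print(product_votes(conn, 3))  # ('pears', 0)
```

The GROUP BY is there because MySQL's ONLY_FULL_GROUP_BY mode (on by default in recent versions) can reject a query that mixes an aggregate with a plain column and has no GROUP BY.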

If my site became very successful (we can all dream) and thousands of users voted for thousands of products, I am afraid that performing such a COUNT on every page view would be extremely inefficient in terms of server resources.

A simpler approach would be to have a "votes" column in the product table, which is incremented each time a vote is added.

table: product
+----+----------+-------+
| id | product  | votes |
+----+----------+-------+
| 1  | bananas  | 2     |
| 2  | apples   | 5     |
| .. | ..       | ..    |

While this is more resource-friendly, I lose data (for example, I can no longer prevent a person from voting twice, since there is no record of their voting activity).
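To make the trade-off concrete, here is a sketch of the counter-only design, again assuming Python's sqlite3 with SQLite standing in for MySQL; make_db, add_vote and votes are hypothetical helpers. The increment is a single UPDATE, but nothing records who voted, so the same user can vote twice.

```python
import sqlite3


def make_db():
    """The counter-only design: a votes column on product and nothing else."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (
            id INTEGER PRIMARY KEY,
            product TEXT NOT NULL,
            votes INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO product (id, product) VALUES (1, 'bananas'), (2, 'apples');
    """)
    return conn


def add_vote(conn, product_id, user_id):
    """Increment the product's counter. user_id has nowhere to go: that is the data loss."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE product SET votes = votes + 1 WHERE id = ?", (product_id,))


def votes(conn, product_id):
    """Read the stored counter for one product."""
    return conn.execute("SELECT votes FROM product WHERE id = ?", (product_id,)).fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    add_vote(conn, 1, user_id=2)
    add_vote(conn, 1, user_id=2)  # same user again: nothing can stop it
    print(votes(conn, 1))  # 2
```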

My questions:
i) Am I worrying too much about server resources, and should I just stick with the three-table option? (i.e. should I have more faith in the database's ability to handle large queries?)
ii) Is there a more efficient way to achieve this without losing information?

+6

mysql query-optimization

So over it Sep 04 '10 at 12:40


5 answers

"If my site became very successful (we can all dream) and thousands of users voted for thousands of products, I am afraid that performing such a COUNT on every page view would be extremely inefficient in terms of server resources."

Don't waste time solving imaginary problems. MySQL handles thousands of records in a fraction of a second without trouble; that is what databases are for. A clean, simple database and code structure is much more important than mythical "optimization" that no one needs.
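This holds as long as the vote table is indexed on the column being counted, which the answer does not spell out. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in for MySQL (where EXPLAIN plays the same role); the helper count_plan and the index name idx_vote_product are hypothetical. It compares the query plan for the per-product COUNT with and without such an index.

```python
import sqlite3


def count_plan(with_index):
    """Return SQLite's query plan for the per-product COUNT, with or without an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE user_product_vote (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL)""")
    if with_index:
        # Hypothetical index name; the answer does not specify one.
        conn.execute("CREATE INDEX idx_vote_product ON user_product_vote (product_id)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM user_product_vote WHERE product_id = ?",
        (1,),
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return " ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(count_plan(False))  # a full SCAN of the table
    print(count_plan(True))   # a SEARCH using idx_vote_product
```

Without the index the plan is a full table scan; with it, the COUNT is answered from the index alone and only touches that product's entries, not the whole table.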

+2

user187291 Sep 04 '10 at 13:14


Why not mix and match both? Keep the running totals in the product and user tables so that you don't have to count every time, and keep a vote table so that nobody can vote twice.

Edit: To explain this a little further, the product and user tables will each have a column called "votes". Each time an insert into user_product_vote succeeds, increment the corresponding user and product records. This prevents repeat voting, and you also don't have to run a complex counting query each time.
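A sketch of this hybrid, assuming SQLite via Python's sqlite3 as a stand-in for MySQL; make_db, cast_vote and SCHEMA are hypothetical names. The UNIQUE (product_id, user_id) constraint is the assumption from the answer's second edit. The vote insert and both counter updates run in one transaction, so the counters only move when the insert succeeds.

```python
import sqlite3

SCHEMA = """
CREATE TABLE product (
    id INTEGER PRIMARY KEY, product TEXT NOT NULL, votes INTEGER NOT NULL DEFAULT 0);
CREATE TABLE user (
    id INTEGER PRIMARY KEY, username TEXT NOT NULL, votes INTEGER NOT NULL DEFAULT 0);
CREATE TABLE user_product_vote (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    user_id INTEGER NOT NULL REFERENCES user(id),
    UNIQUE (product_id, user_id)  -- assumed unique index on the pair
);
INSERT INTO product (id, product) VALUES (1, 'bananas'), (2, 'apples');
INSERT INTO user (id, username) VALUES (1, 'matthew'), (2, 'mark');
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def cast_vote(conn, product_id, user_id):
    """Insert the vote, then bump both counters. Returns False for a repeat vote."""
    try:
        with conn:  # one transaction: the three statements commit together or not at all
            conn.execute(
                "INSERT INTO user_product_vote (product_id, user_id) VALUES (?, ?)",
                (product_id, user_id),
            )
            conn.execute("UPDATE product SET votes = votes + 1 WHERE id = ?", (product_id,))
            conn.execute("UPDATE user SET votes = votes + 1 WHERE id = ?", (user_id,))
    except sqlite3.IntegrityError:
        return False  # UNIQUE (product_id, user_id) rejected the insert; counters untouched
    return True


if __name__ == "__main__":
    conn = make_db()
    print(cast_vote(conn, 1, 2))  # True
    print(cast_vote(conn, 1, 2))  # False
```

In MySQL the same three statements would sit between START TRANSACTION and COMMIT on InnoDB tables, and the duplicate insert fails with error 1062 (duplicate entry).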

Edit: I also assume that you have created a unique index on (product_id, user_id), in which case any duplicate attempt will automatically fail and you do not have to check the table before inserting. You just have to make sure the insert query succeeds and that you get a valid value for the "id" in the form of insert_id.
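A minimal sketch of "let the unique index do the check", assuming SQLite via Python's sqlite3 as a stand-in for MySQL; make_db, record_vote and the index name uq_vote are hypothetical. There is no SELECT before the INSERT: the insert either returns the new row id (cursor.lastrowid, SQLite's counterpart of MySQL's insert id) or raises an integrity error for a duplicate.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE user_product_vote (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL)""")
    # The unique index performs the duplicate check; no lookup before INSERT is needed.
    conn.execute("CREATE UNIQUE INDEX uq_vote ON user_product_vote (product_id, user_id)")
    return conn


def record_vote(conn, product_id, user_id):
    """Return the new vote's id, or None if this user already voted for this product."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO user_product_vote (product_id, user_id) VALUES (?, ?)",
                (product_id, user_id),
            )
    except sqlite3.IntegrityError:
        return None
    return cur.lastrowid  # id of the row just inserted


if __name__ == "__main__":
    conn = make_db()
    print(record_vote(conn, 1, 2))  # 1
    print(record_vote(conn, 1, 2))  # None
```

In MySQL the same value is available from LAST_INSERT_ID() or from the client library's insert-id call.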

+1

Sabeen malik Sep 04 '10 at 12:49


You have to balance your site's need to perform quickly (for which the second scheme is best) against the ability to count votes per user and prevent double voting (for which I would choose the first scheme). Since the user_product_vote table uses only integer columns, I don't see how performance could be severely affected. What you have implemented with user_product_vote is a common many-to-many relationship. If you want to count votes per user and prevent double voting, user_product_vote is the only clean way I can think of to implement it, since any alternative can lead to sparse entries, duplicate entries and all kinds of bad things.
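Counting votes per user, which the question also asks for, is a GROUP BY over the same join table. A sketch assuming SQLite via Python's sqlite3 as a stand-in for MySQL, using the sample rows from the question; make_db and votes_per_user are hypothetical helpers.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
        CREATE TABLE user_product_vote (
            id INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            UNIQUE (product_id, user_id)
        );
        -- Sample rows from the question's tables.
        INSERT INTO user (id, username) VALUES (1, 'matthew'), (2, 'mark');
        INSERT INTO user_product_vote (product_id, user_id) VALUES (1, 2), (2, 2);
    """)
    return conn


def votes_per_user(conn):
    """Total votes cast by each user, including users who never voted."""
    return conn.execute("""
        SELECT u.username, COUNT(v.id) AS votes
        FROM user u
        LEFT JOIN user_product_vote v ON v.user_id = u.id
        GROUP BY u.id, u.username
        ORDER BY u.id
    """).fetchall()


if __name__ == "__main__":
    print(votes_per_user(make_db()))  # [('matthew', 0), ('mark', 2)]
```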

0

Chris laplante Sep 04 '10 at 12:47


You don't want to update the product table directly with the aggregate every time someone votes: this locks the product rows, which then affects other queries that use products.

Assuming that not all product queries need to include the vote count, you can keep a separate product table that holds the current totals, and keep the user_product_vote table as the means of enforcing your business rules about users voting on products, and for auditing.
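A sketch of the separate totals table, assuming SQLite via Python's sqlite3 as a stand-in for MySQL; the table name product_vote_total and the helpers make_db, cast_vote and total_votes are hypothetical. A vote writes to user_product_vote and product_vote_total only; the product rows themselves are never updated.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, product TEXT NOT NULL);
        CREATE TABLE product_vote_total (
            product_id INTEGER PRIMARY KEY REFERENCES product(id),
            votes INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE user_product_vote (
            id INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            UNIQUE (product_id, user_id)  -- enforces one vote per user per product
        );
        INSERT INTO product (id, product) VALUES (1, 'bananas'), (2, 'apples');
    """)
    return conn


def cast_vote(conn, product_id, user_id):
    """Record the vote and bump the separate total. Returns False for a repeat vote."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO user_product_vote (product_id, user_id) VALUES (?, ?)",
                (product_id, user_id),
            )
            # Create the totals row on the first vote, then increment it.
            conn.execute(
                "INSERT OR IGNORE INTO product_vote_total (product_id) VALUES (?)",
                (product_id,),
            )
            conn.execute(
                "UPDATE product_vote_total SET votes = votes + 1 WHERE product_id = ?",
                (product_id,),
            )
    except sqlite3.IntegrityError:
        return False
    return True


def total_votes(conn, product_id):
    """Read the running total; a product nobody has voted for has no row yet."""
    row = conn.execute(
        "SELECT votes FROM product_vote_total WHERE product_id = ?", (product_id,)
    ).fetchone()
    return row[0] if row else 0
```

In MySQL the two statements that maintain the total could be a single INSERT ... ON DUPLICATE KEY UPDATE votes = votes + 1.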

0

Stuartlc Sep 04 '10 at 12:51


RobertPitt · Accepted Answer · 2010-09-04T12:49:10+0000

You can never worry too much about resources. When you start building an application, you should always keep resources, space, speed, etc. in mind; if your website traffic increases dramatically and you never built with resources in mind, that is when you start running into problems.

As for the voting system, I personally would store the votes like this:

table: product
+----+----------+
| id | product  |
+----+----------+
| 1  | bananas  |
| 2  | apples   |
| .. | ..       |

table: user
+----+----------+
| id | username |
+----+----------+
| 1  | matthew  |
| 2  | mark     |
| .. | ..       |

table: user_product_vote
+----+------------+---------+
| id | product_id | user_id |
+----+------------+---------+
| 1  | 1          | 2       |
| 2  | 2          | 2       |
| .. | ..         | ..      |

Reasons:

Firstly, user_product_vote contains no text, blobs, etc.; it is purely integers, so it takes fewer resources anyway.

Secondly, it opens more doors for new features in your application, such as total votes in the last 24 hours, highest-rated product in the last 24 hours, etc.

Take this example:

table: user_product_vote
+----+------------+---------+-----------+-------+
| id | product_id | user_id | vote_type | time  |
+----+------------+---------+-----------+-------+
| 1  | 1          | 2       | product   | 224.. |
| 2  | 2          | 2       | page      | 218.. |
| .. | ..         | ..      | ..        | ..    |

And a simple query:

 SELECT COUNT(id) AS total FROM user_product_vote WHERE vote_type = 'product' AND time BETWEEN (....) AND (....);
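The "highest-rated product in the last 24 hours" feature mentioned above needs the votes grouped by product. A sketch assuming SQLite via Python's sqlite3 as a stand-in for MySQL, and assuming the time column holds Unix timestamps in seconds (the sample table only shows truncated values); make_db and top_products are hypothetical helpers.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE user_product_vote (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        vote_type TEXT NOT NULL,
        time INTEGER NOT NULL)""")  # assumed: Unix timestamp in seconds
    return conn


def top_products(conn, start, end, limit=20):
    """Products with the most 'product' votes cast between start and end (inclusive)."""
    return conn.execute(
        """SELECT product_id, COUNT(id) AS total
           FROM user_product_vote
           WHERE vote_type = 'product' AND time BETWEEN ? AND ?
           GROUP BY product_id
           ORDER BY total DESC, product_id
           LIMIT ?""",
        (start, end, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    now, day = 1_000_000, 86_400  # arbitrary "current time" for the demo
    conn.executemany(
        "INSERT INTO user_product_vote (product_id, user_id, vote_type, time) VALUES (?, ?, ?, ?)",
        [(1, 1, "product", now - 100), (1, 2, "product", now - 200),
         (2, 1, "product", now - 300), (2, 2, "page", now - 50),
         (3, 1, "product", now - 2 * day)],
    )
    print(top_products(conn, now - day, now))  # [(1, 2), (2, 1)]
```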

Another thing: if a user voted at 1 AM and then tried to vote again at 2 PM, you can easily check when they last voted and whether they are allowed to vote again.
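A sketch of that check, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and Unix-timestamp values in the time column. The 12-hour cooldown is a hypothetical rule, as are the names make_db, can_vote and COOLDOWN_SECONDS; the answer does not say what the rule should be.

```python
import sqlite3

COOLDOWN_SECONDS = 12 * 60 * 60  # hypothetical rule: one vote per product every 12 hours


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE user_product_vote (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        vote_type TEXT NOT NULL,
        time INTEGER NOT NULL)""")  # assumed: Unix timestamp in seconds
    return conn


def can_vote(conn, product_id, user_id, now, cooldown=COOLDOWN_SECONDS):
    """True if the user never voted for this product or the cooldown has passed."""
    # MAX over no rows still returns one row, containing NULL (None in Python).
    (last,) = conn.execute(
        "SELECT MAX(time) FROM user_product_vote WHERE product_id = ? AND user_id = ?",
        (product_id, user_id),
    ).fetchone()
    return last is None or now - last >= cooldown


if __name__ == "__main__":
    conn = make_db()
    one_am, two_pm = 1 * 3600, 14 * 3600  # seconds since an arbitrary midnight
    conn.execute(
        "INSERT INTO user_product_vote (product_id, user_id, vote_type, time) "
        "VALUES (1, 2, 'product', ?)",
        (one_am,),
    )
    print(can_vote(conn, 1, 2, now=two_pm))  # True: 13 hours have passed
```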

There are so many features you would miss out on if you stick with your incrementing-counter example.

As for your COUNT(), no matter how you optimize your queries, it will not make much difference at large scale.

With an extremely large user base, resource usage will be looked at from a different perspective: load balancers, mainly server settings, Apache, caching, etc. There is only so much you can do with your queries.
