MySQL increment counter scaling (to track page views)

I have an integer MySQL column that grows every time a page is viewed. The SQL query looks something like this:

UPDATE page SET views = views + 1 WHERE id = $id

We began to encounter scaling problems when the same page (the same id) was viewed many times per second: the writes block in MySQL, and those queries would bring MySQL to a halt. To work around this, we adopted the following strategy:

Each time a page loads, we increment the counter in Memcache and enqueue a job (Gearman) that updates the counter in MySQL in the background (across 3 worker machines). The simplified code is as follows:

On page view:

    // bump the batched counter in Memcache
    $memcache->increment("page_view:$id");
    // queue a background job to flush the count to MySQL
    $gearman->doBackground('page_view', json_encode(array('id' => $id)));

In the background worker:

    $payload = json_decode($payload);
    // read the count accumulated in Memcache since the last flush
    $views = $memcache->get("page_view:{$payload->id}");
    if (!empty($views)) {
        // apply the whole batch to MySQL in a single write
        $mysql->query("UPDATE page SET views = views + $views WHERE id = {$payload->id}");
        // reset the Memcache counter
        $memcache->delete("page_view:{$payload->id}");
    }

This worked well. It reduces the number of database queries (since views are batched in Memcache before being written to the database), and the writes happen in the background, off the page-load path.

Unfortunately, MySQL is becoming a bottleneck again. Updates for very hot pages still arrive almost simultaneously, causing MySQL to lock again. The locks slow the writes down and frequently time our workers out, which lets the queue grow very large, often with 70k+ jobs that are "behind".

My question is: what should we do next to scale this?

+4
3 answers

I'm not very familiar with Gearman, so I could be wrong.

You queue a flush job every time you increment the counter. I suspect it would be better to queue the job only when the result of $memcache->increment is 1. My rationale: that way, when views keep arriving after a flush job has cleared page_view:$id, you don't build up a long queue of flush jobs that all want to write the same value to the database. This makes your code independent of the page-view rate; it is limited instead by how fast Gearman picks up new jobs (which, I hope, is reasonably slow). In an ideal world, you could simply ask Gearman to delay the job by ~1 s. That guarantees you update each counter at a rate of at most 1 qps.
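A minimal sketch of that idea, reusing the $memcache and $gearman handles from the question. The Memcached::add fallback is my addition, since increment fails when the key does not exist yet:

    // Sketch only: queue a flush job just for the first view after each flush.
    $views = $memcache->increment("page_view:$id");
    if ($views === false) {
        // key did not exist yet (first view ever, or just deleted by a flush):
        // create it at 1; add() is atomic, so a concurrent request can't double-create
        if ($memcache->add("page_view:$id", 1)) {
            $views = 1;
        } else {
            // another request created the key first; fall back to incrementing it
            $views = $memcache->increment("page_view:$id");
        }
    }
    if ($views == 1) {
        // first view since the last flush: exactly one job per flush cycle
        $gearman->doBackground('page_view', json_encode(array('id' => $id)));
    }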

Gearman aside, if you can accept slower reads, and assuming you are using InnoDB, you can shard this counter.

To do this, add a shard column and make it part of the primary key, e.g.

    CREATE TABLE page (
        id INTEGER,
        shard INTEGER,
        views INTEGER,
        PRIMARY KEY (id, shard)
    )

When you update the counter, randomly pick a shard between 1 and 10. When you read it, SUM over all shards of the id you are reading. Reads become 10x slower, but writes can scale 10x, because concurrent updates mostly land on different rows and stop contending for the same row lock. (Of course, it doesn't have to be 10; you can pick whatever shard count you want.)
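A hypothetical sketch of the two paths, using the same $mysql handle as the question and assuming the counter-only table above. The INSERT ... ON DUPLICATE KEY UPDATE is my addition, so that a shard row is created the first time it is hit:

    // write path: a random shard spreads concurrent updates across rows
    $shard = mt_rand(1, 10);
    $mysql->query("INSERT INTO page (id, shard, views) VALUES ($id, $shard, 1)
                   ON DUPLICATE KEY UPDATE views = views + 1");

    // read path: the total is now the sum across all shards
    $result = $mysql->query("SELECT SUM(views) AS views FROM page WHERE id = $id");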

+2

I'm not sure what you use the page counts for, or how important it is that every view gets recorded. Perhaps you could cache the counts in memory on each server and persist them only on a fixed schedule. That way you control how much database access you allow.

Granted, this obviously does not guarantee that the counts will be saved if a server goes down for any reason. So if this is for an important audit trail, or anywhere losing some page views would be a problem, it won't work.
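A minimal sketch of this per-server batching, assuming APCu as the local in-memory store and a cron-driven flush (both my choices; the answer names neither). $trackedIds and $mysql are assumed to exist:

    // on each page view: bump a counter local to this web server
    if (!apcu_add("page_view:$id", 1)) {
        apcu_inc("page_view:$id");
    }

    // in a cron job on each server, e.g. once a minute:
    foreach ($trackedIds as $id) {   // $trackedIds: hypothetical list of page ids seen
        $views = (int) apcu_fetch("page_view:$id");
        if ($views > 0) {
            $mysql->query("UPDATE page SET views = views + $views WHERE id = $id");
            // note: views arriving between fetch and delete are lost,
            // consistent with the answer's caveat about lossiness
            apcu_delete("page_view:$id");
        }
    }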

+1

Use MySQL's INSERT DELAYED. It does not block; the write happens when the server gets to it.
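Note that DELAYED applies only to INSERT (not to the UPDATE in the question) and only to MyISAM-family tables; it was deprecated in MySQL 5.6 and is ignored in 5.7+. One hypothetical way to apply it here is a log table that a background job aggregates into page.views:

    // log each view without blocking; page_view_log is a hypothetical MyISAM table
    $mysql->query("INSERT DELAYED INTO page_view_log (page_id) VALUES ($id)");
    // a periodic job would then COUNT(*) per page_id, add the result to
    // page.views, and clear the processed log rows in the same pass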

0
