Is there a way to guarantee FIFO (first in, first out) behavior with task queues in GAE?

Is there a way to guarantee FIFO (first in, first out) behavior with task queues in GAE?

The GAE documentation says that FIFO is one of the factors that affect task execution order, but the same documentation says that "the system's scheduling may 'jump' new tasks to the head of the queue", and I have confirmed this behavior with a test. The effect: my events are being processed out of order.

The docs say:

https://developers.google.com/appengine/docs/java/taskqueue/overview-push

The order in which tasks are executed depends on several factors:

The position of the task in the queue. App Engine attempts to process tasks in FIFO (first in, first out) order. In general, tasks are inserted at the end of a queue and executed from the head of the queue.

The backlog of tasks in the queue. The system attempts to deliver the lowest latency possible for any given task via specially optimized notifications to the scheduler. Thus, when a queue has a large backlog of tasks, the system's scheduling may "jump" new tasks to the head of the queue.

The value of the task's etaMillis property. This property specifies the earliest time at which the task can execute. App Engine always waits until after the specified ETA to process push tasks.

The value of the task's countdownMillis property. This property specifies the minimum number of seconds to wait before executing the task. Countdown and ETA are mutually exclusive; if you specify one, do not specify the other.

What should I do? In my use case, I will process 1-2 million events a day coming from vehicles. These events can be sent at any interval (1 second, 1 minute, or 1 hour). The order in which events are processed must be guaranteed. I need to process them by timestamp, which is generated on an embedded device inside the vehicle.

What do I have now?

  • A REST servlet that is called by the consumer and creates a task (the event data is in the payload).

  • After that, a worker servlet receives this task and:

    • Deserializes the event data;

    • Puts the event in the datastore;

    • Updates the vehicle in the datastore.

So, again, is there any way to guarantee strict FIFO behavior? Or how can I improve this solution to achieve it?

+7
7 answers

OK, here is how I did it.

1) REST servlet that is called from the consumer:

    If the event sequence doesn't match the vehicle sequence (from the datastore)
        Create a task on a "wait" queue to call me again
    else
        Validate state
        Create a task on the "regular" queue (event data is in the payload)

2) A worker servlet gets the task from the "regular" queue, and so on... (same pseudocode)

This way, I can pause the "regular" queue to perform data maintenance without losing events.
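The routing check in step 1 can be sketched roughly as follows. This is only an in-memory illustration, not the answerer's actual code: the class names, the plain Java collections standing in for the datastore and the two task queues, and the assumption that each vehicle numbers its events from 0 are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical event: assumes the device stamps each event with a
// per-vehicle sequence number starting at 0.
final class SequencedEvent {
    final String vehicleId;
    final long sequence;

    SequencedEvent(String vehicleId, long sequence) {
        this.vehicleId = vehicleId;
        this.sequence = sequence;
    }
}

final class SequenceGate {
    // Stand-in for the vehicle sequence stored in the datastore.
    final Map<String, Long> nextExpected = new HashMap<>();
    // Stand-ins for the "regular" and "wait" task queues.
    final Queue<SequencedEvent> regularQueue = new ArrayDeque<>();
    final Queue<SequencedEvent> waitQueue = new ArrayDeque<>();

    // Routes an incoming event: in-sequence events go to the regular queue,
    // everything else is parked on the wait queue to be retried later.
    void receive(SequencedEvent e) {
        long expected = nextExpected.getOrDefault(e.vehicleId, 0L);
        if (e.sequence != expected) {
            waitQueue.add(e);
        } else {
            regularQueue.add(e);
            // Assumption: the stored sequence advances as soon as the event is accepted.
            nextExpected.put(e.vehicleId, expected + 1);
        }
    }

    // Re-offers each currently waiting event exactly once.
    void retryWaiting() {
        int waiting = waitQueue.size();
        for (int i = 0; i < waiting; i++) {
            receive(waitQueue.remove());
        }
    }
}
```

With events arriving as 2, 0, 1 for the same vehicle, event 2 is parked until 0 and 1 have been accepted, and a later retry lets it through.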

Thank you for your answers. My solution is a mix of them.

+3

You need to approach this with three separate steps:

  • Implement a sharded counter to generate monotonically increasing IDs. As much as I would like to use the timestamp from Google's servers to represent task order, it seems that timestamps can vary between GAE servers by more than your requirement allows.

  • Add your tasks to a pull queue instead of a push queue. When constructing the TaskOptions, add the ID obtained in step #1 as the tag. After adding the task, store the ID somewhere in your datastore.

  • Have your worker servlet lease tasks by a specific tag from the pull queue. Query the datastore to get the earliest ID that needs to be fetched, and use that ID as the lease tag. This way you can simulate FIFO behavior for your task queue.

Once processing is complete, delete the ID from your datastore, and remember to delete the task from the pull queue as well. In addition, I would recommend running your task consumption on a backend.
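The tag-and-lease flow in the steps above might look roughly like this. It is a self-contained simulation under stated assumptions, not App Engine code: `TaggedPullQueue` and `OrderedConsumer` are hypothetical names, a `HashMap` stands in for the pull queue, and a `TreeSet` stands in for the datastore query that returns the earliest pending ID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a pull queue whose tasks are found by tag.
final class TaggedPullQueue {
    private final Map<String, String> payloadByTag = new HashMap<>();

    void add(String tag, String payload) {
        payloadByTag.put(tag, payload);
    }

    Optional<String> leaseByTag(String tag) {
        return Optional.ofNullable(payloadByTag.get(tag));
    }

    void delete(String tag) {
        payloadByTag.remove(tag);
    }
}

final class OrderedConsumer {
    private final TaggedPullQueue queue = new TaggedPullQueue();
    // Stand-in for the datastore: the set of pending IDs, kept sorted.
    private final TreeSet<Long> pendingIds = new TreeSet<>();

    // Step 2: add the task with its ID as the tag, then record the ID.
    void enqueue(long id, String payload) {
        queue.add(Long.toString(id), payload);
        pendingIds.add(id);
    }

    // Step 3 plus cleanup: find the earliest ID, lease the task with that tag,
    // then remove both the ID and the task.
    Optional<String> processNext() {
        if (pendingIds.isEmpty()) {
            return Optional.empty();
        }
        Long earliest = pendingIds.first();
        String tag = Long.toString(earliest);
        Optional<String> payload = queue.leaseByTag(tag);
        if (payload.isPresent()) {
            pendingIds.remove(earliest);
            queue.delete(tag);
        }
        return payload;
    }
}
```

Tasks enqueued with IDs 3, 1, 2 come back from `processNext()` in ID order, regardless of insertion order.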

UPDATE: As Nick Johnson and mjaggard noted, the approach in step #1 does not seem practical for generating monotonically increasing IDs, so another source of IDs would be needed. I seem to recall that you were using timestamps generated by your vehicles; could those be used in place of a monotonically increasing ID?

Regardless of how the IDs are generated, the basic idea is to use the datastore's query mechanism to produce a FIFO ordering of tasks, and to use the task tag to pull a specific task from the task queue.

There is a caveat, though. Because of the eventual consistency policy of the High Replication Datastore, if you chose HRD as your datastore (and you should, since M/S has been deprecated since April 4, 2012), the query in step #2 may return some stale data.

+4

I think the simple answer is "no". However, partly to help improve the situation, I use a pull queue, pulling 1000 tasks at a time and then sorting them. If timing is not important, you could sort them, put them into the datastore, and then complete one batch at a time. You still have to work out what to do with the tasks at the beginning and end of each batch, because they may be out of order relative to interleaved tasks in other batches.
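Sorting a leased batch could be sketched as below. The `TimedEvent` and `BatchSorter` names are hypothetical, and the list passed in merely stands in for whatever a lease call returns; the sketch assumes each task carries the device timestamp in milliseconds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical event carrying the timestamp assigned by the device.
final class TimedEvent {
    final String vehicleId;
    final long timestampMillis;

    TimedEvent(String vehicleId, long timestampMillis) {
        this.vehicleId = vehicleId;
        this.timestampMillis = timestampMillis;
    }
}

final class BatchSorter {
    // Returns a copy of the leased batch ordered by device timestamp.
    // Ordering is only guaranteed within this batch, not across batches.
    static List<TimedEvent> sortBatch(List<TimedEvent> leased) {
        List<TimedEvent> sorted = new ArrayList<>(leased);
        sorted.sort(Comparator.comparingLong((TimedEvent e) -> e.timestampMillis));
        return sorted;
    }
}
```

Note that this does nothing about the batch-boundary problem: an event with timestamp 15 that arrives in the next batch would still be processed after an event with timestamp 30 from this one.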

+3

You can store the work to be done as a row in the datastore with a creation timestamp and then fetch work tasks by that timestamp, but if your tasks are created too quickly, you will run into timeout problems.

0

I don't know the answer myself, but it is possible that tasks enqueued via a deferred function are executed in the order they were submitted. You will probably need an engineer from Google to get a definitive answer. Pull queues, as suggested, seem like a good alternative, and they would also let you consider batching your put()s.

One note about sharded counters: they increase the likelihood of monotonically increasing IDs, but do not guarantee them.

0

The best way to deal with this, the distributed way or the "App Engine way", is probably to change your algorithm and data collection to work with just a timestamp, which allows tasks to be processed in arbitrary order.

Assuming this is impossible or too complicated, you can change your algorithm as follows:

  • When creating a task, do not put the data in the payload. Instead, store it in the datastore in a kind that is ordered by timestamp and stored as a child entity of whatever you are trying to update (the vehicle?). The timestamps should come from the client, not from the server, to guarantee the same ordering.

  • Run a generic task that fetches the data for the earliest timestamp, processes it, and deletes it, all inside a transaction.
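The two steps above can be illustrated with a small in-memory model. This is a sketch only: `VehicleEventStore` is a hypothetical name, a per-vehicle `TreeMap` stands in for child entities ordered by timestamp, `synchronized` stands in for the datastore transaction, and client timestamps are assumed to be unique per vehicle (a duplicate would overwrite the earlier entry here).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for event entities stored as children of a vehicle.
final class VehicleEventStore {
    // vehicleId -> (client timestamp in millis -> event data), sorted by timestamp.
    private final Map<String, TreeMap<Long, String>> eventsByVehicle = new HashMap<>();

    // Step 1: store the event data under its vehicle, keyed by the client timestamp.
    synchronized void store(String vehicleId, long clientTimestampMillis, String data) {
        eventsByVehicle
                .computeIfAbsent(vehicleId, id -> new TreeMap<>())
                .put(clientTimestampMillis, data);
    }

    // Step 2: fetch and delete the earliest event for a vehicle in one atomic step.
    // Returns null when the vehicle has no pending events.
    synchronized String processEarliest(String vehicleId) {
        TreeMap<Long, String> events = eventsByVehicle.get(vehicleId);
        if (events == null || events.isEmpty()) {
            return null;
        }
        return events.pollFirstEntry().getValue();
    }
}
```

Because the generic task always asks the store for the earliest pending event, the order in which the tasks themselves run no longer matters.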

0

Following this thread, it is not clear to me whether the strict FIFO requirement applies to all received transactions or only per vehicle. The latter has more options than the former.

0