We support Mailgun event tracking in our application. We reviewed the proposed event polling algorithm, but we were not entirely satisfied with it. First, we would prefer not to discard the data we have already written out and then start over from scratch after a pause: that is inefficient and leaves the door open to a long cycle of retries, since it is unclear when the loop should end. Second, the "threshold age" appears to be the key to the definition of "reliability", yet its value is never pinned down; only a very conservative "half an hour" is suggested.
Our understanding is that after some threshold delay the events become "trustworthy"; call that delay D_max, the age after which events are guaranteed to be in the event store. If so, we can implement the algorithm differently, so that we never request data we know is not yet trustworthy, and we keep all the data we have already received.
We would poll periodically, and at each iteration (sketched in code below):
- Request the events API with a sliding time range from T_1 to T_2 = now() - D_max. For the first iteration, set T_1 to some point in the past, e.g. half an hour ago. For each subsequent iteration, set T_1 to the T_2 of the previous iteration.
- Fetch each page in turn, following the next-page URL, until a page comes back empty.
- Consume every fetched event, since all of them are now trustworthy.
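A minimal sketch of the loop we have in mind (Python with the requests library; the domain, API key, the process() handler, and the D_MAX and POLL_INTERVAL values are placeholders, not settled choices):

```python
import time
import requests

API_KEY = "key-xxxx"                                          # placeholder
EVENTS_URL = "https://api.mailgun.net/v3/YOUR_DOMAIN/events"  # placeholder domain
D_MAX = 30 * 60        # threshold age in seconds; its minimum value is the open question (Q2)
POLL_INTERVAL = 60     # how often we poll, in seconds


def fetch_trustworthy_events(begin, end):
    """Fetch all events in [begin, end) by following next-page URLs until a page is empty."""
    events = []
    url = EVENTS_URL
    params = {"begin": int(begin), "end": int(end), "ascending": "yes"}
    while url:
        resp = requests.get(url, auth=("api", API_KEY), params=params)
        resp.raise_for_status()
        page = resp.json()
        items = page.get("items", [])
        if not items:
            break  # an empty page means we have everything in the range
        events.extend(items)
        url = page.get("paging", {}).get("next")  # next-page URL carries its own query string
        params = None
    return events


def process(events):
    """Application-specific handling; a stub for this sketch."""
    for e in events:
        print(e.get("event"), e.get("timestamp"))


def poll_forever():
    t1 = time.time() - 30 * 60    # first iteration: start half an hour in the past
    while True:
        t2 = time.time() - D_MAX  # only ask for events old enough to be trustworthy
        if t2 > t1:
            process(fetch_trustworthy_events(t1, t2))
            t1 = t2               # next iteration resumes where this one ended
        time.sleep(POLL_INTERVAL)
```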
Our questions:
- Q1: Are there any problems with this approach?
- Q2: What is the minimum realistic value of D_max? Obviously we could use "half an hour", but we would like our event tracking to be more responsive, so it would be good to know the smallest value we can set while still reliably retrieving all events.
Thanks!