We support Mailgun event tracking in our application. We reviewed the proposed event polling algorithm, but we were not entirely satisfied with it. First, we would prefer not to discard the data we have already written out and then start over from scratch after a pause: that is inefficient and leaves the door open to a long cycle of retries, since it is unclear when the loop should end. Second, the "threshold age" appears to be the key to the definition of "reliability", yet its value is never pinned down; only a very conservative "half an hour" is suggested.
Our understanding is that after some threshold delay the events become "trustworthy"; call that delay D_max, the age after which events are guaranteed to be in the event store. If so, we can implement the algorithm differently, so that we never request data we know is not yet trustworthy, and we keep all the data we have already received.
We would poll periodically, and at each iteration (sketched in code below):
- Request the events API with a sliding time range from T_1 to T_2 = now() - D_max. For the first iteration, set T_1 to some point in the past, e.g. half an hour ago. For each subsequent iteration, set T_1 to the T_2 of the previous iteration.
- Fetch each page in turn, following the next-page URL, until a page comes back empty.
- Consume every fetched event, since all of them are now trustworthy.
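A minimal sketch of the loop we have in mind (Python with the requests library; the domain, API key, the process() handler, and the D_MAX and POLL_INTERVAL values are placeholders, not settled choices):

```python
import time
import requests

API_KEY = "key-xxxx"                                          # placeholder
EVENTS_URL = "https://api.mailgun.net/v3/YOUR_DOMAIN/events"  # placeholder domain
D_MAX = 30 * 60        # threshold age in seconds; its minimum value is the open question (Q2)
POLL_INTERVAL = 60     # how often we poll, in seconds


def fetch_trustworthy_events(begin, end):
    """Fetch all events in [begin, end) by following next-page URLs until a page is empty."""
    events = []
    url = EVENTS_URL
    params = {"begin": int(begin), "end": int(end), "ascending": "yes"}
    while url:
        resp = requests.get(url, auth=("api", API_KEY), params=params)
        resp.raise_for_status()
        page = resp.json()
        items = page.get("items", [])
        if not items:
            break  # an empty page means we have everything in the range
        events.extend(items)
        url = page.get("paging", {}).get("next")  # next-page URL carries its own query string
        params = None
    return events


def process(events):
    """Application-specific handling; a stub for this sketch."""
    for e in events:
        print(e.get("event"), e.get("timestamp"))


def poll_forever():
    t1 = time.time() - 30 * 60    # first iteration: start half an hour in the past
    while True:
        t2 = time.time() - D_MAX  # only ask for events old enough to be trustworthy
        if t2 > t1:
            process(fetch_trustworthy_events(t1, t2))
            t1 = t2               # next iteration resumes where this one ended
        time.sleep(POLL_INTERVAL)
```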
Our questions:
- Q1: Are there any problems with this approach?
- Q2: What is the minimum realistic value of D_max? Obviously we could use "half an hour", but we would like our event tracking to be more responsive, so it would be good to know the smallest value we can set while still reliably retrieving all events.
Thanks!