You may need to fall back on heuristics. Perhaps you can estimate travel time from a few factors, such as geometric distance and some features of the start and end points (urban vs. rural area, country, ...). You can fetch a number of real travel times, fit your parameters on a subset of them, and see how well you predict the rest. My prediction would be, for example, that travel time approaches a linear relationship with distance as the distance increases.
I know this is messy, but hey, you are trying to evaluate 12.5 million data points (or however many it is :)
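A minimal sketch of the "fit on a subset, check on the rest" idea, assuming straight-line (haversine) distance as the only feature and a plain least-squares line as the model. The function names are made up for illustration, and a real model would probably add the urban/rural and country features mentioned above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def fit_line(distances, times):
    """Ordinary least squares for time = slope * distance + intercept."""
    n = len(distances)
    mean_d = sum(distances) / n
    mean_t = sum(times) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (t - mean_t) for d, t in zip(distances, times))
    slope = sxy / sxx
    return slope, mean_t - slope * mean_d

def mean_abs_error(slope, intercept, distances, times):
    """Average absolute prediction error on a held-out set."""
    errors = (abs(slope * d + intercept - t) for d, t in zip(distances, times))
    return sum(errors) / len(distances)
```

Fit on the travel times you have already paid for, then call `mean_abs_error` on a held-out set to decide whether the heuristic is good enough for your purpose.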
You can also gradually reuse the "real" times you have already retrieved, by finding known points close to the ones you are looking up:
- get the nearest StartApprox, EndApprox points to the starting and ending position, so that you have the transit time between StartApprox and EndApprox
- calculate the distances StartError, EndError between start and StartApprox, end and EndApprox
- if StartError + EndError > Distance(StartApprox, EndApprox) * 0.10 (or whatever your threshold is), query the API (and store the result); otherwise use the known travel time plus an overhead based on StartError + EndError
(If you have 100 addresses in NY and 100 in SF, all the values will be more or less the same, i.e. the differences between them are probably smaller than the uncertainty of these predictions, and this approach saves you from issuing 10,000 requests where 1 will do.)
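The lookup steps above could be sketched roughly like this. Everything here is an assumption for illustration: `known` is a plain dict of already fetched times keyed by `(start, end)`, `dist` is whatever distance function you use, `fetch_time` stands in for the real API call, and "nearest" is read as the stored pair with the smallest combined error. The overhead is modelled as a simple per-unit cost on the error.

```python
def estimate_time(start, end, known, dist, fetch_time,
                  threshold=0.10, overhead_per_unit=1.0):
    """Return a travel time, reusing a nearby known pair when it is close enough.

    known      -- dict {(start_approx, end_approx): travel_time}, updated in place
    dist       -- function (point, point) -> distance
    fetch_time -- hypothetical stand-in for the real API request
    """
    if known:
        # Linear scan for the stored pair with the smallest combined error.
        # With millions of points you would want a spatial index instead.
        (start_approx, end_approx), known_time = min(
            known.items(),
            key=lambda item: dist(start, item[0][0]) + dist(end, item[0][1]),
        )
        error = dist(start, start_approx) + dist(end, end_approx)
        if error <= dist(start_approx, end_approx) * threshold:
            # Close enough: known time plus an overhead for the detour.
            return known_time + overhead_per_unit * error

    # Too far from anything known: pay for a real request and remember it.
    real_time = fetch_time(start, end)
    known[(start, end)] = real_time
    return real_time
```

Because every real request is stored in `known`, the share of lookups that need the API should drop as the table fills up, which is the whole point of the NY/SF example.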
Nicolas78