Best database schema for latitude, longitude, population?

I am developing a web application that needs to query a very large population density database. The data is mainly latitude, longitude, and population (you can abstract it as lat, lon, pop).

I am going to use MySQL and PHP for this. The data is very granular, so the total number of points is very large, on the order of several billion. (To be honest, I don't know exactly how big it is at the moment; I don't have the complete data yet, just samples to play with.) The server is just a cheap shared-hosting machine (like Bluehost).

The application will aggregate population data for circles of arbitrary radius centered on specific latitude and longitude coordinates. So basically I will say: "Tell me the total population within a circle of radius x centered at lat, lon." Most likely I will just write a very simple summation function for this.
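To pin down what I mean by the summation, here is a minimal sketch in Python (not the PHP I would actually deploy). The function names, the in-memory list of `(lat, lon, pop)` tuples, and the use of the haversine formula with a mean Earth radius of 6371 km are my own assumptions for illustration; the real data would live in MySQL.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, good enough for a sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def population_in_circle(points, center_lat, center_lon, radius_km):
    """Sum pop over all (lat, lon, pop) points within radius_km of the center."""
    return sum(
        pop
        for lat, lon, pop in points
        if haversine_km(center_lat, center_lon, lat, lon) <= radius_km
    )


# Hypothetical sample data: (lat, lon, pop)
sample = [
    (40.0, -75.0, 100),
    (40.01, -75.0, 50),  # roughly 1.1 km north of the first point
    (41.0, -75.0, 999),  # roughly 111 km north, outside a 5 km circle
]
total = population_in_circle(sample, 40.0, -75.0, 5.0)
```

Scanning every point like this is obviously hopeless for billions of rows, which is why I am asking about the schema.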

Given all of the above, and the desire to make this as fast and efficient as possible, my question is: what is the best database schema? I looked around here and picked up some useful basics about storing this kind of data (floats work fine for lat/lon data; the BETWEEN operator is faster than the < and > operators), but I am wondering whether, given the sheer bulk of the data, there is a better way to do this than having one table of several billion rows and three fields (e.g. lat, lon, pop).
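For concreteness, the naive three-column layout I am describing looks roughly like this. The sketch uses Python's built-in `sqlite3` purely as a stand-in for MySQL so that it runs anywhere; the table name `points`, the index name, and the sample rows are made up. The query only does the bounding-box step with BETWEEN; rows in the corners of the box would still need an exact distance check afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
conn.execute(
    "CREATE TABLE points (lat REAL NOT NULL, lon REAL NOT NULL, pop INTEGER NOT NULL)"
)
# Hypothetical composite index so the range conditions can avoid a full scan.
conn.execute("CREATE INDEX idx_points_lat_lon ON points (lat, lon)")

conn.executemany(
    "INSERT INTO points (lat, lon, pop) VALUES (?, ?, ?)",
    [(40.0, -75.0, 100), (40.01, -75.0, 50), (41.0, -75.0, 999)],
)


def population_in_box(conn, lat_min, lat_max, lon_min, lon_max):
    """Sum pop for all rows inside a lat/lon bounding box.

    Does not handle boxes that wrap across the +/-180 degree meridian.
    """
    row = conn.execute(
        "SELECT COALESCE(SUM(pop), 0) FROM points "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (lat_min, lat_max, lon_min, lon_max),
    ).fetchone()
    return row[0]


box_total = population_in_box(conn, 39.95, 40.05, -75.05, -74.95)
```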

One idea that has occurred to me is to split the data into separate tables by major longitude band, but I don't know whether that would really speed things up much. (I don't know much about MySQL optimization other than indexing.) A related idea is to store super long strings of hex data for different "chunks" of the data (for example, squares of a given width). Another alternative is essentially to use large binary bitmap images and simply decode them on the fly (which seems impractical for my relatively cheap server to manage).
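The "chunk" idea could be sketched as pre-summing points into fixed-size lat/lon cells, so that a query touches far fewer records. This is only an illustration in Python of what I mean: the 0.1-degree cell size, the function names, and the dictionary standing in for a table are all my own assumptions, and I don't know whether this would actually be faster in MySQL.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # hypothetical cell width in degrees


def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Integer (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_cells(points, cell_deg=CELL_DEG):
    """Pre-aggregate (lat, lon, pop) points into per-cell population totals."""
    cells = defaultdict(int)
    for lat, lon, pop in points:
        cells[cell_of(lat, lon, cell_deg)] += pop
    return dict(cells)


# Hypothetical sample data: the first two points share a cell, the third does not.
sample_points = [
    (40.02, -75.03, 100),
    (40.07, -75.01, 50),
    (40.15, -75.03, 7),
]
cells = build_cells(sample_points)
```

A circle query would then sum whole cells that lie inside the circle and fall back to individual points only for cells that straddle its edge.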

But I am not a database administrator, and not even much of a programmer (I am not a beginner, but I am not a professional either), so I would like to hear any other suggestions on how to do this, and whether it is crazy to even attempt it given my current server's computing power.
