Should I store 90 rows of data for EVERY user (with millions of users)?

I plan the MySql database structure and can use some tips from more experienced professionals. The site to which the database belongs collects 90-day weather data for each registered user and must support millions of users.

I already have a table for users, with their username and contact information, but suppose I need a second table for all weather data ...

What I intend to do is basically store the average temperature, humidity, wind direction and so on - per day - for each user. And every day the database is updated with new data of the day, saving records yesterday (but limited to 89-day old data + data of the current day) - for all users.

Now, does it make sense to have one huge “data” table that has 90 rows for EVERY user (with millions of users)? Or is there a smarter way to do this, which is better for performance reasons or similar?

90-day data will be available (READ and displayed, etc.) each time a user logs in and views his profile or views another user's profile. But it will be updated only once a day (overwrite the oldest record, maintaining a limit of 90 lines per user).

+4
source share
5 answers

Edit: it is now visible that each user has different weather data. Saving "general data" in the answer, but you are interested in the second case.

Users share weather data

Based on, say, their nearest weather station identifier.

I would save the table (userId, stationId, isActive, isPreferred) to find out what data the user is interested in, and then I ran a query from stationWeatherData to get 90 rows of meteorological data for this station.

Each user has their own weather data.

There should be no particular problems handling 900 million users. If you really needed to, you could beware in different tables based on userId, for example, the weather174 table will contain data for all users for which (userId% 1000) gives 174, and you would end up with 1000 tables - perhaps on different servers - one thousandth of a size.

So, you start with one big table and get ready for a splinter (or move to the cloud storage and the keystore database without SQL, for example MongoDB, VoltDB). Or a section based on UserID, as soon as the UserID reaches, say, a million.

Or even, you do not use the database at all. The database makes sense if you need to search or map / join data - here you just get access to the user’s “weather station”.

If you know that you will never ask “How many users have 60% humidity?”, But always only “What data does the user have 1234567?”, Then you can save the data in the buffer format in binary format, JSON or HTML (in the cloud) storage, S3 or even MongoDB - now only one document for each user). Much will then depend on how the data to be updated arrives, i.e. In one large batch from a hub or each user downloading their own.

+2
source

For my answer (see below), I assumed that the data is user-specific, for example, from their personal backyard surveillance station. If this is data shared with other users, then my answer is suboptimal.


This seems reasonable, but why stop there for 90 days? Keep daily information for each user as long as they are valid. The query described is always something like

SELECT temperature_avg, humidity, wind_direction, wind_speed FROM weather_summary WHERE user_id = (current_user) ORDER BY sample_date DESC LIMIT 90; 

As long as there are indexes on sample_date and user_id , this will be extremely efficient.

Having a separate table for each user has never worked out very well in my experience.

+1
source

If you save the location of each user, it would be easier to store weather data based on location and map it to the user on demand.

UserId → LocationId → Weather Data.

Assuming that on average there will be several users in each location, this should slightly reduce the size of your database and also scale better.

+1
source

I would recommend one table for weather data, partitioned by date (see MySQL documentation on range partitioning ).

This way, you can easily get rid of old data (just drop the oldest section), and queries for day ranges (say, average temperature over the last 7 days) will be very efficient.

+1
source
  • Create an index in the table columns (identifier, full-text indexing).
  • As an idea, you can create several views in this table that will contain filtered data based on location, days, week, month or quarter or alphabets or other criteria and based on what your code will determine which type to use to get results search.
  • OR if there are many insert / update operations in your table, you can create more than one table and, based on some criteria, select the table name for updating / inserting data in the server-side programming language.
0
source

All Articles