What is the most efficient way to store measurements with a variable number of fields in a database?

We have a data collection system that gathers measurements from environmental sensors measuring the speed of water flowing through a river or canal. Each measurement produces a fixed number of values (for example date, time, temperature, pressure, etc.) plus a list of speed values. The sensors originally supplied three speed values, so I simply stored each value in its own column of a single table in the Firebird database. Later a sensor was introduced that could output up to nine speed values, so I just added six more columns. Even though most sensors use fewer than nine values, I figured this would not be a problem as long as the unused columns simply contain zeros.
But now I am faced with a new generation of sensors that can output anywhere from 1 to 256 values, and I do not believe it would be very efficient to add another 247 columns, especially since most measurements will still contain only 3 to 9 values.
Measurements are collected every 10 minutes, and the database holds the data of 30 to 50 sensors, so after a few years the total amount of data is quite significant, yet it must remain possible to generate reports/graphs for any arbitrary period of time.

So what would be the most efficient way to store a variable-length list of values?
Since each record has its own unique identifier, I assume I could simply store all speed values in a separate table, with each value tagged with the identifier of the record it belongs to. I just have a feeling that this would not be very efficient and that queries would become very slow.

+4
3 answers

Databases can handle large amounts of data in a table if you use efficient indexes. So you can use this table structure:

create table measurements (
    id    integer,      -- measurement identifier
    seq   integer,      -- between 1 and 256
    ts    timestamp,    -- timestamp of the measurement
    value decimal(...)
)

Create indexes on id, on (id, seq), and on ts. This will allow you to search the data efficiently. If you do not trust your database, just insert a few million rows and run a few selects to see how well it performs.
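A minimal sketch of those indexes (assuming the table above and standard Firebird DDL; the index names are just placeholders):

    create index ix_measurements_id     on measurements (id);
    create index ix_measurements_id_seq on measurements (id, seq);
    create index ix_measurements_ts     on measurements (ts);

Since the composite index starts with id, the single-column index on id may turn out to be redundant; measure both variants against your own queries.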

For comparison: I have an Oracle database here with 112 million rows, and I can retrieve a record by timestamp or identifier in about 120 ms (0.12 s).

+4

You can store the serialized data in a text field, for example the measurements JSON-encoded as:

 [<velocity-value-1>, <velocity-value-2>, ...] 

Then deserialize the values in your code after running the query.

This works well as long as you filter your queries only on fields other than the stored values. If you need to filter on the values themselves, using them in WHERE clauses will be a nightmare.
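A minimal sketch of such a schema (assuming Firebird; the table and column names and the decimal precisions are only illustrative):

    create table measurements (
        id          integer not null primary key,
        ts          timestamp not null,
        temperature decimal(9,2),
        pressure    decimal(9,2),
        velocities  blob sub_type text  -- JSON array, e.g. [1.23, 1.25, 1.31]
    );

The application then parses the velocities column after fetching each row.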

0

I would go with a second table:

 table measurements (Id, DateTime, Temperature, Pressure)
 table velocity (Id, MeasurementId, Sequence, Value)

Velocity.MeasurementId references Measurements.Id .
Velocity.Sequence is the position of the speed value within that measurement (1-256).

Fill these tables with data volumes as close to the real world as possible and test your SQL queries to find the best indexes.
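For instance, a sketch under the schema above (assuming Firebird; the index names and the date range are only illustrative):

    create index ix_velocity_measurement on velocity (MeasurementId, Sequence);
    create index ix_measurements_dt      on measurements (DateTime);

    -- All measurements and their velocity values for an arbitrary period:
    select m.Id, m.DateTime, m.Temperature, m.Pressure, v.Sequence, v.Value
    from measurements m
    join velocity v on v.MeasurementId = m.Id
    where m.DateTime between '2024-01-01' and '2024-02-01'
    order by m.DateTime, v.Sequence;

The composite index on (MeasurementId, Sequence) also serves lookups by MeasurementId alone, and the index on DateTime supports the time-range filter.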

0
