Database: when to split into separate tables?

Question


I have two different types of sensors: one monitors an analog voltage (for example, a temperature sensor), and one detects whether something is on or off (a switch sensor).

I cannot decide whether to use one table:

    [Sensor]
    Id : PK
    UpperLimit : FLOAT
    UpperLimitAlertDelay : INT
    LowerLimit : FLOAT
    LowerLimitAlertDelay : INT
    IsAnalog : BOOL

    [SensorReading]
    Id : PK
    SensorId : FK
    AnalogValue : FLOAT
    IsOn : BOOL

Or split everything into separate tables:

    [AnalogSensor]
    Id : PK
    UpperLimit : FLOAT
    UpperLimitAlertDelay : INT
    LowerLimit : FLOAT
    LowerLimitAlertDelay : INT

    [AnalogSensorReadings]
    Id : PK
    AnalogSensorId : FK
    Value : FLOAT

    [SwitchSensor]
    Id : PK
    OnTooLongAlertDelay : INT

    [SwitchSensorReadings]
    Id : PK
    SwitchSensorId : FK
    IsOn : BOOL

At the moment I have it as one table, and I use "UpperLimitAlertDelay" as "OnTooLongAlertDelay" when the sensor is not an analog sensor.

In the code, I differentiate using the boolean flag in the Sensor table and create the corresponding object (for example, AnalogSensor or SwitchSensor), but I wonder whether it would be more accurate / more correct to separate them at the database level.

What rule of thumb would you use for such a decision? These are different objects on one level, but on another level you can say that they are both just sensors.

This happens to me often: I can never decide which direction to take when designing a database. Maybe whenever I use a bool to determine which fields are meaningful / should be used, it should really be a separate table?

General thoughts on this important topic are appreciated.

Thanks!

EDIT: additional information.

The switch sensors monitor things like a door being opened, a refrigerator compressor running, an appliance being turned on, etc.

Graphs and reports can be generated for any sensor, so they are used the same way; the only difference is that the data will be either on/off or an analog value, depending on the type.

So, basically, they are usually handled the same way.

There is always one row in the reading table for ONE reading of ONE sensor.

So far, opinions seem quite subjective; I think both approaches simply have pros and cons.

Does this additional information sway anyone's opinion?

Thanks! Mark.

+4

design database database-design

Mark Oct 05 '10 at 13:09


7 answers

Tables are usually divided along logically different “things”, so that you don’t store the same “thing” twice. For example, you do not want:

    [SensorReadings]
    Id : PK
    UpperLimit : FLOAT
    UpperLimitAlertDelay : INT
    LowerLimit : FLOAT
    LowerLimitAlertDelay : INT
    IsAnalog : BOOL
    AnalogValue : FLOAT
    IsOn : BOOL

That mixes the sensor and its readings in the same row. A sensor is something different from its readings:

    [Sensors]              [SensorReadings]
    Id                     Id
    UpperLimit             SensorID
    UpperLimitAlertDelay   Reading
    LowerLimit
    LowerLimitAlertDelay
    IsAnalog
    Manufacturer
    SerialNumber
    LastInspectionDate
    ...

One thing I wouldn’t do is split the “sensors” into two tables. A sensor is a sensor; that is what it is, just as a customer is a customer and a song is a song. You would have a table of songs, not one table of classical songs and another table for everything else. If you split the sensors into two tables, you can end up with two sensors that have the same ID. Sensors are unique entities: they should live in the same table, and each should have a unique identifier. Whether a sensor is analog or digital is a property of the sensor.
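A minimal sketch of the duplicate-ID point, using Python's built-in sqlite3 module with an in-memory database. The table names follow the question's two designs; the column subset and row values are made up for illustration. With two sensor tables, each table numbers its own rows, so both get an Id of 1; with one Sensor table, every sensor gets a distinct Id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Split design: each table has its own independent primary key sequence.
conn.execute("CREATE TABLE AnalogSensor (Id INTEGER PRIMARY KEY, UpperLimit REAL)")
conn.execute("CREATE TABLE SwitchSensor (Id INTEGER PRIMARY KEY, OnTooLongAlertDelay INTEGER)")
conn.execute("INSERT INTO AnalogSensor (UpperLimit) VALUES (25.0)")
conn.execute("INSERT INTO SwitchSensor (OnTooLongAlertDelay) VALUES (300)")

# Both inserts were assigned Id 1, so "sensor 1" is now ambiguous.
analog_ids = [row[0] for row in conn.execute("SELECT Id FROM AnalogSensor")]
switch_ids = [row[0] for row in conn.execute("SELECT Id FROM SwitchSensor")]

# Single design: one Sensor table, so each sensor has a unique Id.
conn.execute("CREATE TABLE Sensor (Id INTEGER PRIMARY KEY, IsAnalog INTEGER NOT NULL)")
conn.execute("INSERT INTO Sensor (IsAnalog) VALUES (1)")
conn.execute("INSERT INTO Sensor (IsAnalog) VALUES (0)")
sensor_ids = [row[0] for row in conn.execute("SELECT Id FROM Sensor ORDER BY Id")]

print(analog_ids, switch_ids, sensor_ids)  # [1] [1] [1, 2]
```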

What makes your question unusual is that your sensors produce readings in different logical formats: some are analog floating-point values, others are digital boolean values. You are struggling with how to store sensor “readings” when not all readings map to the same logical column type (i.e. float vs. bool). It comes down to practicality, and what's best for the system.

You could store all readings in a single floating-point column:

    [SensorReadings]
    Id  SensorID  Reading
    ==  ========  =======
    1   3728      120.2
    2   3728      120.3
    3   89        1
    4   89        0
    5   3728      120.2
    6   89        0

But now you need to know to interpret the floating-point values 1 and 0 as logical on and off. Is that hard to do? I personally don’t think so. True, it doesn't make full use of the data types available in the database engine, but I don't mind. You will be joining SensorReadings with Sensors anyway, so you will have the IsAnalog column to help you interpret each value. In other words:

    SELECT sr.Id, sr.SensorID, sr.Reading, s.IsAnalog
    FROM SensorReadings sr
    INNER JOIN Sensors s ON sr.SensorID = s.Id

This gives you results that are pretty easy to parse:

    Id  SensorID  Reading  IsAnalog
    ==  ========  =======  ========
    1   3728      120.2    true
    2   3728      120.3    true
    3   89        1        false
    4   89        0        false
    5   3728      120.2    true
    6   89        0        false

You can even create a helper view (or just a query) that decodes the readings into AnalogReading and DigitalReading:

    CREATE VIEW SimpleSensorReadings AS
    SELECT sr.Id, sr.SensorID, sr.Reading AS RawReading,
           CASE s.IsAnalog WHEN 1 THEN sr.Reading ELSE NULL END AS AnalogReading,
           CASE s.IsAnalog WHEN 0 THEN CAST(sr.Reading AS BOOL) ELSE NULL END AS DigitalReading,
           s.IsAnalog
    FROM SensorReadings sr
    INNER JOIN Sensors s ON sr.SensorID = s.Id

This will give you:

    [SimpleSensorReadings]
    Id  SensorID  RawReading  AnalogReading  DigitalReading  IsAnalog
    ==  ========  ==========  =============  ==============  ========
    1   3728      120.2       120.2          (null)          true
    2   3728      120.3       120.3          (null)          true
    3   89        1           (null)         true            false
    4   89        0           (null)         false           false
    5   3728      120.2       120.2          (null)          true
    6   89        0           (null)         false           false
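A runnable sketch of the single-float-column design plus the decoding view, using SQLite through Python's sqlite3 module. SQLite has no boolean type, so this sketch uses `sr.Reading <> 0` (which yields 1 or 0) in place of `CAST(... AS BOOL)`, and `IsAnalog` is stored as 1/0. The sample rows are the ones from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sensors (
    Id       INTEGER PRIMARY KEY,
    IsAnalog INTEGER NOT NULL          -- 1 = analog, 0 = on/off switch
);
CREATE TABLE SensorReadings (
    Id       INTEGER PRIMARY KEY,
    SensorID INTEGER NOT NULL REFERENCES Sensors(Id),
    Reading  REAL NOT NULL             -- switches store 0.0 / 1.0
);
INSERT INTO Sensors (Id, IsAnalog) VALUES (3728, 1), (89, 0);
INSERT INTO SensorReadings (Id, SensorID, Reading) VALUES
    (1, 3728, 120.2), (2, 3728, 120.3), (3, 89, 1),
    (4, 89, 0), (5, 3728, 120.2), (6, 89, 0);

-- Route the raw float into a typed column depending on the sensor type;
-- a CASE without ELSE yields NULL for the other type.
CREATE VIEW SimpleSensorReadings AS
SELECT sr.Id, sr.SensorID, sr.Reading AS RawReading,
       CASE s.IsAnalog WHEN 1 THEN sr.Reading END AS AnalogReading,
       CASE s.IsAnalog WHEN 0 THEN sr.Reading <> 0 END AS DigitalReading,
       s.IsAnalog
FROM SensorReadings sr
INNER JOIN Sensors s ON sr.SensorID = s.Id;
""")

rows = conn.execute("SELECT * FROM SimpleSensorReadings ORDER BY Id").fetchall()
for row in rows:
    print(row)  # e.g. (3, 89, 1.0, None, 1, 0) for the switch reading
```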

It depends on who has to deal with the results. I can easily imagine code that first checks the IsAnalog column and then reads either AnalogReading or DigitalReading.
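The check-then-read pattern might look like this on the client side. The function name `decode_reading` is hypothetical; its arguments are assumed to be the IsAnalog, AnalogReading and DigitalReading columns of the view, with booleans arriving as 1/0 as SQLite returns them.

```python
from typing import Optional, Union


def decode_reading(is_analog: int,
                   analog_reading: Optional[float],
                   digital_reading: Optional[int]) -> Union[float, bool]:
    """Return a float for analog sensors or a bool for switches, based on IsAnalog."""
    if is_analog:
        if analog_reading is None:
            raise ValueError("analog sensor row has no AnalogReading")
        return float(analog_reading)
    if digital_reading is None:
        raise ValueError("switch sensor row has no DigitalReading")
    return bool(digital_reading)


print(decode_reading(1, 120.2, None))  # 120.2
print(decode_reading(0, None, 1))      # True
```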

You could do what you originally proposed: split them into several tables. But now the problem is: how do you access the data? It seems to me that if I had this system of sensor readings, at some point I would need to do something with them, such as show them to the user. Now I have to jump through hoops to reassemble the data:

    SELECT Id, AnalogSensorID AS SensorID, Value AS RawReading,
           Value AS AnalogReading, NULL AS DigitalReading, true AS IsAnalog
    FROM AnalogSensorReadings
    UNION ALL
    SELECT Id, SwitchSensorID AS SensorID, CAST(IsOn AS float) AS RawReading,
           NULL AS AnalogReading, IsOn AS DigitalReading, false AS IsAnalog
    FROM SwitchSensorReadings

This gives you:

    Id  SensorID  RawReading  AnalogReading  DigitalReading  IsAnalog
    ==  ========  ==========  =============  ==============  ========
    1   3728      120.2       120.2          (null)          true
    2   3728      120.3       120.3          (null)          true
    1   89        1           (null)         true            false
    2   89        0           (null)         false           false
    3   3728      120.2       120.2          (null)          true
    3   89        0           (null)         false           false
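A runnable sketch of that UNION ALL over the split tables, again in SQLite via Python's sqlite3, with booleans as 1/0 and made-up rows matching the output above. It makes the duplicate-Id problem visible: each reading table numbers its rows independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AnalogSensorReadings (
    Id INTEGER PRIMARY KEY, AnalogSensorId INTEGER NOT NULL, Value REAL NOT NULL);
CREATE TABLE SwitchSensorReadings (
    Id INTEGER PRIMARY KEY, SwitchSensorId INTEGER NOT NULL, IsOn INTEGER NOT NULL);
INSERT INTO AnalogSensorReadings (Id, AnalogSensorId, Value) VALUES
    (1, 3728, 120.2), (2, 3728, 120.3), (3, 3728, 120.2);
INSERT INTO SwitchSensorReadings (Id, SwitchSensorId, IsOn) VALUES
    (1, 89, 1), (2, 89, 0), (3, 89, 0);
""")

# Reassemble the split tables into one result set. Every SELECT in a
# UNION ALL must produce the same number of columns; names come from the first.
combined = conn.execute("""
    SELECT Id, AnalogSensorId AS SensorID, Value AS RawReading,
           Value AS AnalogReading, NULL AS DigitalReading, 1 AS IsAnalog
    FROM AnalogSensorReadings
    UNION ALL
    SELECT Id, SwitchSensorId, CAST(IsOn AS REAL), NULL, IsOn, 0
    FROM SwitchSensorReadings
""").fetchall()

ids = [row[0] for row in combined]
print(sorted(ids))  # [1, 1, 2, 2, 3, 3]: Id no longer identifies a reading
```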

In addition, “Id” is now ambiguous, because two different readings can have the same “Id”. A reading is a reading, and it should be unique.

The tradeoff you're probably looking for is what you originally had.

    [SensorReadings]
    Id  SensorID  AnalogReading  DigitalReading
    ==  ========  =============  ==============
    1   3728      120.2          (null)
    2   3728      120.3          (null)
    3   89        (null)         true
    4   89        (null)         false
    5   3728      120.2          (null)
    6   89        (null)         false

Yes, this leaves you with a lot of (null) values, but the cost of joining the tables back together is a practical problem that your design decision should consider.

I think of it like the Windows registry. A key contains a value. You don't care how that value is stored, as long as you can read it as its logical type. To accomplish this in a database, I would use several columns, one per data type, and read whichever one applies.
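A sketch of the one-column-per-type layout in SQLite via Python's sqlite3. The CHECK constraint requiring exactly one of the typed columns to be filled in is an assumption added here, not part of the answer; it is one way to stop the extra (null) columns from holding contradictory data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SensorReadings (
    Id             INTEGER PRIMARY KEY,
    SensorID       INTEGER NOT NULL,
    AnalogReading  REAL,
    DigitalReading INTEGER CHECK (DigitalReading IN (0, 1)),
    -- assumption: exactly one of the typed columns must be filled in
    CHECK ((AnalogReading IS NULL) <> (DigitalReading IS NULL))
)""")
conn.execute("INSERT INTO SensorReadings (SensorID, AnalogReading) VALUES (3728, 120.2)")
conn.execute("INSERT INTO SensorReadings (SensorID, DigitalReading) VALUES (89, 1)")

try:
    # Both typed columns set: rejected by the CHECK constraint.
    conn.execute("INSERT INTO SensorReadings (SensorID, AnalogReading, DigitalReading) "
                 "VALUES (89, 1.0, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "SELECT SensorID, AnalogReading, DigitalReading FROM SensorReadings ORDER BY Id"
).fetchall()
print(rows, rejected)  # [(3728, 120.2, None), (89, None, 1)] True
```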

+2

Ian Boyd Oct 05 '10 at 14:16


Typically, you want as little redundancy in your database design as possible. Look up "normal forms"; anything below BCNF is generally difficult to maintain. Some applications, such as data warehouses, use redundancy to improve read performance at the cost of clarity and write performance. Joins can be slow, but they are better than inconsistent data caused by storing the same information twice.

Therefore, I would recommend the second (split) design. Suppose your sensors stop reporting at perfectly aligned timestamps: suddenly the first proposed layout no longer works well.

+1

Kajetan Abt Oct 05 '10 at 13:13


The question is: from your system's point of view, are they the same thing? If so, they belong in the same table. If not, they belong in two tables.

Usually this is easy to answer. "Employee" and "Insurance Plan" are two different things. “An employee named Bob” and “an employee named Sally” are two instances of the same thing.

Sometimes it’s more difficult. Are “Truck” and “Boat” two different things, or are they just subtypes of “Vehicle”? It depends on your system's point of view. If you sell them, they are probably the same thing. You probably don't care that one floats and the other doesn't; you just care how much they cost, how many you have in stock, and the like. That is, you store the same data about them and use them in the same queries. But if your system manages a fishing fleet, and for a “Boat” you care about things like the crew members, how much they are paid, and how many fish they caught today, while for a “Truck” you care about things like when it arrives at the dock to pick up today's catch and how much you pay the trucking company per pound, then they are probably two different things.

Two good tests that they are the same thing:

They have the same data (not the same values, but the same fields)
Queries will usually run against both with little or no difference.

If not, they are probably not the same thing.

That is, if you find that for type 1 field A always has a value and field B is always null, while for type 2 field A is always null and field B has a value, then they fail the data test.
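One way to run the data test on an existing single table is to count, per type, how many rows actually use each type-specific column. A sketch in SQLite via Python's sqlite3, assuming a hypothetical Sensor table that has a separate OnTooLongAlertDelay column, with made-up rows. A result where each type uses a column the other type never touches is the "fails the data test" pattern described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sensor (
    Id INTEGER PRIMARY KEY, IsAnalog INTEGER NOT NULL,
    UpperLimit REAL, OnTooLongAlertDelay INTEGER);
INSERT INTO Sensor VALUES (1, 1, 25.0, NULL), (2, 1, 30.0, NULL), (3, 0, NULL, 300);
""")

# COUNT(column) ignores NULLs, so a count of 0 means
# "this column is never used by this type of sensor".
usage = conn.execute("""
    SELECT IsAnalog,
           COUNT(*)                   AS total,
           COUNT(UpperLimit)          AS uses_upper_limit,
           COUNT(OnTooLongAlertDelay) AS uses_on_too_long
    FROM Sensor
    GROUP BY IsAnalog
    ORDER BY IsAnalog
""").fetchall()
print(usage)  # [(0, 1, 0, 1), (1, 2, 2, 0)]
```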

If you find that, with them in one table, you have to add a type check to nearly every query so that you only get the right kind, then they fail the query test. If you find that, with them in two separate tables, you constantly have to write queries that join or union the two tables to pick up both, then they pass the query test.

In your example, you didn't tell us what you are going to do with the data, so I can't apply the query test. You apparently have different data for the two types of sensors (a number versus on/off), so right away that is a vote for two different things. But then we come back to how it matters to your system. If for the temperature probes you will produce temperature graphs over time or monitor whether they stay within certain ranges, while for the on/off switches you will start a process when they turn on and stop it when they turn off, then they are probably two different things. If in both cases you will produce reports of the value (number or on/off) at a given time, then they may be the same.

I tend to think that they are probably different, but without knowing more, I can’t say.

+1

Jay Oct 05 '10 at 17:46


I suggest following the rules of normalization.

Depending on your needs, you could choose a NoSQL database.

0

elCapitano Oct 05 '10 at 13:15


This is a fairly standard design decision that has to be made when creating an object-relational mapping.

The first option you present is known as table-per-hierarchy, and the second as table-per-concrete-class. There are other options, such as mapping abstract classes to their own tables. Some ORM frameworks (e.g. Hibernate) provide ready-made support for some of these patterns.
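Table-per-hierarchy is roughly what the question already does by hand: one table, one discriminator column (IsAnalog), and code that picks the class. A minimal Python sketch, assuming rows arrive as dicts keyed by column name; the class and field names are illustrative, and an ORM such as Hibernate would handle this through its own mapping configuration instead.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class AnalogSensor:
    id: int
    upper_limit: float
    lower_limit: float


@dataclass
class SwitchSensor:
    id: int
    on_too_long_alert_delay: int


def sensor_from_row(row: dict) -> Union[AnalogSensor, SwitchSensor]:
    """Table-per-hierarchy: one Sensor table, IsAnalog is the discriminator."""
    if row["IsAnalog"]:
        return AnalogSensor(row["Id"], row["UpperLimit"], row["LowerLimit"])
    return SwitchSensor(row["Id"], row["OnTooLongAlertDelay"])


print(sensor_from_row({"Id": 89, "IsAnalog": 0, "UpperLimit": None,
                       "LowerLimit": None, "OnTooLongAlertDelay": 300}))
```

Adding a third sensor type here means another class and another branch, which is one reason the number of expected sensor types matters.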

Searching for some of these keywords should give you more information. There are many trade-offs to consider. I guess one of the first things to think about is how many different types of sensors you are likely to have.

Another thing to keep in mind is reporting. If you are going to write a lot of reports that query all types of sensors, then table-per-concrete-class will require combining the tables (typically with UNIONs) in those reports, which may be undesirable.

0

Paul M Oct 05 '10 at 13:38


I suspect this will depend on relationships with other entities that are not shown. If many entities are associated with one type of sensor but not the other, it may make sense to separate them; otherwise I would be inclined to use the simpler design (i.e. the two-table approach rather than the four-table approach).

A few changes I would suggest:

Keep "UpperLimitAlertDelay" and "OnTooLongAlertDelay" as separate fields: as I understand it, they are different values, and therefore (under 1NF) they should be separate fields.
Add a date/time stamp field to the reading table.
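A sketch of the two-table design with both suggested changes applied, in SQLite via Python's sqlite3. The column name ReadingTime and the ISO-8601 text format are assumptions; SQLite has no dedicated date/time storage class, so timestamps are commonly stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sensor (
    Id                   INTEGER PRIMARY KEY,
    IsAnalog             INTEGER NOT NULL CHECK (IsAnalog IN (0, 1)),
    UpperLimit           REAL,
    UpperLimitAlertDelay INTEGER,
    LowerLimit           REAL,
    LowerLimitAlertDelay INTEGER,
    OnTooLongAlertDelay  INTEGER    -- its own field, no longer reusing UpperLimitAlertDelay
);
CREATE TABLE SensorReading (
    Id          INTEGER PRIMARY KEY,
    SensorId    INTEGER NOT NULL REFERENCES Sensor(Id),
    ReadingTime TEXT NOT NULL,      -- assumed ISO-8601 timestamp
    AnalogValue REAL,
    IsOn        INTEGER
);
INSERT INTO Sensor (Id, IsAnalog, OnTooLongAlertDelay) VALUES (89, 0, 300);
INSERT INTO SensorReading (SensorId, ReadingTime, IsOn)
    VALUES (89, '2010-10-05T13:09:00', 1);
""")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(SensorReading)")]
print(columns)  # ['Id', 'SensorId', 'ReadingTime', 'AnalogValue', 'IsOn']
```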

0

Mark Bannister Oct 6 '10 at 10:40


PerformanceDBA · Accepted Answer · 2010-12-14T06:43:51+0000

Is this the same application/database as in your other question?

If so, the answer was provided in the Data Model.

If it is not the same application/db, or if that question did not receive an adequate answer, please comment or post more detail. For instance, based on the previous information, I modeled it so that the SensorType table distinguishes the Sensor (analog or logical)... but we could:

differentiate it at the Sensor level,
or subtype Reading into ReadingAnalog and ReadingSwitch. This can make things a little easier for programs that produce graphs, etc.
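One possible shape for the second option (subtyping Reading), as a SQLite sketch via Python's sqlite3. The table names ReadingAnalog and ReadingSwitch come from the answer; the columns ReadingId, SensorId, ReadingTime, Value and IsOn are assumptions, since the referenced Data Model is not reproduced here. Because the subtypes share the supertype's key, every reading keeps a unique Id, unlike the four fully separate tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Reading (
    ReadingId   INTEGER PRIMARY KEY,    -- one Id space for every reading
    SensorId    INTEGER NOT NULL,
    ReadingTime TEXT NOT NULL
);
-- Subtypes reuse the supertype's primary key as their own.
CREATE TABLE ReadingAnalog (
    ReadingId INTEGER PRIMARY KEY REFERENCES Reading(ReadingId),
    Value     REAL NOT NULL
);
CREATE TABLE ReadingSwitch (
    ReadingId INTEGER PRIMARY KEY REFERENCES Reading(ReadingId),
    IsOn      INTEGER NOT NULL CHECK (IsOn IN (0, 1))
);
INSERT INTO Reading VALUES (1, 3728, '2010-12-14T06:00:00'), (2, 89, '2010-12-14T06:00:05');
INSERT INTO ReadingAnalog VALUES (1, 120.2);
INSERT INTO ReadingSwitch VALUES (2, 1);
""")

# A graphing program for analog sensors joins only its own subtype.
analog = conn.execute("""
    SELECT r.ReadingId, r.SensorId, r.ReadingTime, a.Value
    FROM Reading r JOIN ReadingAnalog a ON a.ReadingId = r.ReadingId
""").fetchall()
print(analog)  # [(1, 3728, '2010-12-14T06:00:00', 120.2)]
```

This sketch does not stop one ReadingId from appearing in both subtype tables; enforcing that exclusivity needs something more, such as a type column on Reading plus constraints or triggers.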

