Diamond pattern: how is it (de) normalized?

Suppose we have the following entities:

  • Production studio
  • Journalist
  • Camera operator
  • Movie shots

In this simple world, a production studio has many journalists and many cameramen. Each journalist belongs to one studio. Same thing with operators. News frames are produced by one journalist and one cameraman, where both are from the same studio.

Here is my naive approach to put this model in a relational database:

CREATE TABLE production_studios( id SERIAL PRIMARY KEY, title TEXT NOT NULL ); CREATE TABLE journalists( id SERIAL PRIMARY KEY, name TEXT NOT NULL, prodution_studio_id INTEGER NOT NULL REFERENCES production_studios ); CREATE TABLE camera_operators( id SERIAL PRIMARY KEY, name TEXT NOT NULL, production_studio_id INTEGER NOT NULL REFERENCES production_studios ); CREATE TABLE news_footages( id SERIAL PRIMARY KEY, description TEXT NOT NULL, journalist_id INTEGER NOT NULL REFERENCES journalists, camera_operator_id INTEGER NOT NULL REFERENCES camera_operators ); 

This scheme forms a beautifully shaped diamond ERD and a few questions.

The problem is that news frames can associate a journalist with a camera operator who comes from different production studios. I understand that this can be cured by writing the appropriate restrictions, but for the sake of experiment, let it pretend that we are doing an exercise in designing a normal-form database.

  • The first question is about terminology: is it right to say that this scheme is denormalized? If so, what normal form violates it? Or is there a better name for this anomaly, for example, interrecord redundancy, multipath relationships, etc.?

  • How can this scheme be changed to make the described anomaly impossible?

And, of course, I really appreciate links to documents devoted to this particular problem.

+8
sql database schema normalization
source share
1 answer

A naive way would be to make your journalists and camera_operators dependent objects dependent on the studio they work for. This means that the foreign key of the production studio becomes part of their primary key. Your news_footage table has a primary key consisting of 4 components:

  • production_studio_id
  • journalist_id
  • camera_operator_id
  • footage_id

and two foreign keys:

  • journalist_id, production_studio_id, pointing to the journalist table and
  • camera_operator, production_studio_id, pointing to the camera operator table

Easy.

Or not. Now you have defined in your ER model the notion that the very existence of a camera operator or a journalist depends on the studio for which they work. This does not reflect very well on real work: in this model people cannot change their employer.

We will not do this.

In your original model, you confuse the person with whom they play (a journalist or a camera operator), and you lack several transitional objects that are actually responsible for the production of your news: [for the studio] production group.

My ER model will look something like this:

 create table studio ( id int not null primary key , title varchar(200) not null , ) create table person ( id int not null primary key , title varchar(200) not null , ) create table team ( studio_id int not null , journalist_id int not null , camera_operator_id int not null , primary key ( studio_id , journalist_id , camera_operator ) , foreign key ( studio_id ) references studio ( id ) , foreign key ( journalist_id ) references person ( id ) , foreign key ( camera_operator_id ) references person ( id ) , ) create table footage ( studio_id int not null , journalist_id int not null , camera_operator_id int not null , id int not null , description varchar(200) not null , primary key ( studio_id , journalist_id , camera_operator_id , id ) , foreign key ( studio_id , journalist_id , camera_operator_id ) references team ( studio_id , journalist_id , camera_operator_id ) , ) 

Now you have a world in which people can work in different roles: the same person can be a camera operator in some contexts and a journalist in others. People can change employers. The group includes individual journalists and a camera operator. In some contexts, the same person can play both roles in a team. And finally, frames from news materials are created by one and only one team, specially designed for the studio.

This reflects the real world much better and it is much more flexible.

Edited to add a sample query:

To find journalists working in a particular studio:

 select p.* from studio s join team t on t.studio_id = s.id join person p on p.id = t.journalist_id where s.title = 'my desired studio name' 

This will give you a set of people who (or were) associated with the studio as a journalist. It should be noted, however, that in the real world people work for employers for a certain period of time: to correctly model it, you need a start / end date, and you need to qualify the request with a relative concept.

+3
source share

All Articles