You must read and understand the basics of normalization. For most projects, normalizing to third normal form (3NF) will be fine. There are always scenarios where you want to normalize more or less, but understanding the underlying concepts will let you reason about how your database should be structured in a normalized form.
Here is a very simple example of table normalization:
students (student_id, student_name, student_class, student_grade)
This is a pretty standard table containing various data, but we can immediately see some problems. The student's name depends on the student's identifier. However, a student can take several classes, and each class may have a different grade. We can therefore split the data into tables like these:
students (student_id, student_name)
class (class_id, class_name)
This is not bad: we now have separate students and separate classes, but we have not yet captured the students' grades.
grades (student_id, class_id, grade)
We now have a third table that captures the relationship between a particular student, a specific class, and the grade that student received in that class. From our single source table we now have 3 tables in the normalized database (let's say that we do not need to normalize the grades themselves, for example :))
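The three tables above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the one correct schema: the column types, the NOT NULL constraints, and the choice of SQLite are assumptions that the example above does not specify.

```python
import sqlite3


def create_schema(conn):
    """Create the three normalized tables from the example."""
    conn.executescript("""
        CREATE TABLE students (
            student_id   INTEGER PRIMARY KEY,
            student_name TEXT NOT NULL          -- type is an assumption
        );
        CREATE TABLE class (
            class_id   INTEGER PRIMARY KEY,
            class_name TEXT NOT NULL            -- type is an assumption
        );
        CREATE TABLE grades (
            student_id INTEGER NOT NULL REFERENCES students(student_id),
            class_id   INTEGER NOT NULL REFERENCES class(class_id),
            grade      TEXT,                    -- type is an assumption
            PRIMARY KEY (student_id, class_id)  -- one grade per student per class
        );
    """)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
create_schema(conn)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The composite primary key on grades is what makes "student_id + class_id" the unique identifier of a grade row, so the same student cannot be given two grades for the same class.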
A few things we can learn from this very simple example:
- Our data is bound to a specific key in each table (student_id, class_id, and student_id + class_id). These are the unique identifiers of the rows in each table.
- With these key relationships, we can relate pieces of information to each other (how many classes does student number 4096 take?)
- Our tables no longer contain duplicate data. Think of our first table, where student_class can hold the same value for many students: if we had to change the class name, we would have to update all of those records. In our normalized format we can just update the class_name of a single class_id.
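The last two points can be tried out with a small, self-contained sketch using Python's sqlite3 module. The sample rows (student names, class names, grades) are invented purely for illustration, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, class_name TEXT);
    CREATE TABLE grades (
        student_id INTEGER REFERENCES students(student_id),
        class_id   INTEGER REFERENCES class(class_id),
        grade      TEXT,
        PRIMARY KEY (student_id, class_id)
    );
    -- Hypothetical sample data, not from the original example.
    INSERT INTO students VALUES (4096, 'Ada'), (4097, 'Grace');
    INSERT INTO class VALUES (1, 'Maths'), (2, 'History');
    INSERT INTO grades VALUES (4096, 1, 'A'), (4096, 2, 'B'), (4097, 1, 'C');
""")

# How many classes does student number 4096 take?
class_count = conn.execute(
    "SELECT COUNT(*) FROM grades WHERE student_id = ?", (4096,)
).fetchone()[0]

# Rename a class: exactly one row changes, no matter how many students take it.
cursor = conn.execute(
    "UPDATE class SET class_name = ? WHERE class_id = ?", ("Mathematics", 1)
)
rows_updated = cursor.rowcount

# Every student in class 1 sees the new name through the key relationship.
names = conn.execute("""
    SELECT s.student_name, c.class_name
    FROM grades g
    JOIN students s ON s.student_id = g.student_id
    JOIN class c ON c.class_id = g.class_id
    WHERE g.class_id = 1
    ORDER BY s.student_name
""").fetchall()

print(class_count, rows_updated, names)
```

In the original single-table layout, the same rename would have touched one row per enrolled student instead of one row in total.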
Owen