Confusing Database Schema (Index and Constraints)

I am a little confused about schema design, but before I start, let me first show you the schema:

    CREATE TABLE Person
    (
        PersonID INT NOT NULL PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName VARCHAR(50),
        -- some columns here..
        CONSTRAINT tb_idF INDEX (FirstName),
        CONSTRAINT tb_idL INDEX (LastName)
        -- or
        -- CONSTRAINT tb_idL INDEX (FirstName, LastName)
        -- other constraints ...
    );

    CREATE TABLE JobDescription
    (
        JobDescriptionID INT NOT NULL PRIMARY KEY,
        JobDescriptionName VARCHAR(50) UNIQUE
        -- some columns here..
        -- constraints ...
    );

My confusion is about the mapping table for the tables Person and JobDescription. I currently have this design:

    CREATE TABLE Person_JobDescription
    (
        RECID INT AUTO_INCREMENT PRIMARY KEY,   -- for some special reasons
                                                -- I need to map to other table
        PersonID INT,
        JobDescriptionID INT,
        StartYear INT,                          -- year JobDescription was Appointed
        EndYear INT,
        CONSTRAINT tb_fk1 FOREIGN KEY (PersonID)
            REFERENCES Person(PersonID),
        CONSTRAINT tb_fk2 FOREIGN KEY (JobDescriptionID)
            REFERENCES JobDescription(JobDescriptionID),
        CONSTRAINT tb_uq UNIQUE (PersonID, JobDescriptionID)
    );

but I have another idea whose structure will look like this:

    CREATE TABLE Person_JobDescription
    (
        PersonID INT,           -- map these two columns on the other table
        JobDescriptionID INT,   -- ^^
        StartYear INT,          -- year JobDescription was Appointed
        EndYear INT,
        CONSTRAINT tb_fk1 FOREIGN KEY (PersonID)
            REFERENCES Person(PersonID),
        CONSTRAINT tb_fk2 FOREIGN KEY (JobDescriptionID)
            REFERENCES JobDescription(JobDescriptionID),
        CONSTRAINT tb_pk PRIMARY KEY (PersonID, JobDescriptionID)
    );

When I created both versions and tested queries against them, both returned the same results, and the performance was also the same on a small database (about 50 thousand records). I wonder how the two designs will behave in a large database.

QUESTIONS

  • Which of the two schemas for the mapping table ( Person_JobDescription ) would you prefer in a large database?

As instructed, I am not allowed to create a UNIQUE constraint on FirstName and LastName. But I have put an index on the two columns.

  • What type of index should I use on the Person table? An index for each column, or a compound index on FirstName and LastName ?
  • When should I use single-column indexes INDEX (Col1) and INDEX (Col2) over INDEX (Col1, Col2) ?

Thanks for taking the time to read this question.

Best wishes,

Derek Floss

+8
sql database mysql schema
2 answers

I would prefer the second approach. By using surrogate ID numbers when they are not logically necessary for identification, you introduce more mandatory joins. This requires you to "chase ID numbers all over the database," which is the SQL equivalent of "chasing pointers all over the database." Chasing pointers was characteristic of IMS, one of the database architectures the relational model was intended to replace. (IMS uses a hierarchical architecture.) There is no point in reinventing it today. (Although many people do just that.)

If you have, for example, five levels of surrogate ID numbers, and you want a person's name, you need four joins to get it. Using the second approach, you need just one join. If you do not want to write multi-column joins, use CREATE VIEW and write them only once.
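
As a rough illustration of the CREATE VIEW suggestion, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained). The dependent table AppointmentNote and the view AppointmentNoteDetail are hypothetical names that do not appear in the question; they only show a two-column join being written once inside a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person_JobDescription (
        PersonID         INT,
        JobDescriptionID INT,
        StartYear        INT,
        EndYear          INT,
        PRIMARY KEY (PersonID, JobDescriptionID)
    );

    -- Hypothetical dependent table that references the composite key.
    CREATE TABLE AppointmentNote (
        PersonID         INT,
        JobDescriptionID INT,
        Note             VARCHAR(100),
        FOREIGN KEY (PersonID, JobDescriptionID)
            REFERENCES Person_JobDescription (PersonID, JobDescriptionID)
    );

    -- The two-column join is written once, here.
    CREATE VIEW AppointmentNoteDetail AS
    SELECT n.PersonID, n.JobDescriptionID, n.Note, pj.StartYear, pj.EndYear
    FROM AppointmentNote AS n
    JOIN Person_JobDescription AS pj
      ON pj.PersonID = n.PersonID
     AND pj.JobDescriptionID = n.JobDescriptionID;

    INSERT INTO Person_JobDescription VALUES (1, 10, 2005, 2009);
    INSERT INTO AppointmentNote VALUES (1, 10, 'promoted');
""")

# Callers query the view and never spell out the composite join themselves.
rows = conn.execute(
    "SELECT Note, StartYear, EndYear FROM AppointmentNoteDetail"
).fetchall()
print(rows)
```

Queries against the view read like single-table queries, which is the point of writing the join only once.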

Performance is easy to test. Simply generate a few million random rows using your favorite scripting language and load them into a test server. You will not only find where your performance problems are hiding, you will also find all the errors in your CREATE TABLE code. (Your code will not work as it is.) Learn about EXPLAIN if you do not already know about it.
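
A minimal sketch of that kind of test, written in Python with the standard-library sqlite3 module standing in for the MySQL test server (an assumption made so the script is self-contained). The helper names random_name, build_test_db and time_query are hypothetical. It builds the second mapping-table design, fills it with random rows, and times one lookup.

```python
import random
import sqlite3
import string
import time


def random_name(rng, length=8):
    """Return a random lowercase string that stands in for a name."""
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def build_test_db(n_people=100_000, n_jobs=50, seed=42):
    """Create the second mapping-table design and fill it with random rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (
            PersonID  INTEGER PRIMARY KEY,
            FirstName VARCHAR(50),
            LastName  VARCHAR(50)
        );
        CREATE TABLE JobDescription (
            JobDescriptionID   INTEGER PRIMARY KEY,
            JobDescriptionName VARCHAR(50) UNIQUE
        );
        CREATE TABLE Person_JobDescription (
            PersonID         INT REFERENCES Person (PersonID),
            JobDescriptionID INT REFERENCES JobDescription (JobDescriptionID),
            StartYear        INT,
            EndYear          INT,
            PRIMARY KEY (PersonID, JobDescriptionID)
        );
    """)
    conn.executemany(
        "INSERT INTO Person VALUES (?, ?, ?)",
        ((i, random_name(rng), random_name(rng)) for i in range(1, n_people + 1)),
    )
    conn.executemany(
        "INSERT INTO JobDescription VALUES (?, ?)",
        ((j, f"job_{j}") for j in range(1, n_jobs + 1)),
    )
    # One job per person keeps (PersonID, JobDescriptionID) unique.
    conn.executemany(
        "INSERT INTO Person_JobDescription VALUES (?, ?, ?, ?)",
        ((i, rng.randint(1, n_jobs), 2000, 2010) for i in range(1, n_people + 1)),
    )
    conn.commit()
    return conn


def time_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


conn = build_test_db(n_people=10_000)
rows, elapsed = time_query(
    conn,
    """SELECT p.FirstName, p.LastName, j.JobDescriptionName
       FROM Person_JobDescription AS pj
       JOIN Person AS p ON p.PersonID = pj.PersonID
       JOIN JobDescription AS j ON j.JobDescriptionID = pj.JobDescriptionID
       WHERE pj.PersonID = ?""",
    (1234,),
)
print(f"{len(rows)} row(s) in {elapsed:.6f} s")
```

Timings from SQLite will not transfer to MySQL, so treat this only as a template: point the same row generator at the real test server, raise n_people to a few million, and compare both mapping-table designs there.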

As for indexing, you can test that on the random rows you generate and load. A multi-column index on (first_name, last_name) will work best if users always supply a first name. But many users will not do that, preferring to search by last name instead. A multi-column index on (first_name, last_name) is not effective for users who search by last name only. You can test that.

For this reason alone, indexing first names and last names is usually more effective with two separate indexes: one for the first name and one for the last name.
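
The difference can be seen in a query plan. The following is a minimal sketch using Python's standard-library sqlite3 module and SQLite's EXPLAIN QUERY PLAN, as a stand-in for MySQL's EXPLAIN (an assumption; the exact plan text varies between SQLite versions and looks different in MySQL). The index names idx_first_last and idx_last, the helper query_plan, and the BirthYear column are hypothetical; BirthYear stands in for the question's "some columns here".

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        PersonID  INTEGER PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName  VARCHAR(50),
        BirthYear INT  -- hypothetical extra column
    )
""")

by_first = "SELECT * FROM Person WHERE FirstName = ?"
by_last = "SELECT * FROM Person WHERE LastName = ?"

# Step 1: only the multi-column index exists.
conn.execute("CREATE INDEX idx_first_last ON Person (FirstName, LastName)")
plan_first_composite = query_plan(conn, by_first, ("Derek",))
plan_last_composite = query_plan(conn, by_last, ("Floss",))

# Step 2: add a separate index on the last name.
conn.execute("CREATE INDEX idx_last ON Person (LastName)")
plan_last_separate = query_plan(conn, by_last, ("Floss",))

print("first name, composite index:", plan_first_composite)
print("last name,  composite index:", plan_last_composite)
print("last name,  separate index: ", plan_last_separate)
```

In this sketch the first-name search can use the composite index because FirstName is its leading column, the last-name search falls back to a full scan until a separate index on LastName exists. Rerun the equivalent EXPLAIN statements on the real MySQL server with realistic data before settling on an index design.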


What does chasing id numbers mean?

The unspoken design pattern underlying this question is "Every row must have an id number, and all foreign keys must reference the id number." In a SQL database, this is actually an anti-pattern. As a rule of thumb, any pattern that lets you design tables without thinking about keys should be presumed guilty until proven innocent: it should be presumed to be an anti-pattern until proven otherwise.

    create table A (
      a_id integer primary key,
      a_1 varchar(15) not null unique,
      a_2 varchar(15) not null
    );

    create table B (
      b_id integer primary key,
      a_id integer not null references A (a_id),
      b_1 varchar(10) not null,
      unique (a_id, b_1)
    );

    create table C (
      c_id integer primary key,
      b_id integer not null references B (b_id),
      c_1 char(3) not null,
      c_2 varchar(20) not null,
      unique (b_id, c_1)
    );

    create table D (
      d_id integer primary key,
      c_id integer not null references C (c_id),
      d_1 integer not null,
      d_2 varchar(15),
      unique (c_id, d_1)
    );

If you need a report on table "D", and the report needs

  • columns D.d_1 and D.d_2 and
  • columns A.a_1 and A.a_2,

you need 3 joins to get to it. (Try it.) You are chasing ID numbers. (This is like chasing pointers in IMS.) The following structure is different.

    create table A (
      a_1 varchar(15) primary key,
      a_2 varchar(15) not null
    );

    create table B (
      a_1 varchar(15) not null references A (a_1),
      b_1 varchar(10) not null,
      primary key (a_1, b_1)
    );

    create table C (
      a_1 varchar(15) not null,
      b_1 varchar(10) not null,
      c_1 char(3) not null,
      c_2 varchar(20) not null,
      primary key (a_1, b_1, c_1),
      foreign key (a_1, b_1) references B (a_1, b_1)
    );

    create table D (
      a_1 varchar(15) not null,
      b_1 varchar(10) not null,
      c_1 char(3) not null,
      d_1 integer not null,
      d_2 varchar(15),
      primary key (a_1, b_1, c_1, d_1),
      foreign key (a_1, b_1, c_1) references C (a_1, b_1, c_1)
    );

In this structure, the same report needs only one join.

    select D.d_1, D.d_2, A.a_1, A.a_2
    from D
    inner join A on D.a_1 = A.a_1;
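
To try the comparison, here is a minimal sketch that loads both structures into one in-memory SQLite database through Python's standard-library sqlite3 module (SQLite is an assumption, used only to keep the example self-contained). The suffixes _s (surrogate keys) and _n (natural keys) and the sample rows are hypothetical additions so that both designs can coexist and return data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate-key chain.
    CREATE TABLE A_s (a_id INTEGER PRIMARY KEY,
                      a_1 VARCHAR(15) NOT NULL UNIQUE, a_2 VARCHAR(15) NOT NULL);
    CREATE TABLE B_s (b_id INTEGER PRIMARY KEY,
                      a_id INTEGER NOT NULL REFERENCES A_s (a_id),
                      b_1 VARCHAR(10) NOT NULL, UNIQUE (a_id, b_1));
    CREATE TABLE C_s (c_id INTEGER PRIMARY KEY,
                      b_id INTEGER NOT NULL REFERENCES B_s (b_id),
                      c_1 CHAR(3) NOT NULL, c_2 VARCHAR(20) NOT NULL,
                      UNIQUE (b_id, c_1));
    CREATE TABLE D_s (d_id INTEGER PRIMARY KEY,
                      c_id INTEGER NOT NULL REFERENCES C_s (c_id),
                      d_1 INTEGER NOT NULL, d_2 VARCHAR(15), UNIQUE (c_id, d_1));
    INSERT INTO A_s VALUES (1, 'alpha', 'first');
    INSERT INTO B_s VALUES (1, 1, 'beta');
    INSERT INTO C_s VALUES (1, 1, 'abc', 'gamma');
    INSERT INTO D_s VALUES (1, 1, 42, 'delta');

    -- Natural-key chain.
    CREATE TABLE A_n (a_1 VARCHAR(15) PRIMARY KEY, a_2 VARCHAR(15) NOT NULL);
    CREATE TABLE B_n (a_1 VARCHAR(15) NOT NULL REFERENCES A_n (a_1),
                      b_1 VARCHAR(10) NOT NULL, PRIMARY KEY (a_1, b_1));
    CREATE TABLE C_n (a_1 VARCHAR(15) NOT NULL, b_1 VARCHAR(10) NOT NULL,
                      c_1 CHAR(3) NOT NULL, c_2 VARCHAR(20) NOT NULL,
                      PRIMARY KEY (a_1, b_1, c_1),
                      FOREIGN KEY (a_1, b_1) REFERENCES B_n (a_1, b_1));
    CREATE TABLE D_n (a_1 VARCHAR(15) NOT NULL, b_1 VARCHAR(10) NOT NULL,
                      c_1 CHAR(3) NOT NULL, d_1 INTEGER NOT NULL, d_2 VARCHAR(15),
                      PRIMARY KEY (a_1, b_1, c_1, d_1),
                      FOREIGN KEY (a_1, b_1, c_1) REFERENCES C_n (a_1, b_1, c_1));
    INSERT INTO A_n VALUES ('alpha', 'first');
    INSERT INTO B_n VALUES ('alpha', 'beta');
    INSERT INTO C_n VALUES ('alpha', 'beta', 'abc', 'gamma');
    INSERT INTO D_n VALUES ('alpha', 'beta', 'abc', 42, 'delta');
""")

# Surrogate keys: three joins are needed to walk from D back to A.
surrogate_report = conn.execute("""
    SELECT D_s.d_1, D_s.d_2, A_s.a_1, A_s.a_2
    FROM D_s
    JOIN C_s ON C_s.c_id = D_s.c_id
    JOIN B_s ON B_s.b_id = C_s.b_id
    JOIN A_s ON A_s.a_id = B_s.a_id
""").fetchall()

# Natural keys: D already carries a_1, so one join is enough.
natural_report = conn.execute("""
    SELECT D_n.d_1, D_n.d_2, A_n.a_1, A_n.a_2
    FROM D_n
    JOIN A_n ON A_n.a_1 = D_n.a_1
""").fetchall()

print(surrogate_report)
print(natural_report)
```

Both queries return the same report row, the first only after walking through C and B.
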
+5

The first approach would be my preference.

If you need a table that depends on Person_JobDescription, say AgentContact, you can easily reference the surrogate RECID. Without it, you have to start jumping through hoops.

Another reason: what if you had to keep Person / JobDescription for every year? Before you know where you are, you will have four-value composite keys that still do not do the job. As a rule, compound primary keys should be a last resort if you want your designs to stay flexible and resilient.

+3
