A colleague asked me to explain how indexes (indices?) improve performance. I tried to, but ended up confused myself.
I used the model below for the explanation (an error logging / diagnostics database). It consists of three tables:
- List of business systems, table "System", containing their names
- List of different types of traces, table "TraceTypes", defining which kinds of error messages can be logged
- Actual trace messages, table "Traces", with foreign keys to System and TraceTypes
I used MySQL for the demo, but I do not remember which table type (storage engine) I used. I think it was InnoDB.
System
----------------
| ID | Name    |
----------------
| 1  | billing |
| 2  | hr      |
----------------

TraceTypes
------------------------------------------
| ID | Code    | Description             |
------------------------------------------
| 1  | Info    | Informational message   |
| 2  | Warning | Warning only            |
| 3  | Error   | Failure                 |
------------------------------------------

Traces (System_ID references System.ID, TraceTypes_ID references TraceTypes.ID)
----------------------------------------------------
| ID | System_ID | TraceTypes_ID | Message         |
----------------------------------------------------
| 1  | 1         | 1             | Job starting    |
| 2  | 1         | 3             | System.nullr..  |
----------------------------------------------------
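For anyone who wants to reproduce this, the schema could be created roughly as follows. This is only a sketch: the column types and lengths are my assumptions, not the exact definitions I used. The foreign keys are left as plain columns on purpose, because InnoDB automatically creates an index on the referencing column when a FOREIGN KEY constraint is declared, and the timings further down start from a state without such indexes. `System` is quoted with backticks because SYSTEM is a reserved word in recent MySQL versions.

```sql
-- Sketch only: column types and lengths are assumptions.
CREATE TABLE `System` (
    ID   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE TraceTypes (
    ID          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Code        VARCHAR(20) NOT NULL,
    Description VARCHAR(255)
) ENGINE=InnoDB;

-- System_ID and TraceTypes_ID act as foreign keys, but no FOREIGN KEY
-- constraint is declared here so that no index is created implicitly.
CREATE TABLE Traces (
    ID            BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    System_ID     INT NOT NULL,
    TraceTypes_ID INT NOT NULL,
    Message       TEXT
) ENGINE=InnoDB;
```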
First, I added a few records to all the tables and demonstrated that the query below executes in 0.005 seconds:
select count(*)
from Traces
inner join System on Traces.System_ID = System.ID
inner join TraceTypes on Traces.TraceTypes_ID = TraceTypes.ID
where System.Name='billing' and TraceTypes.Code = 'Info'
Then I generated more data (still no indexes):
- "System" contains about 100 entries
- "TraceTypes" contains about 50 entries
- "Traces" contains ~10 million entries
Now the previous query took 8-10 seconds.
I created indexes on the Traces.System_ID column and the Traces.TraceTypes_ID column. After that, this query executed in milliseconds:
select count(*) from Traces where System_id=1 and TraceTypes_ID=1;
This one was also fast:
select count(*) from Traces inner join System on Traces.System_ID = System.ID where System.Name='billing' and TraceTypes_ID=1;
but the previous query, which joined all three tables, still took 8-10 seconds.
Only when I created a composite index (with both the System_ID and TraceTypes_ID columns included in the index) did the execution time drop to milliseconds.
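To make the two indexing steps concrete, they would look roughly like this. The index names are hypothetical, and the column order inside the composite index is an assumption; I only recall that both columns were part of it.

```sql
-- Index names are hypothetical.

-- Step 1: the two single-column indexes.
CREATE INDEX idx_traces_system    ON Traces (System_ID);
CREATE INDEX idx_traces_tracetype ON Traces (TraceTypes_ID);

-- Step 2: the composite index covering both columns.
-- The column order shown here is an assumption.
CREATE INDEX idx_traces_system_tracetype
    ON Traces (System_ID, TraceTypes_ID);
```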
The basic rule I had been taught earlier is "all the columns that you use for joins must be indexed."
However, in my scenario I had indexes on both System_ID and TraceTypes_ID, yet MySQL did not use them. The question is: why? My guess is that the ratio of row counts, 100 : 10,000,000 : 50, makes the single-column indexes too large to be useful. But is this true?
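One way to check what the optimizer actually does with the slow query is to prefix it with EXPLAIN and look at the possible_keys, key and rows columns of the output. The statements below are a sketch against the tables described above; I am not including any output, since I did not record it.

```sql
-- Shows, per table, which indexes MySQL considered (possible_keys),
-- which one it chose (key), and its row estimate (rows).
EXPLAIN
SELECT COUNT(*)
FROM Traces
INNER JOIN `System`   ON Traces.System_ID = `System`.ID
INNER JOIN TraceTypes ON Traces.TraceTypes_ID = TraceTypes.ID
WHERE `System`.Name = 'billing'
  AND TraceTypes.Code = 'Info';

-- Lists the indexes on Traces along with their estimated cardinality.
SHOW INDEX FROM Traces;
```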
performance join mysql indexing
naivists