Do SQL indexes suck?

Let's say I have a table with a lot of rows, and one of the columns that I want to index can hold one of 20 values. If I were to put an index on that column, would it be big?

If so, why? If I split the data into 20 tables, one for each column value, the index size would be trivial, but the indexing effect would be the same.

+5
sql indexing partitioning
9 answers

It's not indexes that suck. It's putting indexes on the wrong columns that sucks.

Seriously though, why would you need a table with a single column? What would that data mean? What purpose would it serve?

And 20 tables? I suggest you read up on database design first, or otherwise explain the context of your question to us.

+7

Indexes (or indices) do not suck. Many very smart people have spent a truly remarkable amount of time over the past few decades making sure of that.

Your schema, however, lacking the same amount of expertise and effort, may suck very badly indeed.

Partitioning in the case described is equivalent to applying a clustered index. If the table is ordered some other way (or in arbitrary order), the index necessarily has to take up much more space. Depending on the platform, a nonclustered index may shrink as the ordering of the rows becomes more closely aligned with the indexed value.

YMMV.

+7

Short answer: do indexes suck? Yes and no.

Longer answer: they do not suck when used properly. Perhaps you should start by reading up on how indexes work, why they work, and why they sometimes do not.

Good starting points: http://www.sqlservercentral.com/articles/Indexing/

+3

No, indexes do not suck, but you have to pay attention to how you use them, or they can backfire on the performance of your queries.

First: schema / design
Why would you create a table with a single column? That is probably taking normalization one step too far. Database design is one of the most important things to consider when optimizing performance.

Second: indices
In a nutshell, an index lets the database do a binary search for your record. Without an index on a column (or set of columns), the database often falls back to a table scan. A table scan is very expensive because it involves visiting every record.

For an index lookup, it hardly matters how many records are in the table. Because of the (balanced) binary tree, doubling the number of records results in only one extra search step.
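To put numbers on the "one extra step per doubling" point, here is a rough sketch. It assumes a plain binary search over sorted keys; real engines use B-trees with a much higher fan-out, so the actual number of page reads is smaller still. The function name `search_steps` is made up for this illustration.

```python
def search_steps(row_count: int) -> int:
    """Worst-case number of comparisons for a binary search over
    row_count sorted keys (floor(log2(n)) + 1, computed with integers)."""
    if row_count < 1:
        raise ValueError("row_count must be positive")
    return row_count.bit_length()


for rows in (1_000, 2_000, 1_000_000, 2_000_000):
    # Doubling the row count adds exactly one step.
    print(f"{rows:>9} rows -> at most {search_steps(rows)} comparisons")
```

A table scan, by contrast, grows linearly: twice the rows means twice the work.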

Define a primary key for your table; SQL Server will then by default place the clustered index on those column(s). Clustered indexes perform very well. In addition, you can place nonclustered indexes on columns that are used often in SELECT, JOIN, WHERE, GROUP BY, and ORDER BY clauses. Remember that indexes carry a certain overhead; never include your clustered index columns in a nonclustered index.

The index fill factor may also be of interest. Do you want to optimize the table for reads (high fill factor: less storage, less I/O) or for writes (low fill factor: more storage, less rebuilding of database pages)?

Third: partitioning
One of the reasons to use partitioning is to optimize data access. Say you have 1 million records, of which 500,000 are no longer relevant but are kept for archival purposes. In that case, you could decide to partition the table and keep the 500,000 old records on slow storage and the remaining 500,000 on fast storage.

To measure is to know
The best way to get insight into what is going on is to measure what happens to your CPU and I/O. Microsoft SQL Server has tools, such as Profiler and the execution plans in Management Studio, that tell you the duration of your query, the number of reads and writes, and the CPU usage. The execution plan also tells you which indexes are used, if any. To your surprise, you may see a table scan where you did not expect one.
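If you do not have SQL Server at hand, the same habit of checking the plan can be tried out with SQLite, which ships with Python. This is only a stand-in sketch: the `orders` table, its columns, and the index name are invented for the example, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO orders (status, note) VALUES (?, ?)",
    [(f"status_{i % 20}", "x") for i in range(10_000)],  # 20 distinct values
)


def plan(sql: str) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT note FROM orders WHERE status = 'status_7'"

plan_before = plan(query)  # no index yet: expect a full scan
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")
plan_after = plan(query)   # the planner can now search the index

print("before:", plan_before)
print("after: ", plan_after)
```

The point is the workflow, not the tool: look at the plan before and after a change instead of assuming the index is used.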

+3

Let's say I have a table with a lot of rows, and one of the columns that I want to index can hold one of 20 values. If I were to put an index on that column, would it be big?

The size of the index will be proportional to the number of your rows and the length of the indexed values.

The index stores not only the indexed value but also some kind of pointer to the row (ROWID in Oracle, ctid in PostgreSQL, the primary key in InnoDB, etc.).

If you have 10,000 rows and 1 distinct value, you will still have 10,000 entries in your index.

If so, why? If I split the data into 20 tables, one for each column value, the index size would be trivial, but the indexing effect would be the same.

In that case, you would end up with 20 indexes that together are the same size as your original one.

This technique is in fact sometimes used, in so-called partitioned indexes. It has its advantages and drawbacks.
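A toy model makes the counting argument concrete. It assumes an index is nothing more than a sorted list of (value, row pointer) entries, which ignores page layout and any compression a real engine applies; all names here are illustrative.

```python
from collections import defaultdict

ROWS = 10_000
DISTINCT_VALUES = 20

# Toy table: row id -> value of the indexed column.
table = {rowid: f"v{rowid % DISTINCT_VALUES}" for rowid in range(ROWS)}

# One index over the whole table: one (value, row pointer) entry per row.
single_index = sorted((value, rowid) for rowid, value in table.items())

# Twenty per-value "tables", each with its own index on the same column.
split_indexes = defaultdict(list)
for rowid, value in table.items():
    split_indexes[value].append((value, rowid))

total_split_entries = sum(len(entries) for entries in split_indexes.values())

# Prints "10000 20 10000": the entries are redistributed, not removed.
print(len(single_index), len(split_indexes), total_split_entries)
```

Every row still needs its own entry, whichever index it ends up in.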

+2

Standard B-tree indexes are best suited to fairly selective indexes, which this example would not be. You do not say which DBMS you are using; Oracle has another index type, the bitmap index, which is better suited to low-selectivity indexes in OLAP environments (these indexes are expensive to maintain, which makes them unsuitable for OLTP environments).

The optimizer will decide, on the basis of statistics, whether it thinks the index will help retrieve the data in the fastest possible time; if not, the optimizer will not use it.

Partitioning is another strategy. In Oracle you can define a table as partitioned on some set of columns, and the optimizer can automatically perform "partition elimination", as you suggest.
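For anyone who has not met bitmap indexes, the idea can be sketched in a few lines. This is a toy in-memory model, not Oracle's storage format (which compresses its bitmaps); the column data and function names are made up.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = defaultdict(int)
    for row_number, value in enumerate(column):
        bitmaps[value] |= 1 << row_number
    return bitmaps


def matching_rows(bitmap):
    """Decode a bitmap back into the list of row numbers it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


status = ["open", "closed", "open", "held", "closed", "open"]
region = ["eu", "eu", "us", "us", "eu", "eu"]

status_index = build_bitmap_index(status)
region_index = build_bitmap_index(region)

# WHERE status = 'open' AND region = 'eu' becomes a single bitwise AND.
hits = matching_rows(status_index["open"] & region_index["eu"])
print(hits)  # [0, 5]
```

With only a handful of distinct values there are only a handful of bitmaps, and combining predicates is cheap. The maintenance cost shows up on writes, since changing a row means touching bitmaps shared by many rows.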

+2

Sorry, I do not quite understand what you mean by "big."

  • If your index is clustered, all of the data for each record sits on the leaf page, which makes it the most efficient index available for your table, provided you write your queries correctly.

  • If your index is nonclustered, only the index-related data sits on the leaf pages. Then, depending on things like how many other indexes you have, together with details like your fill factor, your index may or may not be efficient. In general, as long as you do not have tons of indexes on your table, you should be safe.

  • The performance of your index will also be determined by the data type of the 20 values you are talking about putting in the column. If they are predefined values, they should probably live in a lookup table with a simple primary key data type (for example, Int / Number). Then add that key to your table as a foreign key column with an index on it.

Ultimately, you can have a perfectly good index on a column, but how well it serves you is determined mainly by the queries you write. So if your queries make use of your indexes, you are golden.
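The lookup-table suggestion might look roughly like this. It is a sketch using SQLite from Python so that it runs anywhere; the `category` and `item` tables and the index name are hypothetical, and other engines will want their own type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- The 20 predefined values live here, keyed by a small integer.
    CREATE TABLE category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- The big table stores only the integer key.
    CREATE TABLE item (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES category (id)
    );
    CREATE INDEX idx_item_category ON item (category_id);
""")

conn.executemany(
    "INSERT INTO category (id, name) VALUES (?, ?)",
    [(i, f"category_{i}") for i in range(1, 21)],
)
conn.executemany(
    "INSERT INTO item (category_id) VALUES (?)",
    [(i % 20 + 1,) for i in range(1_000)],
)

count = conn.execute(
    "SELECT COUNT(*) FROM item "
    "JOIN category ON category.id = item.category_id "
    "WHERE category.name = ?",
    ("category_3",),
).fetchone()[0]
print(count)  # 50
```

The index entries then carry a small integer instead of the full text value, which keeps the index narrower.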

+1

Indexes exist purely for performance. If an index does not improve performance for the queries you care about, then it sucks.

As for disk usage, you have to weigh your concerns. Different SQL vendors build indexes differently, but as a customer you generally trust that they are doing the best job that can be done. In the case you describe, a clustered index may be optimal for both size and performance.

0

It would be large enough to hold those values for all the rows, in sorted order.

Say you have 20 different strings of 4 characters each and 1 million rows: at least 4 million bytes (or 8 million with 16-bit Unicode) have to be stored for those values.
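The arithmetic, spelled out as a quick sketch. It counts only the key bytes; row pointers and page overhead come on top and vary by engine, so treat the result as a lower bound. The helper name `key_bytes` is made up.

```python
ROWS = 1_000_000
CHARS_PER_VALUE = 4


def key_bytes(rows: int, chars: int, bytes_per_char: int) -> int:
    """Lower bound on bytes needed just for the indexed values."""
    return rows * chars * bytes_per_char


single_byte = key_bytes(ROWS, CHARS_PER_VALUE, 1)  # one byte per character
utf16 = key_bytes(ROWS, CHARS_PER_VALUE, 2)        # 16-bit Unicode

print(f"{single_byte:,} bytes, or {utf16:,} bytes as 16-bit Unicode")
```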

0
