Index on a column with 5 distinct values: is it worth it?

Question

I have a table that could grow to as many as 5,000,000 rows. One column in this table is used in only one query, and it has only 5 possible values. The table currently holds 10,000 rows, and according to the explain plan it makes no sense to use my index on this column.

Will the index ever be worth it, or should I not bother with it?

Edit: Here are the two explain plans at the moment. Without the index: http://img706.imageshack.us/img706/1903/noindex.png versus with the index forced by a hint: http://img692.imageshack.us/img692/8205/indexp.png

+7

oracle indexing

svrist Dec 10 '09 at 8:56

5 answers

The index will be useful in the following cases:

When searching for an infrequent FREQUENCYID value. For example, only 10 of your 10,000,000 rows have FREQUENCYID = 1 and that is the value you are looking for.
When your queries use no columns other than FREQUENCYID. This query:
```
 SELECT FREQUENCYID, COUNT(*) FROM mytable GROUP BY FREQUENCYID 
```
will benefit from the index (in fact, an INDEX FAST FULL SCAN together with hash aggregation, shown as HASH GROUP BY in the plan, will most likely be used).
When the table rows are large and every column used in the query is indexed. In that case the indexes can be joined instead of doing a FULL TABLE SCAN. For example, this query:
```
 SELECT FREQUENCYID, OTHERCOLUMN FROM mytable WHERE FREQUENCYID = 2 
```
can be answered by joining the indexes on FREQUENCYID and OTHERCOLUMN on ROWID.
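
The index join described in the last case could be requested roughly as follows. This is only a sketch: the index names ix_mytable_freq and ix_mytable_other are hypothetical, and the INDEX_JOIN hint is just a request that the optimizer is free to ignore when it cannot apply that access path.

```sql
-- Hypothetical index names; mytable, FREQUENCYID and OTHERCOLUMN
-- are the names used in the example queries above.
CREATE INDEX ix_mytable_freq  ON mytable (FREQUENCYID);
CREATE INDEX ix_mytable_other ON mytable (OTHERCOLUMN);

-- Ask the optimizer to consider joining the two indexes on ROWID
-- instead of reading the (wide) table rows.
SELECT /*+ INDEX_JOIN(t ix_mytable_freq ix_mytable_other) */
       FREQUENCYID, OTHERCOLUMN
FROM   mytable t
WHERE  FREQUENCYID = 2;
```

Checking the resulting execution plan shows whether the index join was actually chosen.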

+2

Quassnoi Dec 10 '09 at 15:21

If the table is going to grow to the size you mention (up to 5,000,000 rows), I would recommend creating the index.

+1

Adriaan stander Dec 10 '09 at 8:59

Perhaps the easiest way is not to guess, but to actually try it.

But it seems to me that you are comparing execution plans to find the better approach. That is not reliable. The optimizer may not have the information it needs to choose the best plan (for example, if the values are unevenly distributed and there is no histogram). It also makes no sense to look at the "cost" in the explain plan.

It is best to compare logical I/Os. Start SQL*Plus, issue set autotrace traceonly, then run your query (with and without the index) and compare the number of "consistent gets". Fewer is better.
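
A minimal SQL*Plus sketch of that comparison, assuming the mytable and FREQUENCYID names used in another answer and a hypothetical index name ix_mytable_freq. The FULL and INDEX hints force each access path in turn. The session needs whatever privileges AUTOTRACE relies on in your installation (commonly granted through the PLUSTRACE role).

```sql
-- Show the plan and statistics, but suppress the query output.
SET AUTOTRACE TRACEONLY

-- Run 1: force a full table scan.
SELECT /*+ FULL(t) */ *
FROM   mytable t
WHERE  FREQUENCYID = 2;

-- Run 2: force the index (ix_mytable_freq is a hypothetical name).
SELECT /*+ INDEX(t ix_mytable_freq) */ *
FROM   mytable t
WHERE  FREQUENCYID = 2;

SET AUTOTRACE OFF
```

Compare the "consistent gets" line in the Statistics section of the two runs, and repeat each run so the numbers are not distorted by the first parse.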

On the importance of LIO, see the article by Cary Millsap.

+1

Egor rogov Dec 10 '09 at 12:19

Test it with typical queries and see which path is faster.

You may find that a full table scan is faster on average than an index range scan plus table access by ROWID, in which case Oracle got it right.

On the other hand, your data may be distributed in such a way that the index is the better choice for most of your queries, in which case you will probably want to add an INDEX hint.

0

Jeffrey kemp Dec 10 '09 at 14:07

David aldridge · Accepted Answer · 2009-12-10T10:26:45+0000

It depends on a few things.

First, the distribution of values. If you have only five distinct values but one of them accounts for 99.9999% of the rows in the table, then obviously you do not want the optimizer to use the index for that value, but you may want it used for the others. In cases like this it can be worth using a function-based index, so that you index only the values of interest and not the ones that would just take up space.
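
One way to sketch such a function-based index, assuming (hypothetically) that 5 is the dominant value and that the table and column are called mytable and FREQUENCYID as in another answer. It relies on the fact that an Oracle B-tree index stores no entry for a row whose indexed key is entirely NULL.

```sql
-- Assumption: FREQUENCYID = 5 is the very common value we do not want indexed.
-- The CASE expression yields NULL for that value, so those rows get no index entry.
-- ix_mytable_freq_rare is a hypothetical name.
CREATE INDEX ix_mytable_freq_rare
    ON mytable (CASE WHEN FREQUENCYID <> 5 THEN FREQUENCYID END);

-- The query has to use the same expression for the index to be considered.
SELECT *
FROM   mytable
WHERE  CASE WHEN FREQUENCYID <> 5 THEN FREQUENCYID END = 2;
```

The cost of this approach is that every query meant to use the index must repeat the indexed expression exactly.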

Second, are there any queries that can be answered from this index alone, without accessing the table?

Note that what matters is not only the percentage of rows that will be accessed, but also the number of table blocks that have to be visited. For example, if you have a table of 1000 blocks with an average of 30 rows per block, and one column has 30 distinct values (each present in 1000 rows), then the number of blocks you need to visit to read every row for one value ranges from 1000/30 ≈ 34 (you should use the index) to 1000 (you should not use the index), depending on how the rows are distributed. This is expressed by the index clustering factor: if it is close to the number of rows in the table, the index is less likely to be used, and if it is close to the number of blocks in the table, the index is more likely to be used.
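
The clustering factor can be read from the data dictionary once statistics have been gathered. A small sketch, assuming the table is called MYTABLE and lives in the current schema:

```sql
-- Refresh table and index statistics first (cascade => TRUE includes the indexes).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,
                                tabname => 'MYTABLE',
                                cascade => TRUE);
END;
/

-- Compare each index's clustering factor with the table's block and row counts.
SELECT i.index_name,
       i.clustering_factor,
       t.blocks   AS table_blocks,
       t.num_rows AS table_rows
FROM   user_indexes i
       JOIN user_tables t ON t.table_name = i.table_name
WHERE  i.table_name = 'MYTABLE';
```

A clustering_factor near table_blocks suggests rows with the same key sit together; a value near table_rows suggests they are scattered across the table.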

You can also look at index compression to see whether it saves you space.
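
A quick way to try this, sketched with a hypothetical index name ix_mytable_freq on mytable (FREQUENCYID): record the segment size, rebuild the index with key compression, and compare.

```sql
-- Size of the index segment before compression.
SELECT segment_name, bytes
FROM   user_segments
WHERE  segment_name = 'IX_MYTABLE_FREQ';

-- Rebuild the (hypothetical) index with key compression.
ALTER INDEX ix_mytable_freq REBUILD COMPRESS;

-- Size after compression.
SELECT segment_name, bytes
FROM   user_segments
WHERE  segment_name = 'IX_MYTABLE_FREQ';
```

With only five distinct key values the repeated prefix is highly redundant, so the saving may be noticeable, but it is worth measuring rather than assuming.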

Be careful with bitmap indexes: they are not well suited to systems in which they can be modified by several sessions at the same time (for example, two people inserting rows into the indexed table simultaneously).

A more effective strategy, if you want to speed up queries with predicates on these five values, is to use partitioning. This helps partly because of partition pruning in the query, and partly because of the better statistics available to the optimizer: when it knows that only one partition will be accessed, it can use partition-level statistics instead of global statistics.
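
A minimal sketch of list partitioning on this column. The table name mytable_part, the column definitions, and the assumption that the five values are 1 through 5 are all hypothetical; the real table would keep its own columns.

```sql
-- Hypothetical table: one list partition per FREQUENCYID value (assumed to be 1..5).
CREATE TABLE mytable_part (
    id          NUMBER        NOT NULL,
    frequencyid NUMBER(1)     NOT NULL,
    othercolumn VARCHAR2(100)
)
PARTITION BY LIST (frequencyid) (
    PARTITION p_freq_1 VALUES (1),
    PARTITION p_freq_2 VALUES (2),
    PARTITION p_freq_3 VALUES (3),
    PARTITION p_freq_4 VALUES (4),
    PARTITION p_freq_5 VALUES (5)
);

-- With an equality predicate on the partition key, only one partition
-- needs to be scanned (partition pruning).
SELECT COUNT(*)
FROM   mytable_part
WHERE  frequencyid = 2;
```

In the execution plan, the Pstart and Pstop columns indicate which partitions are actually accessed.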
