Compound index equivalent for multiple tables?

Question

I have a table structure similar to the following:

create table MAIL (
  ID int,
  FROM varchar,
  SENT_DATE date
);

create table MAIL_TO (
  ID int,
  MAIL_ID int,
  NAME varchar
);

and I need to run the following query:

select m.ID
from MAIL m
inner join MAIL_TO t on t.MAIL_ID = m.ID
where m.SENT_DATE between '07/01/2010' and '07/30/2010'
  and t.NAME = 'someone@example.com'

Is there a way to design indexes so that both conditions can use an index? If I put an index on MAIL.SENT_DATE and an index on MAIL_TO.NAME, the database decides to use one index or the other, not both. After filtering by the first condition, the database always has to do a full scan of the results for the second condition.

+7

sql oracle

jthg Jul 30 '10 at 17:23

5 answers

Oracle can use both indexes. You simply do not have the right two indexes.

Consider: if the query plan uses your index on mail.sent_date first, what does it get from mail? It gets all the mail.id values where mail.sent_date is within the range you gave in your where clause, yes?

So it goes to mail_to with a list of mail.id values and the mail_to.name you specified in your where clause. At this point, Oracle decides that it is better to scan the table for matching mail_to.mail_id values than to use the index on mail_to.name.

Indexes on varchars are always problematic, and Oracle really prefers full table scans. But if we give Oracle an index containing the columns it really wants to use, then, depending on the total number of rows in the table and the statistics, we can get it to use that index. This is the index:

  create index mail_to_pid_name on mail_to( mail_id, name ) ;

This works where the index on name alone does not, because Oracle is not searching by name only, but by mail_id and name together.

Conversely, if the cost-based optimizer determines that it is cheaper to go to the mail_to table first and use your index on mail_to.name, what does it have to go on then? A bunch of mail_to.mail_id values to look up in mail. It needs to find rows with those ids and certain sent_dates, therefore:

  create index mail_id_sentdate on mail( sent_date, id ) ;

Please note that in this case I put sent_date first in the index, and id second. (This is more of an intuition than a rule.)

Again, the main takeaway is this: when creating indexes, you need to consider not only the columns in your where clause, but also the columns in your join conditions.

Update

jthg: yes, it always depends on how the data is distributed. It also depends on how many rows are in the table: if there are very many, Oracle may do a table scan and a hash join; if there are very few, it may simply scan the table. You might reverse the column order in either of the two indexes. By placing sent_date first in the second index, we eliminate most of the need for an index on sent_date alone.
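Since the outcome depends on row counts and statistics, one way to see what Oracle actually chooses is to refresh the statistics and inspect the plan. The following is a minimal sketch, assuming the MAIL and MAIL_TO tables from the question and the two composite indexes from this answer already exist in the current schema. The ANSI date literals are a substitution for the question's date strings, which rely on the session date format.

```sql
-- Refresh optimizer statistics so the plan reflects the current data.
begin
  dbms_stats.gather_table_stats(user, 'MAIL');
  dbms_stats.gather_table_stats(user, 'MAIL_TO');
end;
/

-- Ask the optimizer for its plan without running the query.
explain plan for
select m.ID
from MAIL m
inner join MAIL_TO t on t.MAIL_ID = m.ID
where m.SENT_DATE between date '2010-07-01' and date '2010-07-30'
  and t.NAME = 'someone@example.com';

-- Display the plan that was just explained.
select * from table(dbms_xplan.display);
```

In the output, INDEX RANGE SCAN steps naming mail_to_pid_name or mail_id_sentdate would indicate that the composite indexes are being used, while TABLE ACCESS FULL steps mean the optimizer judged a scan to be cheaper for the current data.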

+7

tpdi Jul 30 '10 at 17:46

Which criterion is more selective? The date range or the addressee? I would guess the addressee. If it is highly selective, don't bother with the date index; just let the database do the lookup based on the mail ids it found. But do index the MAIL table on id, if it is not indexed already.

On the other hand, some modern optimizers will even use both indexes, scanning both tables and then building a hash of the join columns to merge the two result sets. I'm not entirely sure if and when Oracle would choose this strategy. I have just noticed that SQL Server tends to use hash joins quite often compared to other engines.

0

Frank Jul 30 '10 at 17:35

If your queries usually refer to a specific month, you can partition data by month.
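As a rough sketch of what that could look like in Oracle, the table below is range-partitioned on SENT_DATE with one partition per month. The column sizes, the SENDER column name (FROM is a reserved word) and the partition name are assumptions for illustration, interval partitioning needs Oracle 11g or later, and partitioning is a separately licensed Enterprise Edition option.

```sql
-- Hypothetical variant of the question's MAIL table, partitioned by month.
create table MAIL (
  ID        number        not null,
  SENDER    varchar2(200),            -- stands in for the question's FROM column
  SENT_DATE date          not null
)
partition by range (SENT_DATE)
interval (numtoyminterval(1, 'MONTH'))  -- Oracle adds one partition per month as data arrives
(
  -- A first partition is required; everything before 2010 lands here.
  partition p_before_2010 values less than (date '2010-01-01')
);
```

With this layout, a query restricted to one month's SENT_DATE range only needs to touch that month's partition, so the date condition is handled by partition pruning and an index on MAIL_TO can serve the name condition.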

0

Marcus adams Jul 30 '10 at 17:48

In situations where the requirements are not met for a materialized view, the following two options exist:

1) You can create a cross-reference table and keep it up to date using triggers.

The concepts will be the same with Oracle, but at the moment I only have SQL Server installed to run the test. See this setup:

 -- Base tables (SQL Server syntax)
 CREATE TABLE MAIL (
     ID INT IDENTITY(1,1),
     [FROM] VARCHAR(200),
     SENT_DATE DATE,
     CONSTRAINT PK_MAIL PRIMARY KEY (ID)
 );

 CREATE TABLE MAIL_TO (
     ID INT IDENTITY(1,1),
     MAIL_ID INT,
     [NAME] VARCHAR(200),
     CONSTRAINT PK_MAIL_TO PRIMARY KEY (ID)
 );

 ALTER TABLE [dbo].[MAIL_TO] WITH CHECK
     ADD CONSTRAINT [FK_MAILTO_MAIL] FOREIGN KEY ([MAIL_ID]) REFERENCES [dbo].[MAIL] ([ID])
 GO
 ALTER TABLE [dbo].[MAIL_TO] CHECK CONSTRAINT [FK_MAILTO_MAIL]
 GO

 -- Cross-reference table holding the columns from both tables that need a shared index
 CREATE TABLE CompositeIndex_MailSentDate_MailToName (
     [MAIL_ID] INT,
     [MAILTO_ID] INT,
     SENT_DATE DATE,
     MAILTO_NAME VARCHAR(200),
     CONSTRAINT PK_CompositeIndex_MailSentDate_MailToName PRIMARY KEY (MAILTO_ID, MAIL_ID)
 )
 GO

 -- One index per column order, so either condition can lead
 CREATE NONCLUSTERED INDEX IX_MailSent_MailTo
     ON dbo.CompositeIndex_MailSentDate_MailToName (SENT_DATE, MAILTO_NAME)
 CREATE NONCLUSTERED INDEX IX_MailTo_MailSent
     ON dbo.CompositeIndex_MailSentDate_MailToName (MAILTO_NAME, SENT_DATE)
 GO

 -- New recipient: copy it into the cross-reference table together with the mail's sent date
 CREATE TRIGGER dbo.trg_MAILTO_Insert ON dbo.MAIL_TO AFTER INSERT AS
 BEGIN
     INSERT INTO dbo.CompositeIndex_MailSentDate_MailToName
         ( MAIL_ID, MAILTO_ID, SENT_DATE, MAILTO_NAME )
     SELECT mailTo.MAIL_ID, mailTo.ID, m.SENT_DATE, mailTo.NAME
     FROM inserted mailTo
     INNER JOIN dbo.MAIL m ON m.ID = mailTo.MAIL_ID
 END
 GO

 -- Deleted recipient: remove the matching rows from the cross-reference table
 CREATE TRIGGER dbo.trg_MAILTO_Delete ON dbo.MAIL_TO AFTER DELETE AS
 BEGIN
     DELETE compositeIndex
     FROM dbo.CompositeIndex_MailSentDate_MailToName compositeIndex
     INNER JOIN deleted ON compositeIndex.MAILTO_ID = deleted.ID
 END
 GO

 -- Changed recipient: propagate the new name
 CREATE TRIGGER dbo.trg_MAILTO_Update ON dbo.MAIL_TO AFTER UPDATE AS
 BEGIN
     UPDATE compositeIndex
     SET compositeIndex.MAILTO_NAME = updates.NAME
     FROM dbo.CompositeIndex_MailSentDate_MailToName compositeIndex
     INNER JOIN inserted updates ON updates.ID = compositeIndex.MAILTO_ID
 END
 GO

 -- Changed mail: propagate the new sent date
 CREATE TRIGGER dbo.trg_MAIL_Update ON dbo.MAIL AFTER UPDATE AS
 BEGIN
     UPDATE compositeIndex
     SET compositeIndex.SENT_DATE = updates.SENT_DATE
     FROM dbo.CompositeIndex_MailSentDate_MailToName compositeIndex
     INNER JOIN inserted updates ON updates.ID = compositeIndex.MAIL_ID
 END
 GO

 -- Sample data
 INSERT INTO dbo.MAIL ( [FROM], SENT_DATE )
 SELECT 'SenderA', '2018-10-01' UNION ALL
 SELECT 'SenderA', '2018-10-02'

 INSERT INTO dbo.MAIL_TO ( MAIL_ID, NAME )
 SELECT 1, 'CustomerA' UNION ALL
 SELECT 1, 'CustomerB' UNION ALL
 SELECT 2, 'CustomerC' UNION ALL
 SELECT 2, 'CustomerD' UNION ALL
 SELECT 2, 'CustomerE'

 -- Check the results
 SELECT * FROM dbo.MAIL
 SELECT * FROM dbo.MAIL_TO
 SELECT * FROM dbo.CompositeIndex_MailSentDate_MailToName

Then you can use the dbo.CompositeIndex_MailSentDate_MailToName table to JOIN to the rest of your data. This is useful in environments where the rate of inserts and updates is low but query demand is high, so the relative overhead of maintaining the triggers is small.

The advantage of this is that it is updated in real time.

2) If you do not want the performance and maintenance overhead of triggers, and you only need the data for next-day reports, you can create a view and a nightly process that truncates the table and selects the entire view into a materialized table.

I have successfully used this to index flattened relational data that required joins across a dozen or so tables. It reduced response times from hours to a few seconds. Although this is an expensive query, you can schedule the job to run off-hours if you have periods of reduced usage.

0

Sean Dec 7 '18 at 19:58

OMG Ponies · Accepted Answer · 2010-07-30T17:28:41+0000

A materialized view will allow you to index the values, provided you meet the strict criteria for materialized views.
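A minimal sketch of this idea in Oracle, assuming the MAIL and MAIL_TO tables from the question. The names MAIL_RECIPIENT_MV and MAIL_RECIPIENT_MV_IX are made up for illustration. It uses a complete refresh on demand, which has the fewest restrictions. Keeping the view current on every commit (fast refresh) is where the strict criteria apply, such as materialized view logs on both base tables.

```sql
-- Pre-join the two tables so both filter columns live in one object.
create materialized view MAIL_RECIPIENT_MV
  build immediate
  refresh complete on demand
as
select t.ID as MAIL_TO_ID,
       m.ID as MAIL_ID,
       m.SENT_DATE,
       t.NAME
from MAIL m
inner join MAIL_TO t on t.MAIL_ID = m.ID;

-- A single compound index now covers both conditions.
create index MAIL_RECIPIENT_MV_IX on MAIL_RECIPIENT_MV (NAME, SENT_DATE);

-- The original query, rewritten against the materialized view.
select MAIL_ID
from MAIL_RECIPIENT_MV
where NAME = 'someone@example.com'
  and SENT_DATE between date '2010-07-01' and date '2010-07-30';

-- Rebuild the contents when the base tables have changed ('C' = complete refresh).
begin
  dbms_mview.refresh('MAIL_RECIPIENT_MV', 'C');
end;
/
```

The trade-off is staleness: with on-demand refresh, rows inserted since the last refresh are not visible through the view until it is refreshed again.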
