How to speed up this query with an index

I am using a V12 server in Azure SQL Database and I have the following table:

CREATE TABLE [dbo].[AudienceNiches](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [WebsiteId] [nvarchar](128) NOT NULL,
    [VisitorId] [nvarchar](128) NOT NULL,
    [VisitDate] [datetime] NOT NULL,
    [Interest] [nvarchar](50) NULL,
    [Gender] [float] NULL,
    [AgeFrom18To24] [float] NULL,
    [AgeFrom25To34] [float] NULL,
    [AgeFrom45To54] [float] NULL,
    [AgeFrom55To64] [float] NULL,
    [AgeFrom65Plus] [float] NULL,
    [AgeFrom35To44] [float] NULL,
    CONSTRAINT [PK_AudienceNiches] PRIMARY KEY CLUSTERED ( [Id] ASC )
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)

I execute this query (updated):

 `select a.interest, count(interest) from ( select visitorid, interest from audienceNiches WHERE WebsiteId = @websiteid AND VisitDate >= @startdate AND VisitDate <= @enddate group by visitorid, interest) as a group by a.interest` 

And I have the following indexes (all ASC):

  • idx_WebsiteId_VisitDate_VisitorId
  • idx_WebsiteId_VisitDate
  • idx_VisitorId
  • idx_Interest

The problem is that my query returns approximately 18K rows and takes 5 seconds. The whole table has 8.8M records, and if I expand the data a bit, the time increases. What would be the best index for this query? What am I missing?

+6
5 answers

The best index for this query is a composite index on these columns, in the following order:

  • WebsiteId
  • VisitDate
  • Interest
  • VisitorId

This lets SQL Server answer the query entirely from the index (a covering index). SQL Server can seek on the range (WebsiteId, VisitDate), then exclude rows with a null Interest, and finally count the distinct VisitorIds, all from the index. Index entries will be in the correct order so that these operations can be performed efficiently.
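
As a minimal sketch, the index described above could be created like this. The index name is made up for illustration; only the table and column names come from the question.

```sql
-- Hypothetical index name; key columns in the order listed above.
-- Key width is about 620 bytes (256 + 8 + 100 + 256), which fits within
-- the nonclustered index key size limit.
CREATE NONCLUSTERED INDEX IX_AudienceNiches_WebsiteId_VisitDate_Interest_VisitorId
    ON dbo.AudienceNiches (WebsiteId, VisitDate, Interest, VisitorId);
```

Since the query reads only these four columns, it should not need any lookups into the clustered index once this index exists. Whether it makes the existing idx_WebsiteId_VisitDate and idx_WebsiteId_VisitDate_VisitorId redundant depends on what other queries use them.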

+2

It's hard for me to write SQL without data to test against, but see if this gives the results you are looking for with a better run time.

 SELECT interest, count(distinct visitorid)
 FROM audienceNiches
 WHERE WebsiteId = @websiteid
   AND VisitDate between @startdate and @enddate
   AND interest is not null
 GROUP BY interest
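
To see how this relates to the original two-level query, here is a small sketch on a table variable. The variable @t and its four rows are invented for illustration. The inner GROUP BY of the original yields distinct (visitorid, interest) pairs, so both forms count distinct visitors per interest. The one difference is that the original also returns a row for the NULL interest with a count of 0, which the `is not null` filter drops.

```sql
-- Hypothetical sample data, not taken from the question.
DECLARE @t TABLE (VisitorId nvarchar(128) NOT NULL, Interest nvarchar(50) NULL);

INSERT INTO @t (VisitorId, Interest)
VALUES (N'v1', N'sports'),
       (N'v1', N'sports'),   -- repeat visit, same interest
       (N'v2', N'sports'),
       (N'v2', NULL);        -- visit with no interest recorded

-- Original shape: returns (NULL, 0) and (sports, 2).
SELECT a.Interest, COUNT(a.Interest) AS Visitors
FROM (SELECT VisitorId, Interest
      FROM @t
      GROUP BY VisitorId, Interest) AS a
GROUP BY a.Interest;

-- Rewritten shape: returns only (sports, 2).
SELECT Interest, COUNT(DISTINCT VisitorId) AS Visitors
FROM @t
WHERE Interest IS NOT NULL
GROUP BY Interest;
```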
+2

Indexing can take a great deal of understanding to get right, but in your case I think you will see a good performance gain by indexing WebsiteId and VisitDate as separate indexes.

It is important that your indexes are in good shape. You must maintain them by keeping statistics up to date and periodically rebuilding your indexes.
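
A minimal sketch of that maintenance, assuming the table from the question. How often to run it, and whether a full rebuild is needed rather than a reorganize, depends on your workload.

```sql
-- Refresh optimizer statistics for all indexes and columns on the table
-- (uses default sampling unless WITH FULLSCAN is specified).
UPDATE STATISTICS dbo.AudienceNiches;

-- Rebuild every index on the table to remove fragmentation.
-- A rebuild also refreshes the statistics that belong to those indexes.
ALTER INDEX ALL ON dbo.AudienceNiches REBUILD;
```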

Finally, you should study the query plan when tuning query performance. SQL Server will tell you if it thinks an index on a column (or columns) would help, and the plan will also alert you to other performance issues.

Press Ctrl+L in Management Studio to display the estimated execution plan and see how the query will be executed.
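
Alongside the plan, it can help to measure reads and elapsed time before and after an index change. The sketch below wraps the query from the question in the STATISTICS IO and STATISTICS TIME session settings. The parameter values are placeholders invented for illustration.

```sql
-- Hypothetical parameter values, for illustration only.
DECLARE @websiteid nvarchar(128) = N'example-site',
        @startdate datetime      = '20160101',
        @enddate   datetime      = '20160131';

-- Report logical reads and CPU/elapsed time on the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT a.interest, COUNT(a.interest) AS Visitors
FROM (SELECT visitorid, interest
      FROM dbo.AudienceNiches
      WHERE WebsiteId = @websiteid
        AND VisitDate >= @startdate
        AND VisitDate <= @enddate
      GROUP BY visitorid, interest) AS a
GROUP BY a.interest;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```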

+1

Your query can be written this way because the final result set does not return the VisitorId column from the AudienceNiches table, so you do not need two levels of grouping. Try this query and let me know if it still has a performance problem.

 select interest, count(interest)
 from audienceNiches
 WHERE WebsiteId = @websiteid
   AND VisitDate >= @startdate
   AND VisitDate <= @enddate
 group by interest
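
One thing worth checking against your data before relying on this: the single-level form counts rows, while the two-level form counts distinct visitors, so they agree only if each visitor has at most one row per interest in the date range. A small sketch, using an invented table variable @t:

```sql
-- Hypothetical sample data, not taken from the question.
DECLARE @t TABLE (VisitorId nvarchar(128) NOT NULL, Interest nvarchar(50) NULL);

INSERT INTO @t (VisitorId, Interest)
VALUES (N'v1', N'sports'),
       (N'v1', N'sports'),   -- same visitor, second visit
       (N'v2', N'sports');

-- Two-level form from the question: returns (sports, 2).
SELECT a.Interest, COUNT(a.Interest) AS Visitors
FROM (SELECT VisitorId, Interest
      FROM @t
      GROUP BY VisitorId, Interest) AS a
GROUP BY a.Interest;

-- Single-level form: returns (sports, 3), because it counts rows.
SELECT Interest, COUNT(Interest) AS Visits
FROM @t
GROUP BY Interest;
```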
+1

First, your updated query can effectively be reduced to this:

 select an.Interest, count(an.Interest)
 from dbo.AudienceNiches an
 where an.WebsiteId = @WebSiteId
   and an.VisitDate between @startdate and @enddate
 group by an.Interest;

Second, depending on the cardinality of your data, one of the following indexes will provide the best performance:

 create index IX_AudienceNiches_WebSiteId_VisitDate_Interest on dbo.AudienceNiches (WebSiteId, VisitDate, Interest); 

or

 create index IX_AudienceNiches_VisitDate_WebSiteId_Interest on dbo.AudienceNiches (VisitDate, WebSiteId, Interest); 
However, as your data grows, I think that ultimately the latter will become more efficient, on average.

PS Your table is heavily denormalized in several aspects. I hope you know what you are doing.

0
