SQL sorting, paging, filtering best practices in ASP.NET

I am wondering how Google does this. I have a lot of slow queries when it comes to computing the number of pages and the total number of results. Google returns a result count of around 250,000 in a split second.

I am dealing with GridViews. I created my own pager for the GridView, which requires the SQL query to return the number of pages based on the filters set by the user. There are at least 5 filters, including a keyword, category and subcategory, a date range filter, and a sort option. The query contains about 10 LEFT JOINs on large tables.
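Roughly, each search runs a pair of queries following the pattern sketched below. The column names, page size, and offset here are placeholders for illustration only, not my real schema (the actual query is linked further down):

 -- Placeholder sketch of the pattern, not the actual query:
 -- 1) total row count for the pager, with all filters applied
 SELECT COUNT(*)
 FROM courses c
 WHERE 1 = 1;           -- plus keyword, category, and date range filters

 -- 2) one page of rows, same filters plus sorting and paging
 SELECT c.*
 FROM courses c
 WHERE 1 = 1            -- same filters as above
 ORDER BY c.course_title
 LIMIT 20 OFFSET 40;    -- page 3 with a page size of 20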

This query is executed every time a search is performed, and it takes about 30 seconds on average, whether it is the count or the select. I believe the date range filtering is what slows it down; this is my query string for the inclusive and exclusive date range filters. I replaced the (<=, >=) comparisons with BETWEEN ... AND, but I am still experiencing the same problem.

See the query here: http://friendpaste.com/4G2uZexRfhd3sSVROqjZEc

I have problems when the date range parameter is wide.

Check the table containing the dates: http://friendpaste.com/1HrC0L62hFR4DghE6ypIRp

UPDATE [9/17/2010] I minimized my query to use dates only and removed the time part. I tried to reduce the joins for my count query (my real problem is the filtered count, which takes a long time to return a result of 60 thousand rows).

 SELECT COUNT(DISTINCT esched.course_id)
 FROM courses c
 LEFT JOIN events_schedule esched ON c.course_id = esched.course_id
 LEFT JOIN course_categories cc ON cc.course_id = c.course_id
 LEFT JOIN categories cat ON cat.category_id = cc.category_id
 WHERE 1 = 1
   AND c.course_type = 1
   AND active = 1
   AND c.country_id = 52
   AND c.course_title LIKE '%cook%'
   AND cat.main_category_id = 40
   AND cat.category_id = 360
   AND ( ('2010-09-01' <= esched.date_start OR '2010-09-01' <= esched.date_end)
     AND ('2010-09-25' >= esched.date_start OR '2010-09-25' >= esched.date_end) )

I just noticed that my query is pretty fast when I have a filter on my main category or subcategories. However, when I only have a date filter and the range is a week or a month, it has to count a lot of rows and takes about 30 seconds on average.

These are static fields:

 AND c.course_type = 1 AND active = 1 AND c.country_id = 52 

UPDATE [9/17/2010] If I create a hash of these three fields and save it in a single field, will it improve the speed?

These are my dynamic fields:

 AND c.course_title LIKE '%cook%' AND cat.main_category_id = 40 AND cat.category_id = 360 -- ?DateStart and ?DateEnd 

UPDATE [9/17/2010] Now my problem is the leading % in the LIKE filter.

I will post an updated explanation.

+4
2 answers

Search engines such as Google use very sophisticated behind-the-scenes algorithms to index searches. In effect, they have already determined which words appear on each page, as well as the relative importance of those words and the relative importance of the pages (compared to other pages). These indexes are very fast because they are based on bitwise indexing.

Consider the following Google searches:

 custom       : 542 million Google hits
 pager        : 10.8 million
 custom pager : 1.26 million

Essentially, they create an entry for the word custom, and in this entry they put a 1 for every page that contains it and a 0 for every page that doesn't. Then they zip it up, because there are far more 0s than 1s. They do the same for pager.

When a search for custom pager comes in, they unzip both entries and perform a bitwise AND on them. This produces an array of bits whose length is the total number of pages they have indexed, and whose number of 1s is the number of hits for the search. The position of each bit corresponds to a specific result that is known in advance, and they only need to look up the details of the first 10 to display on the first page.

This is simplified, but it is a general principle.
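As a toy illustration of that principle only (nothing like Google's real storage), two word bitmaps can be combined with a bitwise AND; in MySQL the idea could be sketched like this, where the integers are made-up page bitmaps:

 -- Toy illustration only: each bit position stands for one indexed page.
 -- 45 = 0b101101 -> pages containing "custom"
 -- 38 = 0b100110 -> pages containing "pager"
 SELECT 45 & 38            AS pages_with_both,   -- 36 = 0b100100
        BIT_COUNT(45 & 38) AS number_of_hits;    -- 2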

Oh yes, they also have huge banks of servers that perform the indexing and huge banks of servers that serve the search queries. HUGE banks of servers!

This makes them much faster than anything that can be done in a relational database.

Now, to your question: could you post a sample of the SQL for us to look at?

One thing you can try is to change the order of the tables and joins in your SQL statement. I know it seems like it shouldn't matter, but it certainly can. If you put the most restrictive joins earlier in the statement, you may well end up with fewer rows joined overall by the database.

A real-world example: let's say you wanted to find all the entries in the phone book for people named "Johnson" whose number begins with "7". One way is to find all the numbers starting with 7 and then join them with the numbers belonging to people called Johnson. In fact, it would be much faster to filter the other way around, even if you have indexes on both names and numbers. This is because the name "Johnson" is more restrictive than the leading digit 7.

So the order counts, and the database software does not always work out well in advance which join to perform first. I'm not sure about MySQL, since my experience is mainly with SQL Server, which uses index statistics to decide in which order to perform joins. Those statistics become outdated after many inserts, updates, and deletes, so they need to be recalculated periodically. If MySQL has something similar, you can try that.
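For what it's worth, MySQL does have a counterpart: ANALYZE TABLE refreshes the key distribution statistics the optimizer uses when choosing join order. A sketch using the table names from the count query you posted:

 -- Refresh optimizer statistics for the tables in the slow query.
 ANALYZE TABLE courses, events_schedule, course_categories, categories;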

UPDATE I looked at the query you posted. Ten left joins are not unusual and should perform fine if you have the right indexes. Yours is not a complicated query.

What you need to do is break this query down to its essentials. Comment out the lookup joins, such as the ones to currencies, exchange rates, countries, states, and cities, as well as the corresponding fields in the select clause. Does it still run just as slowly? Probably not. But it probably still isn't ideal.

So comment out everything else until you just have courses with the group by and order by on the course id. Then experiment with adding the left joins back in to see which one has the greatest impact. Then, focusing on the ones with the greatest impact on performance, reorder the query. This is a trial-and-error approach. It would be much better to look at the indexes on the columns you are joining on.

For example, the join condition cm.method_id = c.method_id will need a primary key on course_methodologies.method_id and a foreign key index on courses.method_id, and so on. In addition, all the fields used in the where, group by, and order by clauses need indexes.
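As a sketch only (the index names are made up, and the columns are taken from the count query posted above, so adjust them to the real schema), the kind of indexes meant here would look like this; running EXPLAIN on the count query afterwards shows which of them MySQL actually uses:

 -- Hypothetical index names; columns come from the posted WHERE and JOIN clauses.
 CREATE INDEX idx_courses_filters ON courses (course_type, active, country_id);
 CREATE INDEX idx_cc_course       ON course_categories (course_id, category_id);
 CREATE INDEX idx_esched_course   ON events_schedule (course_id, date_start, date_end);
 CREATE INDEX idx_cat_main        ON categories (main_category_id);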

Good luck.

UPDATE 2 You should take a serious look at the date filtering on this query. What are you trying to do?

  AND ((('2010-09-01 00:00:00' <= esched.date_start AND esched.date_start <= '2010-09-25 00:00:00')
     OR ('2010-09-01 00:00:00' <= esched.date_end AND esched.date_end <= '2010-09-25 00:00:00'))
    OR ((esched.date_start <= '2010-09-01 00:00:00' AND '2010-09-01 00:00:00' <= esched.date_end)
     OR (esched.date_start <= '2010-09-25 00:00:00' AND '2010-09-25 00:00:00' <= esched.date_end)))

It can be rewritten as:

 AND (
      -- date_start is within the range - fine
      (esched.date_start BETWEEN '2010-09-01 00:00:00' AND '2010-09-25 00:00:00')
      -- date_end is within the range - fine
   OR (esched.date_end BETWEEN '2010-09-01 00:00:00' AND '2010-09-25 00:00:00')
   OR (esched.date_start <= '2010-09-01 00:00:00' AND esched.date_end >= '2010-09-01 00:00:00')
   OR (esched.date_start <= '2010-09-25 00:00:00' AND esched.date_end >= '2010-09-25 00:00:00')
 )
+3

In your update, you mention that you suspect the problem is related to the date filters.

All of those date checks can be reduced to a single check:

 esched.date_end >= '2010-09-01 00:00:00' AND esched.date_start <= '2010-09-25 00:00:00' 

If it behaves the same way with the above, check whether your indexes are doing their job by timing this query on its own:

SELECT COUNT(DISTINCT esched.course_id) FROM events_schedule esched WHERE esched.date_end >= '2010-09-01 00:00:00' AND esched.date_start <= '2010-09-25 00:00:00'
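If that standalone count is still slow, the date columns are probably not indexed at all. A sketch with made-up index names (note that MySQL will normally pick only one range index per table here, so check with EXPLAIN which one actually helps):

 -- Hypothetical index names; keep whichever EXPLAIN shows is actually used.
 CREATE INDEX idx_esched_date_start ON events_schedule (date_start);
 CREATE INDEX idx_esched_date_end   ON events_schedule (date_end);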

P.S. I think that when using the join, you can do SELECT COUNT(c.course_id) to count the main course records directly in the query, i.e. the DISTINCT count on esched.course_id may not be needed the way it is now.


Re your update: regarding the leading-wildcard search that is now the problem after your change:

Use MySQL full-text search. Do not forget to check the full-text restrictions; importantly, it is only supported on MyISAM tables. I have to say that I have not really used MySQL full-text search, and I'm not sure how it affects the use of other indexes in the query.
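A sketch of what that would look like for the course title, assuming courses is (or can be made) a MyISAM table; the index name is made up:

 -- Requires MyISAM for FULLTEXT at the time of writing.
 ALTER TABLE courses ADD FULLTEXT INDEX ft_course_title (course_title);

 -- Replace the leading-wildcard LIKE with a full-text match.
 -- Note: 'cook*' in boolean mode matches words starting with "cook",
 -- not arbitrary substrings the way LIKE '%cook%' does.
 SELECT COUNT(*) FROM courses
 WHERE MATCH(course_title) AGAINST('cook*' IN BOOLEAN MODE);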

If you cannot use full-text search, IMHO you will not be able to make your current approach fast, since a regular index cannot be used to check whether a word is contained in an arbitrary part of the text.

If that is the case, you may want to change this particular part of the approach and introduce a tag/keyword-based approach. Unlike categories, you can assign several tags to each item, so you keep some flexibility without the problems of free-text search.
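A minimal sketch of such a tagging scheme (all table and column names here are hypothetical, not part of the question's schema):

 -- Hypothetical tagging tables, not part of the original schema.
 CREATE TABLE tags (
   tag_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   name   VARCHAR(50) NOT NULL UNIQUE
 );

 CREATE TABLE course_tags (
   course_id INT NOT NULL,
   tag_id    INT NOT NULL,
   PRIMARY KEY (course_id, tag_id)
 );

 -- The keyword filter then becomes an indexed equality join instead of LIKE '%cook%'.
 SELECT COUNT(DISTINCT ct.course_id)
 FROM course_tags ct
 JOIN tags t ON t.tag_id = ct.tag_id
 WHERE t.name = 'cook';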

+2
