We developed a system with a search screen that looks something like this:

(source: nsourceservices.com)
As you can see, the search functionality is fairly involved. You can filter on any combination of statuses, channels, languages, and campaign types, then narrow the results down by name, and so on.
When you run a search, a results grid appears at the bottom, and you can sort it by clicking the column headers.
The query uses a ROWNUM-style paging scheme, so we only ever return about 70 rows at a time.
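Roughly, the paging looks like this. This is only a simplified sketch, since we are on SQL Server I am showing it with `ROW_NUMBER()`, and the table and column names (`Leads`, `LeadName`, etc.) are invented for illustration, not our real schema:

```sql
-- Hypothetical sketch of ROW_NUMBER-style paging on SQL Server.
-- Table, column, and parameter names are illustrative only.
WITH Numbered AS (
    SELECT
        LeadID, LeadName, Status, Channel,
        ROW_NUMBER() OVER (ORDER BY LeadName) AS RowNum
    FROM Leads
    WHERE Status = @Status      -- any combination of filters
      AND Channel = @Channel
)
SELECT LeadID, LeadName, Status, Channel
FROM Numbered
WHERE RowNum BETWEEN @Start AND @Start + 69;  -- ~70 rows per page
```

The catch is that even though only about 70 rows come back, the `ORDER BY` inside `ROW_NUMBER()` can still force the engine to sort every row that matches the filters before it can number them.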
Problem
Even though we only return 70 rows, the query still does a lot of I/O and sorting work, which makes sense, of course.
This has always caused small spikes in the disk queue. It started to get noticeably worse when we hit 3 million leads, and now that we are approaching 5 million, the disk queue is sometimes pegged for one or two seconds at a stretch.
That by itself is still workable, but there is another part of this system with a time-sensitive process. For simplicity, let's say it is a web service that has to respond very quickly or it will time out on the other end. The disk-queue spikes slow that part down, which causes excess latency downstream. The end result is missed phone calls in our automated VoiceXML-based IVR, which is very bad for us.
What we tried
We tried:
- Maintenance jobs that keep the number of leads in the system to a minimum.
- Added the obvious indexes.
- Ran the Index Tuning Wizard against a Profiler trace and applied most of its suggestions. One of them would have essentially duplicated the entire table inside an index, so I tweaked it by hand to make it a bit smaller.
- Added more RAM to the server. It was a bit low before, but now there is always something like 8 GB sitting idle, and SQL Server is configured to use no more than 8 GB, yet it never uses more than 2 or 3. I find that strange. Why not just keep the whole table in RAM? It's only 5 million leads, and there is plenty of room.
- Pored over the query execution plans. I can see that the indexes are mostly doing their job: about 90% of the work happens in the sort step.
- Considered partitioning the Leads table out onto a different physical disk, but we don't have the resources for that, and it seems like it shouldn't be necessary.
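On the memory question above, a quick way to sanity-check what SQL Server is configured for versus what it has actually committed (these are standard catalog views and DMVs, nothing specific to our setup):

```sql
-- Configured memory cap, from the standard catalog view.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)';

-- Memory the SQL Server process has actually committed.
SELECT physical_memory_in_use_kb / 1024 AS physical_memory_in_use_mb
FROM sys.dm_os_process_memory;
```

If the committed number stays around 2-3 GB under load, the buffer pool simply isn't being asked for more pages, which points back at the query shape rather than the memory cap.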
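Since the plans show most of the cost in the sort step, one idea worth sketching is an index whose key matches the common sort order and whose included columns cover the query, so the plan can stream rows in index order instead of sorting millions of them. This is only a sketch under assumed names; the columns here (`Status`, `LeadName`, `Channel`, `CampaignType`) are invented for illustration:

```sql
-- Hypothetical covering index: the key matches a common
-- filter + ORDER BY combination, and INCLUDE covers the
-- selected columns so no key lookups (or sorts) are needed.
-- Column names are illustrative only.
CREATE NONCLUSTERED INDEX IX_Leads_Status_Name
ON Leads (Status, LeadName)
INCLUDE (Channel, CampaignType);
```

The trade-off is that with "any combination" of filters, one index can only cover a few of the common sort/filter shapes, which may be why the wizard's suggestion ballooned toward duplicating the whole table.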
In closing ...
Part of me feels like the server should be able to handle this. Five million records isn't that much, given the power of the box: a decent quad-core with 16 GB of RAM. On the other hand, I can see how the sort step ends up touching millions of rows just to return a handful.
So, what have you done in situations like this? My instinct is that we may need to cut back some of the functionality, but if there's a way to keep it intact, it would save me a battle with the business unit.
Thanks in advance!