Question about Solr Caching Mechanism

I am working on an Apache Solr project (distributed in the cloud, on Amazon EC2 instances).

I noticed that Solr does a great job of caching results. When I repeat the same queries, Solr reports a QTime of 0 or 1 millisecond.

I want to stress test the Solr system. However, I only have a limited list of queries to use (50,000 unique queries). The problem is that all of them end up cached!

When I stress test, after 5 minutes or so all of my queries have been sent to Solr and executed. This makes the system sweat under a heavy load :) (after all, that was the goal). But when I then execute the same queries again, QTime is almost zero, so Solr has an easy time and is not loaded.

My question is: how can I turn off all caches (both the Solr and the Lucene caches)? Or how can I limit the cache size?

I tried turning off all of the Solr caches, but caching still happens (queryResultCache and FieldCache). Note: the config mentions that Lucene manages an internal cache. Maybe that cache is the problem?

It's just weird that all 50,000 requests can be cached out of the box.

2 answers

You can comment out filterCache, queryResultCache and documentCache in your solrconfig.xml. The Lucene FieldCache cannot be disabled.

That said, doing so does not really make sense, even for benchmarking. Would you also disable the operating system's disk cache? The CPU caches (all three levels)? The internal cache of each hard drive?

Caches are part of the system. If you disable them, you no longer accurately simulate what happens in production, which makes the test useless.


Disabling caches is a great idea, at least the ones that are application specific. In this case it makes sense to benchmark the response time (the cost) of a query that has not been seen before, as opposed to queries that stay popular for the lifetime of the cache.

It sounds like you want metrics that show how the search engine performs, not how the query cache performs.
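One way to keep those two measurements apart, even with caches left on, is to separate the first execution of each query from its repeats when analysing the results. The following is only a minimal sketch: `split_cold_and_repeat` and the `samples` list are hypothetical names and made-up numbers, and it assumes you have already recorded each query with the QTime Solr reported, in the order the queries were sent.

```python
from statistics import mean


def split_cold_and_repeat(samples):
    """Split (query, qtime_ms) samples into first-seen and repeated timings.

    `samples` must be in the order the queries were sent, so that the first
    occurrence of a query is the one that could not come from a query cache.
    """
    seen = set()
    cold, repeat = [], []
    for query, qtime_ms in samples:
        if query in seen:
            repeat.append(qtime_ms)
        else:
            seen.add(query)
            cold.append(qtime_ms)
    return cold, repeat


# Hypothetical log of (query string, QTime in ms), in send order.
samples = [
    ("title:solr", 42),
    ("title:lucene", 35),
    ("title:solr", 1),
    ("title:lucene", 0),
]

cold, repeat = split_cold_and_repeat(samples)
print("first-seen mean QTime (ms):", mean(cold))
print("repeated mean QTime (ms):", mean(repeat))
```

The first-seen numbers approximate the cost of an uncached query; the repeated numbers mostly reflect the cache.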

The previous answer is really out of left field: it suggests that all benchmarks should measure the same thing, namely your own definition of "real life." This is not how engineers work.

Regarding the remark about "disk caches": Linux does not have disk caches, only a page cache. Whether a page is backed by disk, preallocated by a smart file system for large files, or created and destroyed purely in memory, it is all pages.

There are advantages to benchmarking with caches enabled ... if you want to measure cache performance. Obviously.

By the way, given "-server" and "-XX:CompileThreshold", you want your first large batch of queries to be either random or specifically chosen to exercise as many functional paths in Solr/Lucene as possible, so that the JIT has kicked in and is somewhat settled before you start measuring.
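A random warm-up batch like that could be planned as in the sketch below. This is an illustration under assumptions, not a recipe: `plan_run`, `warmup_size` and the stand-in query list are hypothetical, and actually sending the queries to Solr is left out. The warm-up and measured sets are kept disjoint so that warm-up queries do not show up again as cache hits in the measured part.

```python
import random


def plan_run(queries, warmup_size, seed=0):
    """Return (warmup, measured): a shuffled warm-up batch and the rest.

    The two lists are disjoint. A fixed seed keeps the split reproducible
    between benchmark runs.
    """
    rng = random.Random(seed)
    shuffled = list(queries)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:warmup_size], shuffled[warmup_size:]


# Stand-in for the real list of 50,000 unique queries.
queries = [f"q{i}" for i in range(10)]

warmup, measured = plan_run(queries, warmup_size=3)
print("warm-up batch (timings discarded):", warmup)
print("measured batch:", measured)
```

Timings from the warm-up batch would be thrown away; only the measured batch counts toward the benchmark.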
