I am seeing bulk indexing performance with the .NET NEST client and ElasticSearch degrade over time, even though the number of indexes and the number of documents stay constant.
We are running ElasticSearch version 0.19.11 (JVM 23.5-b02) on an Amazon m1.large instance with Ubuntu Server 12.04.1 LTS 64-bit and Sun Java 7. Nothing else runs on this instance apart from what comes with the Ubuntu installation.
Amazon m1.large instance specifications (from http://aws.amazon.com/ec2/instance-types/):
7.5 GiB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O performance: High
EBS-Optimized available: 500 Mbps
API name: m1.large
ES_MAX_MEM is set to 4g and ES_MIN_MEM is set to 2g
Every night we index/reindex about 15,000 documents using NEST from our .NET application. At any given time there is only one index, holding at most 15,000 documents.
When the server was first installed, indexing and searching were fast for the first few days, and then indexing became slower and slower. We index 100 documents at a time, and after a while a single bulk operation takes up to 15 seconds to complete. After that we started seeing the following exception, and indexing ground to a halt:
System.Net.WebException: The request was aborted: The request was canceled.
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
The bulk indexing implementation is as follows:
private ElasticClient GetElasticClient()
{
    var setting = new ConnectionSettings(ConfigurationManager.AppSettings["elasticSearchHost"], 9200);
    setting.SetDefaultIndex("products");
    var elastic = new ElasticClient(setting);
    return elastic;
}

// Turns off periodic refresh on the "products" index while bulk indexing runs
private void DisableRefreshInterval()
{
    var elasticClient = GetElasticClient();
    var s = elasticClient.GetIndexSettings("products");
    var settings = s != null && s.Settings != null ? s.Settings : new IndexSettings();
    settings["refresh_interval"] = "-1";
    var result = elasticClient.UpdateSettings(settings);
    if (!result.OK)
        _logger.Warn("unable to set refresh_interval to -1, {0}",
            result.ConnectionStatus == null || result.ConnectionStatus.Error == null
                ? ""
                : result.ConnectionStatus.Error.ExceptionMessage);
}

// Restores a 1s refresh interval once bulk indexing is done
private void EnableRefreshInterval()
{
    var elasticClient = GetElasticClient();
    var s = elasticClient.GetIndexSettings("products");
    var settings = s != null && s.Settings != null ? s.Settings : new IndexSettings();
    settings["refresh_interval"] = "1s";
    var result = elasticClient.UpdateSettings(settings);
    if (!result.OK)
        _logger.Warn("unable to set refresh_interval to 1s, {0}",
            result.ConnectionStatus == null || result.ConnectionStatus.Error == null
                ? ""
                : result.ConnectionStatus.Error.ExceptionMessage);
}

public void Index(IEnumerable<Product> products)
{
    // Materialize once so the sequence is not enumerated twice
    var enumerable = products as Product[] ?? products.ToArray();
    var elasticClient = GetElasticClient();
    try
    {
        DisableRefreshInterval();
        _logger.Info("Indexing {0} products", enumerable.Count());
        // One bulk request for everything passed in
        var status = elasticClient.IndexMany(enumerable as IEnumerable<Product>, "products");
        if (status.Items != null)
            _logger.Info("Done, Indexing {0} products, duration: {1}", status.Items.Count(), status.Took);
        // Guard against a missing ConnectionStatus before reading its Error
        if (status.ConnectionStatus != null && status.ConnectionStatus.Error != null)
        {
            _logger.Error(status.ConnectionStatus.Error.OriginalException);
        }
    }
    catch (Exception ex)
    {
        _logger.Error(ex);
    }
    finally
    {
        // Always re-enable refresh, even if the bulk call failed
        EnableRefreshInterval();
    }
}
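The question says documents are indexed 100 at a time, while Index() sends whatever sequence it receives in a single IndexMany call, so the batching presumably happens in the caller. The helper below is a minimal sketch of that chunking step, assuming a hypothetical Batch extension method (it is not part of NEST); it splits any sequence into consecutive lists of at most a given size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of NEST: splits a sequence into consecutive
// batches of at most `size` items, e.g. one batch per bulk request.
public static class BatchExtensions
{
    public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        // Arguments are checked here, eagerly; the iterator below is lazy
        return BatchIterator(source, size);
    }

    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                // Start a fresh list; the yielded one now belongs to the caller
                batch = new List<T>(size);
            }
        }
        // Final, possibly smaller, batch
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Assuming `indexer` is an instance of the class that owns Index(), a caller could then write `foreach (var batch in products.Batch(100)) indexer.Index(batch);`. Note that each such call also runs DisableRefreshInterval() and EnableRefreshInterval(), so 15,000 documents mean 150 pairs of settings updates per night. On .NET 6 and later, the built-in Enumerable.Chunk does the same splitting (returning arrays).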
Restarting the elasticsearch daemon does not seem to make any difference, but deleting the index and reindexing from scratch fixes everything. However, within a few days the slow indexing comes back.
I have just deleted the index and added an Optimize call after re-enabling the refresh interval at the end of each bulk index operation, in the hope that this will keep the index from degrading:
...
finally
{
    EnableRefreshInterval();
    elasticClient.Optimize("products");
}
Am I doing something terribly wrong here?