Insert / Update ElasticSearch Nest

Question

I created an index in Elasticsearch using the following query:

    PUT public_site
    {
      "mappings": {
        "page": {
          "properties": {
            "url": { "type": "string" },
            "title": { "type": "string" },
            "body": { "type": "string" },
            "meta_description": { "type": "string" },
            "keywords": { "type": "string" },
            "category": { "type": "string" },
            "last_updated_date": { "type": "date" },
            "source_id": { "type": "string" }
          }
        }
      }
    }

I would like to insert a document into this index using the .NET NEST library. My problem is that the signature of the .NET Update method makes no sense to me.

 client.Update<TDocument>(IUpdateRequest<TDocument,TPartialDocument>)

The Java library makes much more sense to me:

    UpdateRequest updateRequest = new UpdateRequest();
    updateRequest.index("index");
    updateRequest.type("type");
    updateRequest.id("1");
    updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
    client.update(updateRequest).get();

In NEST, where do the TDocument and TPartialDocument classes come from? Are these C# classes that I should create to represent my index?

+5

elasticsearch nest

Andrew Walters Aug 18 '16 at 18:48

1 answer

Russ Cam · Accepted Answer · 2016-08-19T01:45:25+0000

TDocument and TPartialDocument are generic type parameters for the POCO types that

represent a document in Elasticsearch ( TDocument ) and
represent part of a document in Elasticsearch ( TPartialDocument ) when performing a partial update.

In the case of a full update, TDocument and TPartialDocument can refer to the same concrete POCO type. Let's look at some examples to demonstrate.

Let's create an index with the mapping you defined above. First, we can represent a document using a POCO type

    public class Page
    {
        public string Url { get; set; }

        public string Title { get; set; }

        public string Body { get; set; }

        [String(Name="meta_description")]
        public string MetaDescription { get; set; }

        public IList<string> Keywords { get; set; }

        public string Category { get; set; }

        [Date(Name="last_updated_date")]
        public DateTimeOffset LastUpdatedDate { get; set; }

        [String(Name="source_id")]
        public string SourceId { get; set; }
    }

By default, when NEST serializes POCO properties, it uses the camel case naming convention. Because your index uses snake casing for some properties, e.g. "last_updated_date" , we can override the name that NEST serializes them to by using attributes.
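To see why the attributes are needed, the sketch below approximates camel casing by lowercasing the first character of a property name. CamelCase is a hypothetical helper written for this illustration, not NEST's own implementation; it only shows that the default name for a multi-word property would not match the snake-cased field in the mapping.

```csharp
using System;

public static class NamingDemo
{
    // Hypothetical helper: approximates camel casing by lowercasing the
    // first character. An illustration only, not NEST's actual code.
    public static string CamelCase(string name)
    {
        if (string.IsNullOrEmpty(name) || !char.IsUpper(name[0]))
            return name;
        return char.ToLowerInvariant(name[0]) + name.Substring(1);
    }

    public static void Main()
    {
        // Single-word properties line up with the mapping as they are.
        Console.WriteLine(CamelCase("Url"));             // url

        // Multi-word properties do not: the mapping expects meta_description.
        Console.WriteLine(CamelCase("MetaDescription")); // metaDescription
    }
}
```

Under that assumption, Url, Title, Body, Keywords and Category need no attribute, while MetaDescription, LastUpdatedDate and SourceId each need an explicit Name.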

Then, create a client to work with

    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var pagesIndex = "pages";
    var connectionSettings = new ConnectionSettings(pool)
        .DefaultIndex(pagesIndex)
        .PrettyJson()
        .DisableDirectStreaming()
        .OnRequestCompleted(response =>
        {
            // log out the request
            if (response.RequestBodyInBytes != null)
            {
                Console.WriteLine(
                    $"{response.HttpMethod} {response.Uri} \n" +
                    $"{Encoding.UTF8.GetString(response.RequestBodyInBytes)}");
            }
            else
            {
                Console.WriteLine($"{response.HttpMethod} {response.Uri}");
            }

            Console.WriteLine();

            // log out the response
            if (response.ResponseBodyInBytes != null)
            {
                Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                    $"{Encoding.UTF8.GetString(response.ResponseBodyInBytes)}\n" +
                    $"{new string('-', 30)}\n");
            }
            else
            {
                Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                    $"{new string('-', 30)}\n");
            }
        });

    var client = new ElasticClient(connectionSettings);

The connection settings are configured in a way that is helpful during development:

1. DefaultIndex() - The default index is configured as "pages" . If no explicit index name is passed on a request and no index name can be inferred for the POCO, the default index is used.
2. PrettyJson() - Prettify (i.e. indent) the JSON requests and responses. This is useful for seeing what is sent to and received from Elasticsearch.
3. DisableDirectStreaming() - By default, NEST serializes POCOs directly to the request stream and deserializes response types directly from the response stream. Disabling direct streaming buffers the request and response bytes in memory streams, which allows us to log them out in OnRequestCompleted()
4. OnRequestCompleted() - Called after a response is received. This allows us to log out requests and responses while we are developing.

Items 2, 3 and 4 are useful during development, but they come with some overhead, so you may decide not to use them in production.

Now, let's create the index with the Page mapping

    // delete the index if it exists. Useful for demo purposes so that
    // we can re-run this example.
    if (client.IndexExists(pagesIndex).Exists)
        client.DeleteIndex(pagesIndex);

    // create the index, adding the mapping for the Page type to the index
    // at the same time. AutoMap() will infer the mapping from the POCO
    var createIndexResponse = client.CreateIndex(pagesIndex, c => c
        .Mappings(m => m
            .Map<Page>(p => p
                .AutoMap()
            )
        )
    );

Take a look at the automapping documentation for more details on how you can control the mapping for POCO types.

Indexing a new Page instance is as simple as

    // create a sample Page
    var page = new Page
    {
        Title = "Sample Page",
        Body = "Sample Body",
        Category = "sample",
        Keywords = new List<string>
        {
            "sample",
            "example",
            "demo"
        },
        LastUpdatedDate = DateTime.UtcNow,
        MetaDescription = "Sample meta description",
        SourceId = "1",
        Url = "/pages/sample-page"
    };

    // index the sample Page into Elasticsearch.
    // NEST will infer the document type (_type) from the POCO type,
    // by default it will camel case the POCO type name
    var indexResponse = client.Index(page);

Indexing a document will create the document if it does not exist, or overwrite the existing document if it does. Elasticsearch has optimistic concurrency control that can be used to control how this behaves under different conditions.

We can update the document using the Update methods, but first a bit of background.

We can get a document from Elasticsearch by specifying the index, type and id. NEST makes this slightly easier because it can infer all of these from the POCO. When we created our mapping, we did not specify an Id property on the POCO; if NEST sees a property named Id , it uses it as the id of the document. Because we don't have one, Elasticsearch generates an id for the document and puts it in the document metadata. Since the document metadata is separate from the source document, this makes modelling documents as POCO types a little trickier (but not impossible); for a given response, we have access to the id of the document through the metadata and access to the source through the _source field. We can combine the id with our source in the application.

An easier way to handle this is to have an id on the POCO. We can specify an Id property on the POCO and it will be used as the id of the document, but the property does not have to be called Id ; if it is named something else, we need to tell NEST which property represents the id. This can be done with an attribute. Assuming that SourceId is a unique id for a Page instance, use the IdProperty property of ElasticsearchTypeAttribute to specify this. We probably don't want this string to be analyzed either, but indexed verbatim; we can control this with the Index property of the property's attribute

    [ElasticsearchType(IdProperty = nameof(SourceId))]
    public class Page
    {
        public string Url { get; set; }

        public string Title { get; set; }

        public string Body { get; set; }

        [String(Name="meta_description")]
        public string MetaDescription { get; set; }

        public IList<string> Keywords { get; set; }

        public string Category { get; set; }

        [Date(Name="last_updated_date")]
        public DateTimeOffset LastUpdatedDate { get; set; }

        [String(Name="source_id", Index=FieldIndexOption.NotAnalyzed)]
        public string SourceId { get; set; }
    }

With these in place, we need to recreate the index as before so that the changes are reflected in the mapping and NEST can use this configuration when indexing a Page instance.

Now back to the updates :) We can get the document from Elasticsearch, update it in the application and then re-index it

    var getResponse = client.Get<Page>("1");

    var page = getResponse.Source;

    // update the last updated date
    page.LastUpdatedDate = DateTime.UtcNow;

    var updateResponse = client.Update<Page>(page, u => u.Doc(page));

The first argument to Update identifies the document we want to update; NEST can infer the id from the Page instance. Since we are passing the entire document back here, we could just as well have used .Index() instead of Update() , since we are updating all of the fields

 var indexResponse = client.Index(page);

However, since we only want to update LastUpdatedDate , getting the document from Elasticsearch, updating it in the application, and then sending the whole document back is a lot of work. Instead, we can send only the updated LastUpdatedDate to Elasticsearch using a partial document. C# anonymous types are really useful here.

    // model our partial document with an anonymous type.
    // Note that we need to use the snake casing name
    // (NEST will still camel case the property names but this
    // doesn't help us here)
    var lastUpdatedDate = new
    {
        last_updated_date = DateTime.UtcNow
    };

    // do the partial update.
    // Page is TDocument, object is TPartialDocument
    var partialUpdateResponse = client.Update<Page, object>("1", u => u
        .Doc(lastUpdatedDate)
    );

We can use optimistic concurrency control here if we need to, by using RetryOnConflict(int)

    var partialUpdateResponse = client.Update<Page, object>("1", u => u
        .Doc(lastUpdatedDate)
        .RetryOnConflict(1)
    );

With a partial update, Elasticsearch gets the document, applies the partial update, and then indexes the updated document; if the document changes between the get and the index, Elasticsearch retries the update once, as specified by RetryOnConflict(1) .
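The get, merge, index and retry cycle described above can be sketched with an in-memory stand-in for an index. Everything in this block ( VersionedStore , TryIndex , PartialUpdate , the beforeIndex hook) is hypothetical and written for illustration; it is not how Elasticsearch or NEST is implemented, only a small model of optimistic concurrency with a retry count.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for an index. All names here are hypothetical and
// only mimic the get / merge / index cycle of a partial update.
public class VersionedStore
{
    private readonly Dictionary<string, (long Version, Dictionary<string, object> Source)> _docs
        = new Dictionary<string, (long, Dictionary<string, object>)>();

    // Create or overwrite a document, bumping its version.
    public void Put(string id, Dictionary<string, object> source)
    {
        long next = _docs.TryGetValue(id, out var current) ? current.Version + 1 : 1;
        _docs[id] = (next, new Dictionary<string, object>(source));
    }

    // Return the current version and a copy of the source.
    public (long Version, Dictionary<string, object> Source) Get(string id)
    {
        var current = _docs[id];
        return (current.Version, new Dictionary<string, object>(current.Source));
    }

    // Index only if the document is still at the version that was read.
    public bool TryIndex(string id, long expectedVersion, Dictionary<string, object> source)
    {
        if (_docs[id].Version != expectedVersion) return false; // version conflict
        _docs[id] = (expectedVersion + 1, source);
        return true;
    }

    // Partial update: get, merge the partial document, index; retry on conflict.
    // beforeIndex is a hook that lets the demo simulate a concurrent writer.
    public bool PartialUpdate(string id, Dictionary<string, object> partial,
        int retryOnConflict, Action beforeIndex = null)
    {
        for (int attempt = 0; attempt <= retryOnConflict; attempt++)
        {
            var (version, source) = Get(id);
            foreach (var field in partial) source[field.Key] = field.Value;
            beforeIndex?.Invoke();
            if (TryIndex(id, version, source)) return true;
        }
        return false;
    }
}

public static class Demo
{
    public static void Main()
    {
        var store = new VersionedStore();
        store.Put("1", new Dictionary<string, object> { ["title"] = "Sample Page" });

        // A concurrent writer changes the document during the first attempt only.
        int calls = 0;
        Action concurrentWriter = () =>
        {
            if (calls++ == 0)
                store.Put("1", new Dictionary<string, object> { ["title"] = "Edited elsewhere" });
        };

        var partial = new Dictionary<string, object> { ["last_updated_date"] = "2016-08-19" };
        bool updated = store.PartialUpdate("1", partial, retryOnConflict: 1, beforeIndex: concurrentWriter);

        Console.WriteLine(updated);                        // True
        Console.WriteLine(store.Get("1").Source["title"]); // Edited elsewhere
    }
}
```

In this model the first attempt fails because the version moved on, and the single retry re-reads the document, so the concurrent edit to title is preserved alongside the new last_updated_date . With a retry count of 0 the same scenario would report a conflict instead.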

Hope that helps :)
