Full text search inside an embedded document

Here is my document model:

 "translation": {
     "en": { "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
     "it": { "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
     "fr": { "name": "renard brun", "description": "le renard brun rapide saute par-dessus un chien paresseux" },
     "de": { "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
     "es": { "name": "brown fox", "description": "el rápido zorro marrón salta sobre un perro perezoso" }
 }

Now I need to add a text index for the above document. How can I achieve this? I already tried adding a text index on translation, but this does not work, since the name and description are nested under the language key (inside the object). I also have to give separate text weights to the name and description, i.e. the name has a weight of 5 and the description a weight of 2, so I can't use the wildcard text index, i.e.

 {'$**': 'text'} 

I also tried 'translation.en.name': 'text', but this does not work either, and my languages are dynamic and keep growing. What is the best solution for this case?

Any help would be greatly appreciated.

+8
mongodb full-text-search meteor
2 answers

Because the embedded fields are dynamic, the best approach is to change your schema so that the translation field becomes an array of embedded documents. Below is an example of such a schema, which mirrors your current structure:

 "translation": [
     { "lang": "en", "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
     { "lang": "it", "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
     { "lang": "fr", "name": "renard brun", "description": "le renard brun rapide saute par-dessus un chien paresseux" },
     { "lang": "de", "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
     { "lang": "es", "name": "brown fox", "description": "el rápido zorro marrón salta sobre un perro perezoso" }
 ]

With this schema, it is easy to create a text index on the name and description fields:

 db.collection.createIndex( { "translation.name": "text", "translation.description": "text" } ) 
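Since you also want the name weighted higher than the description (5 vs. 2), you can pass the weights option to the same createIndex call. A sketch, assuming the array schema above and a placeholder collection name:

```javascript
// Give matches in "translation.name" a higher text score (weight 5)
// than matches in "translation.description" (weight 2).
db.collection.createIndex(
    { "translation.name": "text", "translation.description": "text" },
    { weights: { "translation.name": 5, "translation.description": 2 } }
)
```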

As for migrating the schema, you will need an API that lets you update your collection in bulk, and the Bulk API does exactly that. It gives better performance because you send operations to the server in batches of, say, 1000, rather than sending a request per document.

This approach is shown below. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2. It updates all documents in the collection, converting each translation field to an array:

 var bulk = db.collection.initializeUnorderedBulkOp(),
     counter = 0;

 db.collection.find({
     "translation": { "$exists": true, "$not": { "$type": 4 } }
 }).snapshot().forEach(function (doc) {
     var localization = Object.keys(doc.translation).map(function (key) {
         var obj = doc.translation[key];
         obj.lang = key;
         return obj;
     });

     bulk.find({ "_id": doc._id }).updateOne({
         "$set": { "translation": localization }
     });
     counter++;

     if (counter % 1000 === 0) {
         // Execute per 1000 operations and
         // re-initialize every 1000 update statements
         bulk.execute();
         bulk = db.collection.initializeUnorderedBulkOp();
     }
 });

 // Clean up remaining operations in the queue
 if (counter % 1000 !== 0) { bulk.execute(); }
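The heart of the migration is the step that turns the keyed translation object into an array. Pulled out as a plain function (a sketch, runnable in any JavaScript environment, independent of the mongo shell), you can check it against sample data before touching the collection:

```javascript
// Convert { en: {...}, fr: {...} } into [ { lang: "en", ... }, { lang: "fr", ... } ].
function toTranslationArray(translation) {
    return Object.keys(translation).map(function (key) {
        var obj = translation[key];
        obj.lang = key; // tag each entry with its former object key
        return obj;
    });
}

// Example input: two entries from the question's document.
var doc = {
    translation: {
        en: { name: "brown fox", description: "the quick brown fox jumps over a lazy dog" },
        fr: { name: "renard brun", description: "le renard brun rapide saute par-dessus un chien paresseux" }
    }
};
var localization = toTranslationArray(doc.translation);
// localization is now an array of two documents, each carrying a "lang" field.
```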

The next example applies to the new MongoDB version 3.2, which deprecated the Bulk API and provides a newer set of APIs using bulkWrite() .

It uses the same cursor as above, but builds the array of bulk operations with the same forEach() cursor method, pushing each update document into the array. Since write commands accept no more than 1000 operations, you need to group your operations into batches of at most 1000, re-initializing the array when the loop hits 1000 iterations:

 var cursor = db.collection.find({
         "translation": { "$exists": true, "$not": { "$type": 4 } }
     }).snapshot(),
     bulkUpdateOps = [];

 cursor.forEach(function (doc) {
     var localization = Object.keys(doc.translation).map(function (key) {
         var obj = doc.translation[key];
         obj.lang = key;
         return obj;
     });

     bulkUpdateOps.push({
         "updateOne": {
             "filter": { "_id": doc._id },
             "update": { "$set": { "translation": localization } }
         }
     });

     if (bulkUpdateOps.length === 1000) {
         db.collection.bulkWrite(bulkUpdateOps);
         bulkUpdateOps = [];
     }
 });

 if (bulkUpdateOps.length > 0) {
     db.collection.bulkWrite(bulkUpdateOps);
 }
+4

To create a text index on a name field, use:

 db.collectionname.createIndex({ "name": "text" })

To verify that the index was created, list all the indexes with this command:

db.collectionname.getIndexes()


EDIT

The problem is not how to create the indexes; the problem is how to achieve this with the above model for all languages.

I see now. You cannot index the way you want for all languages with the existing document layout; you have to change it. Below is one way you can achieve this:

 {
     "_id": 1,
     "translation": [
         { "language": "en", "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
         { "language": "it", "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
         { "language": "fr", "name": "renard brun", "description": "le renard brun rapide saute par-dessus un chien paresseux" },
         { "language": "de", "name": "brown fox", "description": "the quick brown fox jumps over a lazy dog" },
         { "language": "es", "name": "brown fox", "description": "el rápido zorro marrón salta sobre un perro perezoso" }
     ]
 }

Then create the index as db.collectionname.createIndex({ "language": "text" });

The above assumption is based on your proposed model, since name and description are keys inside translation, not top-level fields. Isn't it?

No, with the schema I provided it is easier to have text indexes on the name and description fields, and you can search by language.
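As a sketch of what such a language-aware search could look like (collection name and search term are placeholders; this assumes the array schema and a text index on translation.name and translation.description): combine $text with a filter on the lang key. Note that $text matches against all array elements, so the lang filter only requires the document to contain an entry for that language.

```javascript
// Full-text search for "fox" in documents that have an English translation,
// sorted by relevance score.
db.collection.find(
    { "$text": { "$search": "fox" }, "translation.lang": "en" },
    { "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" } })
```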

+1
