MongoDB case-insensitive index: "begins with" query performance issues

Upon learning that 3.3.11 supports case-insensitive indexes (using collation), I rebuilt my database of 40 million entries to play with this. The alternative was to add, for example, extra string fields dedicated to case-insensitive searches and index them.

I asked MongoDB to apply a collation to my collection at creation time, as suggested here. So I did this to make the entire collection case-insensitive:

db.createCollection("users", {collation:{locale:"en",strength:1}}) 
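What `strength: 1` actually matches can be approximated outside MongoDB with the standard `Intl.Collator` API. This is only a sketch: it assumes that `sensitivity: "base"` behaves like MongoDB's strength 1 (primary level) and `sensitivity: "accent"` like strength 2 (secondary level), and `equalityMatches` is a hypothetical helper, not a MongoDB function. The point it illustrates is that strength 1 ignores diacritics as well as case, so "Jöhn Doe" would also match "john doe"; strength 2 ignores only case.

```javascript
// Approximate MongoDB collation strengths with the ECMAScript Intl API.
// Assumption: "base" ~ strength 1 (ignores case and accents),
//             "accent" ~ strength 2 (ignores case only).
const strength1 = new Intl.Collator("en", { sensitivity: "base" });
const strength2 = new Intl.Collator("en", { sensitivity: "accent" });

// Hypothetical helper: returns the values that compare equal to `target`
// under the given collator, i.e. roughly what { full_name: target } matches.
function equalityMatches(collator, values, target) {
  return values.filter((v) => collator.compare(v, target) === 0);
}

const names = ["john doe", "John Doe", "JOHN DOE", "Jöhn Doe", "jane doe"];
console.log(equalityMatches(strength1, names, "john doe"));
console.log(equalityMatches(strength2, names, "john doe"));
```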

After loading the collection, I tried direct queries, for example:

 db.users.find({full_name:"john doe"}) 

...and they return in ~10 ms with 50 results. The match is case-insensitive, so everything works. But then I try something like:

 db.users.find({full_name:/^john/}) 

... or...

 db.users.find({full_name:/^john/i}) 

...and it takes more than 5 minutes. I was very disappointed. Running explain() shows that the index is apparently used, but the query still takes far too long to complete. Could this be a bug or an incomplete feature in the development release, or am I doing something fundamentally wrong?

Since I am doing a "begins with" regular expression search, the query should be lightning fast. Any ideas?

1 answer

Edit: there is a workable workaround. Basically, if the prefix you are looking for is "bob", you can query for $gte: "bob" and $lt: "boc" (where you increment the last character by one). This will use the index. You can use the following functions, which I wrote (caveat: they are not necessarily bug-free, but they work in most cases):

 var searchCriteria = {};
 addStartsWithQuery(searchCriteria, "firstName", "bo");
 People.find(searchCriteria).then(...);
 // searchCriteria will be:
 // { $and: [ { firstName: { $gte: "bo" } }, { firstName: { $lt: "bp" } } ] }

 // Helper functions that generate the correct range query and add it to `searchCriteria`.
 // For complicated queries you may have to modify them a bit.

 // Returns the exclusive upper bound for a prefix: the prefix with its last
 // character incremented ("bo" -> "bp"). A trailing "z" cannot be incremented,
 // so it is dropped and the increment carries to the previous character
 // ("boz" -> "bp"). Returns undefined when there is no upper bound ("zz").
 function getEndStr(str) {
   var endStrArr = str.toLocaleLowerCase('en-US').split("");
   for (var i = endStrArr.length - 1; i >= 0; --i) {
     var lastChar = endStrArr[i];
     if (lastChar === "z") {
       endStrArr.pop(); // carry to the previous character
       continue;
     }
     var nextChar = String.fromCharCode(lastChar.charCodeAt(0) + 1);
     // The character after "9" is ":", but digits sort before letters, so use "a".
     if (nextChar === ":") nextChar = "a";
     endStrArr[i] = nextChar;
     return endStrArr.join("");
   }
   return undefined; // every character was "z": only a lower bound is possible
 }

 function addStartsWithQuery(searchCriteria, propertyName, str) {
   if (typeof str !== 'string' || !str.length) return;
   var endStr = getEndStr(str);
   if (endStr) {
     if (!searchCriteria.$and) searchCriteria.$and = [];
     searchCriteria.$and.push({ [propertyName]: { $gte: str } });
     searchCriteria.$and.push({ [propertyName]: { $lt: endStr } });
   } else {
     searchCriteria[propertyName] = { $gte: str };
   }
 }
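Why the range works under a case-insensitive collation can be checked without a database. The sketch below assumes `Intl.Collator("en", { sensitivity: "base" })` is a reasonable stand-in for the collection's `{ locale: "en", strength: 1 }` collation; `upperBound` and `rangeMatches` are hypothetical helpers mirroring the idea of `getEndStr`, restricted to alphanumeric prefixes. Keeping only values with prefix <= v < upperBound(prefix) in collation order gives the same result as a case-insensitive anchored regex, which is why an index range scan with $gte/$lt is enough.

```javascript
// Stand-in for { locale: "en", strength: 1 } (an approximation, not MongoDB itself).
const collator = new Intl.Collator("en", { sensitivity: "base" });

// Hypothetical helper: exclusive upper bound for an alphanumeric prefix.
// Trailing "z" characters carry into the previous character ("boz" -> "bp");
// "9" rolls over to "a" because digits sort before letters.
function upperBound(prefix) {
  const chars = prefix.toLowerCase().split("");
  while (chars.length > 0) {
    const last = chars.pop();
    if (last !== "z") {
      chars.push(last === "9" ? "a" : String.fromCharCode(last.charCodeAt(0) + 1));
      return chars.join("");
    }
  }
  return undefined; // prefix was all "z": no finite upper bound
}

// Keeps values v with prefix <= v < upperBound(prefix) in collation order,
// i.e. the keys an index range scan with $gte/$lt would visit.
function rangeMatches(values, prefix) {
  const upper = upperBound(prefix);
  return values.filter(
    (v) =>
      collator.compare(v, prefix) >= 0 &&
      (upper === undefined || collator.compare(v, upper) < 0)
  );
}

const names = ["John Doe", "john smith", "JOHNNY", "Jon", "Johann", "Bob"];
console.log(rangeMatches(names, "john"));
console.log(names.filter((n) => /^john/i.test(n)));
```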

Well, it turns out MongoDB doesn't officially support this! I found an issue in JIRA where they make it clear. Unfortunately, this makes collation much less useful. Hopefully they fix it soon! Technically, I noticed that although the query uses the index, one of its index bounds is "[\"\", {})", which covers every string in the index, so the index scan is useless. The next stage of the query then filters these results as usual.

https://jira.mongodb.org/browse/DOCS-9933
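The full-range bounds described above can be spotted programmatically instead of by reading explain() output by eye. This is a sketch under assumptions: it expects the classic explain format, where IXSCAN stages carry `indexName` and `indexBounds` (a map from field to a list of interval strings), and `collectIndexScans` / `hasFullStringRange` are hypothetical helpers. Passing the whole explain document also walks `rejectedPlans`, so pass `explain().queryPlanner.winningPlan` to look at the chosen plan only. It needs a modern JavaScript runtime (Node.js or mongosh).

```javascript
// The interval that covers every string key in an index.
const FULL_STRING_RANGE = '["", {})';

// Hypothetical helper: walks an explain() document (inputStage, inputStages, ...)
// and collects every IXSCAN stage it finds.
function collectIndexScans(node, found = []) {
  if (node === null || typeof node !== "object") return found;
  if (node.stage === "IXSCAN") {
    found.push({ indexName: node.indexName, indexBounds: node.indexBounds || {} });
  }
  for (const child of Object.values(node)) {
    collectIndexScans(child, found);
  }
  return found;
}

// Hypothetical helper: true if any field's intervals include the full string
// range, meaning the scan visits every string key instead of a narrow range.
function hasFullStringRange(scan) {
  return Object.values(scan.indexBounds).some(
    (intervals) => Array.isArray(intervals) && intervals.includes(FULL_STRING_RANGE)
  );
}
```

In mongosh this might be used as `collectIndexScans(db.users.find({ full_name: /^john/ }).explain().queryPlanner.winningPlan).map(hasFullStringRange)`; a `true` entry means the index only narrows nothing and the filter stage does all the work.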

Vote for this issue to get them to fix it! https://jira.mongodb.org/browse/SERVER-29865

