Sort by collation

Given this collection:

{"name": "a"}, {"name": "B"}, {"name": "b"}, {"name": "c"}, {"name": "á"}, {"name": "A"} 

For example, how can I sort case-insensitively using Spanish ordering?

I tried this:

 var abc = [{"name": "a"}, {"name": "B"}, {"name": "b"}, {"name": "c"}, {"name": "á"}, {"name": "A"}];
 for (i in abc) db.abc.save(abc[i]);
 db.abc.find({}, {"_id": 0}).sort({"name": 1});

Output:

 [ { "name" : "A" }, { "name" : "B" }, { "name" : "a" }, { "name" : "b" }, { "name" : "c" }, { "name" : "á" } ]

Desired Result:

 [ { "name" : "a" }, { "name" : "á" }, { "name" : "A" }, { "name" : "b" }, { "name" : "B" }, { "name" : "c" } ] 
+7
5 answers

I know this is an old thread, but I think it would be helpful to answer anyway.

You definitely do not want to sort in your application, because that means pulling every document in the collection into memory just to sort them and return the desired window. If your collection is huge, this is extremely inefficient. The database should sort and return the window to you.

But MongoDB does not support locale-sensitive sorting, you say. How do you solve the problem? The trick is the concept of "sort keys."

Basically, let's say you have the regular English / Latin alphabet from "a" to "z". What you would do is create a mapping from each letter to a sort key: "a" to "01", "b" to "02", and so on up to "z" to "26". That is, match each letter with its position in the sort order for that language, and encode that number as a string. Then convert the string you want to sort into this kind of sort key. For example, "abc" becomes "010203". Finally, store the sort key in your document as an extra property whose name is the original property name suffixed with the locale name:

 { name: "abc", name_en: "010203" } 

Now you can sort in the "en" locale simply by indexing the property "name_en" and using MongoDB's usual binary sorting, selectors, and ranges on it instead of on the property "name".

Now let's say you have another crazy language, "xx", where the alphabet order is "acb" instead of "abc". (Yes, there are languages that reorder the Latin alphabet this way!) The document with both sort keys would look like this:

 { name: "abc", name_en: "010203", name_xx: "010302" } 

Now you need to create indexes on name_en and name_xx and use normal MongoDB sorting to correctly sort these locales. Basically, additional properties are proxies for sorting in different locales.
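A minimal sketch of the sort-key mapping described above, in plain JavaScript. The function names `makeSortKey` and `withSortKeys` and the alphabet constants are hypothetical, chosen only for illustration. Lowercasing the input and mapping unknown characters to "00" are assumptions the answer does not spell out; a real implementation would use a collation library instead of a hand-written alphabet.

```javascript
// Alphabets in sort order. "xx" is the made-up locale from the text,
// where "c" sorts before "b".
const EN = "abcdefghijklmnopqrstuvwxyz";
const XX = "acbdefghijklmnopqrstuvwxyz";

// Replace each character with its two-digit, 1-based position in the
// alphabet. Assumption: input is lowercased first, and characters that are
// not in the alphabet become "00".
function makeSortKey(text, alphabet) {
  return Array.from(text.toLowerCase())
    .map((ch) => String(alphabet.indexOf(ch) + 1).padStart(2, "0"))
    .join("");
}

// Build the document shape shown above, with one proxy property per locale.
function withSortKeys(doc) {
  return {
    name: doc.name,
    name_en: makeSortKey(doc.name, EN),
    name_xx: makeSortKey(doc.name, XX),
  };
}

console.log(withSortKeys({ name: "abc" }));
// { name: 'abc', name_en: '010203', name_xx: '010302' }
```

Documents produced this way can be saved as usual, and sorting on `name_en` or `name_xx` then needs nothing beyond MongoDB's ordinary string comparison.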

So where do you get these mappings, you ask? After all, you are not a globalization specialist, are you?

Well, if you use Java, C, or C++, there are ready-made classes that do this mapping for you. In Java, use the standard Collator class or the icu4j Collator class. If you use C / C++, use the ICU (icu4c) Collator functions or class. For other languages, you're out of luck unless you can find a library that already does this.

Here are some links to help you find them:

Java Collator Standard Library: http://docs.oracle.com/javase/7/docs/api/java/text/Collator.html#getCollationKey(java.lang.String)

C++ Collator class (ICU): http://icu-project.org/apiref/icu4c/classicu_1_1Collator.html#ae0bc68d37c4a88d1cb731adaa5a85e95

You can also create different sort keys that let you sort case-insensitively for each language (yes, case mapping is language-sensitive!), accent-insensitively, Unicode-variant-insensitively, or any combination of the above. The only problem is that you now have many properties running parallel to each sortable property, and you have to keep them all in sync whenever you update the base property "name". This is a pain in the you-know-what, but it is still better than sorting in your application or business logic.

Also be careful with range queries. For example, in English we simply ignore the accents on characters, so "Ö" sorts the same as "O" and falls in the range from "M" to "Z". In Swedish, however, accented characters sort after "Z". So if you query the range "M" to "Z", you will include a bunch of records starting with "Ö", which is right for English but not for Swedish.

This also affects sharding if you shard on a text property of the document. Be careful which ranges end up in which shard. It would be better to shard on things that are not locale-sensitive, such as hashes.
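The range difference can be illustrated with the standard `Intl.Collator` API. This is only a sketch: the sample names and the helper `inRange` are made up for the example, and it assumes a JavaScript runtime that ships full ICU locale data (otherwise the "sv" locale may silently fall back to default rules).

```javascript
// Hypothetical sample data for illustration.
const names = ["Anna", "Mats", "Oskar", "Örjan"];

// True when lo <= name <= hi under the collation rules of `locale`.
function inRange(name, lo, hi, locale) {
  const collator = new Intl.Collator(locale);
  return collator.compare(lo, name) <= 0 && collator.compare(name, hi) <= 0;
}

// English treats "Ö" as a variant of "O", so "Örjan" is inside M..Z.
const english = names.filter((n) => inRange(n, "M", "Z", "en"));

// Swedish sorts "Ö" after "Z", so "Örjan" is outside M..Z.
const swedish = names.filter((n) => inRange(n, "M", "Z", "sv"));

console.log(english, swedish);
```

The same boundaries select different records depending on the locale, which is why a range over a sort-key property has to be built with that locale's keys.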

+10

Although the other answers here are true for MongoDB 3.2.x and earlier, starting with 3.4.0 you can "specify collation for a collection or a view, an index, or specific operations that support collation."

Full documentation for this feature is provided in the Collation section of the MongoDB manual.
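For the query in the question, a collation-aware version for MongoDB 3.4 or later might look like the sketch below. The choice of `strength: 2` (compare base letters and accents, ignore case) is an assumption about what the asker wants, and wrapping the query in a function named `findSortedByName` is only there to keep the snippet self-contained; in the mongo shell you would run the chained call directly against `db`.

```javascript
// Assumed collation: Spanish rules. Strength 2 ignores case but still
// distinguishes accents; strength 1 would ignore accents as well.
var spanishCaseInsensitive = { locale: "es", strength: 2 };

// `db` is the database handle provided by the mongo shell.
function findSortedByName(db) {
  return db.abc
    .find({}, { _id: 0 })
    .sort({ name: 1 })
    .collation(spanishCaseInsensitive);
}
```

If every query on the collection should use the same rules, the collation can instead be set once as the collection default or on an index, as the quoted documentation describes.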

+10

MongoDB currently does not support collation.

Implementing the Unicode Collation Algorithm is the best way to solve this problem.

But this would slow down sorting and make indexes larger. So for now it's best to sort in your application.

+3

An easy workaround is to create a new field with the text converted to plain ASCII characters.

 { "name": "Ánfora", "name_sort": "anfora" }
 { "name": "Óscar", "name_sort": "oscar" }
 { "name": "Barça", "name_sort": "barc~a" }
 { "name": "Niño", "name_sort": "nin~o" }
 { "name": "¡Hola!", "name_sort": "hola!" }
 { "name": "¿qué?", "name_sort": "que?" }

Then just sort by 'name_sort'
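One possible way to derive `name_sort` is sketched below. The function name `toSortField` is hypothetical, and the rules are inferred from the sample documents above rather than stated by the answer: lowercase, write "ñ" and "ç" as the base letter plus "~" (which sorts after every ASCII letter), strip the remaining accents, and drop the inverted punctuation marks.

```javascript
// Derive an ASCII-only sort field from a display name.
// The rules are inferred from the examples in the answer, not from a spec.
function toSortField(name) {
  return name
    .normalize("NFC")                 // make sure ñ and ç are single code points
    .toLowerCase()
    .replace(/ñ/g, "n~")              // keep ñ distinct from n, sorted after it
    .replace(/ç/g, "c~")              // same treatment for ç
    .normalize("NFD")                 // split remaining accents from base letters
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining accent marks
    .replace(/[¡¿]/g, "");            // drop inverted punctuation
}

console.log(toSortField("Niño")); // nin~o
```

The field has to be recomputed whenever `name` changes, so it is easiest to apply the function at the single place where documents are written.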

+2

Unfortunately, you still cannot sort case-insensitively; for now, results are returned in index order. There is a ticket open:

https://jira.mongodb.org/browse/SERVER-90

You might want to skip sorting in mongo and do this in your application.
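For a small result set, application-side sorting could be sketched with the standard `Intl.Collator` API, as below. The `sensitivity: "base"` option is an assumption about the desired behavior: it treats "a", "á", and "A" as equal, so their relative order is simply whatever order the database returned them in.

```javascript
// The documents from the question, as they might come back from the query.
const docs = [
  { name: "a" }, { name: "B" }, { name: "b" },
  { name: "c" }, { name: "á" }, { name: "A" },
];

// Spanish collation that ignores both case and accents when comparing.
const collator = new Intl.Collator("es", { sensitivity: "base" });

// Sort a copy so the original array is left untouched.
const sorted = docs.slice().sort((x, y) => collator.compare(x.name, y.name));

console.log(sorted.map((d) => d.name));
```

This only makes sense when the whole result fits comfortably in memory, since every matching document has to be fetched before sorting.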

+1