Should I prefer integers or strings in my Solr schema if the field could be either?

Say I have a field in my Solr schema whose value is 1, 2, 3, or 4. I never do arithmetic on this field; it holds the status of the entry. It could just as easily be A, B, C, or D. Each of the 11 million entries has one of these statuses.

In this question, the answer says that ints are "more memory efficient," for starters. Are there other considerations? Is one faster to query than the other?

This field will not be sorted on. The values are arbitrary, and we will never interpret them numerically. The field will be used only in filters.

2 answers

Will you query ranges? That is, if your 1...4 actually rank the statuses, say from Bad to Great, would you ever query for entries with status 1-2? That is the only case where you might need them to be ints (and with only 4 values, even that doesn't matter much).
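A small sketch of why the range case is the one to watch: an int field compares values numerically, while a plain string field compares terms lexicographically. This is a stdlib-only simulation, not Solr code; `StatusRangeDemo`, `matchInt`, and `matchString` are hypothetical names. For single-digit statuses 1 to 4 the two orderings agree, but once multi-digit values appear, a string range `[1 TO 2]` also matches "10".

```java
import java.util.ArrayList;
import java.util.List;

public class StatusRangeDemo {
    // Numeric comparison, as an int field orders its values.
    static List<Integer> matchInt(int[] values, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (int v : values) {
            if (v >= lo && v <= hi) out.add(v);
        }
        return out;
    }

    // Lexicographic comparison, as a plain string field orders its terms.
    static List<Integer> matchString(int[] values, int lo, int hi) {
        String loS = Integer.toString(lo);
        String hiS = Integer.toString(hi);
        List<Integer> out = new ArrayList<>();
        for (int v : values) {
            String s = Integer.toString(v);
            if (s.compareTo(loS) >= 0 && s.compareTo(hiS) <= 0) out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] statuses = {1, 2, 3, 4};
        System.out.println(matchInt(statuses, 1, 2));    // [1, 2]
        System.out.println(matchString(statuses, 1, 2)); // [1, 2]

        int[] wider = {1, 2, 10, 20};
        System.out.println(matchInt(wider, 1, 2));    // [1, 2]
        System.out.println(matchString(wider, 1, 2)); // [1, 2, 10] ("10" sorts between "1" and "2")
    }
}
```

With exactly four single-character statuses, both representations give the same range results, which is why the answer treats this as a minor concern here.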

My rule of thumb in data warehousing is that if an int will never be used as an int, store it as a string. This may take more space, but it leaves room for string manipulation later. And for 11 million records, whether this single field is a string or an int may make little difference in memory (11M is a lot of records, but not a heavy load for Solr/Lucene).
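One way to see why the memory difference can be small: when a field has only a few distinct values, each value can be stored once in a dictionary and every document keeps just a small ordinal pointing into it. Lucene's sorted doc values work along these lines (a term dictionary plus per-document ordinals), though its real format is more compact and differs in detail. The sketch below is a simplified illustration under that assumption; `StatusDictionary` is a hypothetical class, not a Lucene API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class StatusDictionary {
    final String[] dictionary; // each distinct value stored once, sorted
    final byte[] ords;         // one small ordinal per document

    // Assumes at most 128 distinct values so every ordinal fits in a byte.
    StatusDictionary(String[] values) {
        dictionary = new TreeSet<>(Arrays.asList(values)).toArray(new String[0]);
        Map<String, Byte> index = new HashMap<>();
        for (int i = 0; i < dictionary.length; i++) index.put(dictionary[i], (byte) i);
        ords = new byte[values.length];
        for (int doc = 0; doc < values.length; doc++) ords[doc] = index.get(values[doc]);
    }

    // Look up a document's value through its ordinal.
    String get(int doc) {
        return dictionary[ords[doc]];
    }

    public static void main(String[] args) {
        StatusDictionary enc = new StatusDictionary(new String[]{"A", "C", "B", "A", "D"});
        System.out.println(Arrays.toString(enc.dictionary)); // [A, B, C, D]
        System.out.println(Arrays.toString(enc.ords));       // [0, 2, 1, 0, 3]
        System.out.println(enc.get(1));                      // C

        // Per-document cost with one-byte ordinals for 11 million documents;
        // the same whether the dictionary holds "1".."4" or "A".."D".
        long docs = 11_000_000L;
        System.out.println(docs * Byte.BYTES + " bytes"); // 11000000 bytes
    }
}
```

The length of the status strings only affects the tiny dictionary, not the per-document cost, which is the intuition behind "it may not matter" for 11 million records.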


With only 4 distinct values, int and string will perform about the same for filter queries, sorting, and even range queries.
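For filter queries specifically, Solr caches the result of each `fq` as a set of matching documents (the filterCache), so once a filter has been computed, reusing it is a set operation that does not care about the field type. The sketch below is a heavily simplified, hypothetical model of that idea (`FilterCacheDemo` and `buildFilters` are illustrative names, not Solr APIs): it builds one `BitSet` per distinct value and shows that an int key and a string key yield identical document sets.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class FilterCacheDemo {
    // Map each distinct field value to the set of document ids containing it.
    static <K> Map<K, BitSet> buildFilters(K[] fieldValues) {
        Map<K, BitSet> filters = new HashMap<>();
        for (int doc = 0; doc < fieldValues.length; doc++) {
            filters.computeIfAbsent(fieldValues[doc], k -> new BitSet()).set(doc);
        }
        return filters;
    }

    public static void main(String[] args) {
        Integer[] intStatus = {1, 3, 2, 1, 4, 1};
        String[] strStatus = {"A", "C", "B", "A", "D", "A"};

        BitSet byInt = buildFilters(intStatus).get(1);
        BitSet byStr = buildFilters(strStatus).get("A");

        System.out.println(byInt);               // {0, 3, 5}
        System.out.println(byStr);               // {0, 3, 5}
        System.out.println(byInt.equals(byStr)); // true
    }
}
```

Whatever the key type, the cached result is the same document set, which matches the answer's point that the choice hardly matters for filtering on a four-valued field.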

