Should I prefer integers or strings in my Solr schema if the field matches one?

Question


Say that I have a field in my Solr schema that has a value of 1, 2, 3, or 4. I do not do arithmetic on this field. The field is the status of the entry, and it could just as easily be A, B, C, or D. Each of the 11 million entries has one of these statuses.

In this question, the answer says that ints are "more memory efficient", so that's a start. Are there other factors? Is one faster to match on than the other?

This field will not be sorted on. The values are arbitrary codes, and we will never treat them as ordered. It will be used only in filters.
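For context, a filter on such a field looks the same in the request whichever type the schema declares. The sketch below is a minimal illustration, assuming a hypothetical field named `status` and a hypothetical core URL; it only builds the request parameters and does not contact a Solr server.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the core URL and the "status" field name
# are assumptions for illustration, not taken from the question.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def status_filter_params(status: str, query: str = "*:*") -> dict:
    """Build request params that restrict results to a single status.

    The fq text is identical whether "status" is declared as a string
    field or an int field; the schema decides how the value is indexed.
    """
    return {"q": query, "fq": f"status:{status}"}

params = status_filter_params("3")
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Putting the status in `fq` rather than `q` lets Solr cache the filter separately from the main query, which suits a low-cardinality field used only for filtering.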


lucene solr

Andy Lester Jan 14 '13 at 17:14


2 answers

With only 4 distinct values, int and string fields will work equally well for filter queries, sorting, and even range queries.
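Range queries work for single-character strings because lexicographic and numeric order agree on them. The sketch below is a simplified model, not Solr code: it assumes a string field's range query compares terms byte by byte (which, for ASCII, matches Python's `str` comparison) while an int field compares numeric values. It checks that the two orderings agree for the codes 1 to 4 and shows where they stop agreeing.

```python
# Simplified model (an assumption, not Solr internals): string terms are
# compared lexicographically, int values numerically.

def in_string_range(term: str, lo: str, hi: str) -> bool:
    """Inclusive range membership under lexicographic ordering."""
    return lo <= term <= hi

def in_int_range(value: int, lo: int, hi: int) -> bool:
    """Inclusive range membership under numeric ordering."""
    return lo <= value <= hi

statuses = ["1", "2", "3", "4"]

# With single-character codes, both orderings give the same answer
# for every inclusive range and every status.
for lo in statuses:
    for hi in statuses:
        for s in statuses:
            assert in_string_range(s, lo, hi) == in_int_range(int(s), int(lo), int(hi))

# They diverge once codes have different lengths: as a string, "10"
# sorts between "1" and "2", so a string range from "1" to "2" matches it.
print(in_string_range("10", "1", "2"))  # True
print(in_int_range(10, 1, 2))           # False
```

So the "even range queries" claim holds for this field as long as every code stays a single character.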


jpountz Jan 15 '13 at 14:55


Mikehoss · Accepted Answer · 2013-01-14T17:51:26+0000

Will you ever query on a range? That is, if your 1...4 actually encodes an ordered status, say Bad to Great, would you ever ask for entries with status 1-2? That is the only place you might need them to be ints (and since you only have 4 values, even that is not very important).
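With only four codes, a "1-2" query can also be written as an explicit list, which sidesteps the question of how the field orders its values. A minimal sketch, again assuming a hypothetical field named `status`; both helpers only build `fq` strings in Solr's standard query syntax, where `[a TO b]` is an inclusive range.

```python
def status_range_fq(lo: int, hi: int) -> str:
    """Inclusive range filter on the hypothetical "status" field."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    # [a TO b] is inclusive in the standard query syntax ({a TO b} is exclusive).
    return f"status:[{lo} TO {hi}]"

def status_set_fq(lo: int, hi: int) -> str:
    """The same selection written as an OR of every code in the range.

    With four codes this stays short and does not depend on whether
    the field orders values numerically or lexicographically.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    codes = " OR ".join(str(c) for c in range(lo, hi + 1))
    return f"status:({codes})"

print(status_range_fq(1, 2))  # status:[1 TO 2]
print(status_set_fq(1, 2))    # status:(1 OR 2)
```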

My rule of thumb in data warehousing is that if an int will never be used as an int, store it as a string. This may take more space, but you can do more string manipulation with it. And with 11 million records, whether this single field is a string or an int may not matter much for memory (11 million is a lot of records, but not a heavy load for Solr/Lucene).
