For MongoDB, is it better to reference an object or use a natural String key?

I am building a corpus of indexed sentences in different languages. I have a collection of languages, each of which has both an ObjectId and an ISO code as a key. Is it better to reference the languages collection or to store a key like "en" or "fr"?

I assume this is a trade-off between:

  • ease of referencing the language object in that collection
  • speed of queries that select sentences in a specific language
  • disk data size

Any best practices I should know about?

+7
2 answers

In the end, it really comes down to personal preference and what works best for your application.

The only requirement that MongoDB imposes on _id is that it is unique within the collection. It can be an ObjectId (which is provided by default), a string, or even an embedded document (as I recall, it cannot be an Array).

In this case, you can probably guarantee that the ISO code is unique, which makes it an ideal _id value. You have a "known" primary key that is also useful in itself, because it is identifiable, so using it instead of a generated identifier is probably the more reasonable bet. It also means that when you refer to a language from another collection, you can store the ISO code instead of an ObjectId; anyone viewing your raw data can immediately tell which language a given reference points to.

Aside:

ObjectIds have two big advantages. They can be generated uniquely across multiple machines, processes, and threads, without requiring any kind of central sequence tracking on the MongoDB server. They are also stored as a special type in MongoDB that takes only 12 bytes (as opposed to the 24 characters of the hex string representation of an ObjectId).
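To make the 12-byte versus 24-character point concrete, here is a minimal sketch in plain JavaScript. It assumes the layout described in the current MongoDB documentation (a 4-byte timestamp, a 5-byte per-process random value, and a 3-byte counter). The helper names `makeObjectIdLikeBytes` and `toHex` are hypothetical and are not part of any driver; real code should use the driver's own ObjectId type.

```javascript
// Hypothetical illustration only: builds a 12-byte value laid out like an
// ObjectId. No server round trip is needed to create it.

// 5 random bytes, chosen once per process.
const processRandom = Array.from({ length: 5 }, () =>
  Math.floor(Math.random() * 256)
);

// 3-byte counter, started at a random value.
let counter = Math.floor(Math.random() * 0x1000000);

function makeObjectIdLikeBytes(now = Date.now()) {
  const bytes = new Uint8Array(12);
  const seconds = Math.floor(now / 1000);

  // Bytes 0-3: timestamp in seconds, big-endian.
  bytes[0] = (seconds >>> 24) & 0xff;
  bytes[1] = (seconds >>> 16) & 0xff;
  bytes[2] = (seconds >>> 8) & 0xff;
  bytes[3] = seconds & 0xff;

  // Bytes 4-8: per-process random value.
  for (let i = 0; i < 5; i++) bytes[4 + i] = processRandom[i];

  // Bytes 9-11: incrementing counter, big-endian, wrapping at 3 bytes.
  counter = (counter + 1) & 0xffffff;
  bytes[9] = (counter >>> 16) & 0xff;
  bytes[10] = (counter >>> 8) & 0xff;
  bytes[11] = counter & 0xff;

  return bytes;
}

// Hex string form: two characters per byte.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

const sampleId = makeObjectIdLikeBytes();
console.log(sampleId.length, toHex(sampleId).length);
```

The binary form is what MongoDB stores; the hex string is only a display format, which is why storing ObjectIds as strings doubles their size.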

+5

If disk space is not a problem, I would probably go with a language key such as "en" or "fr". That saves an additional query against the languages collection to find the ObjectId for a given language; you can simply query the sentences directly:

 db.sentences.find( { lang: "en" } ) 

As long as the lang field is indexed, e.g. db.sentences.ensureIndex( { lang: 1 } ) (or db.sentences.createIndex( { lang: 1 } ), which replaces it in newer MongoDB versions), I don't think there will be a big difference in query performance.
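The difference between the two designs is the number of lookups, which can be sketched in plain JavaScript with in-memory arrays standing in for the collections. The sample data, the placeholder ids ("oid-1", "oid-2"), the `code` field, and the function names are all assumptions made for illustration; the equivalent shell calls appear in the comments.

```javascript
// In-memory stand-ins for the collections (hypothetical sample data).
const languages = [
  { _id: "oid-1", code: "en", name: "English" },
  { _id: "oid-2", code: "fr", name: "French" },
];

// Design 1: sentences reference a language by its generated id.
const sentencesByRef = [
  { text: "Hello", lang: "oid-1" },
  { text: "Bonjour", lang: "oid-2" },
];

// Design 2: sentences store the ISO code directly.
const sentencesByCode = [
  { text: "Hello", lang: "en" },
  { text: "Bonjour", lang: "fr" },
];

// Two steps, roughly:
//   var language = db.languages.findOne({ code: "en" });
//   db.sentences.find({ lang: language._id });
function findByReference(code) {
  const language = languages.find((l) => l.code === code);
  if (!language) return [];
  return sentencesByRef.filter((s) => s.lang === language._id);
}

// One step:
//   db.sentences.find({ lang: "en" });
function findByCode(code) {
  return sentencesByCode.filter((s) => s.lang === code);
}

console.log(findByReference("en").map((s) => s.text));
console.log(findByCode("en").map((s) => s.text));
```

Both functions return the same sentences; the natural-key version simply skips the first lookup, and its stored documents are readable without consulting the languages collection.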

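If you have a large dataset and disk space is a concern, you might consider an ObjectId (12 bytes) or a number (8 bytes for a double or 64-bit integer), either of which may be smaller than the UTF-8 string key, depending on its length. Note, though, that a two-letter code such as "en" takes only 7 bytes as a BSON string (a 4-byte length prefix, 2 bytes of text, and a null terminator), so the saving only appears with longer keys.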
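The size comparison can be checked with a short sketch in plain JavaScript. It assumes the BSON encoding rules for strings (a 4-byte length prefix, the UTF-8 bytes, and a trailing null byte) and compares value sizes only, since the field name and type byte cost the same whichever type is chosen. The function name `bsonStringValueSize` is hypothetical.

```javascript
// Size in bytes of a BSON string value: 4-byte length prefix,
// the UTF-8 encoded text, and a trailing null byte.
function bsonStringValueSize(s) {
  return 4 + new TextEncoder().encode(s).length + 1;
}

// Fixed-size BSON value types, in bytes.
const fixedSizes = { objectId: 12, double: 8, int64: 8, int32: 4 };

// Example keys: a two-letter code and a longer language tag.
for (const key of ["en", "fr", "zh-Hant"]) {
  const size = bsonStringValueSize(key);
  const verdict = size < fixedSizes.objectId ? "smaller than" : "not smaller than";
  console.log(`"${key}": ${size} bytes, ${verdict} an ObjectId`);
}
```

Under these assumptions a two-letter code costs 7 bytes per document, less than an ObjectId or an 8-byte number; only a 32-bit integer is smaller, at the price of an unreadable key.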

+3