I am working with a dataset that has multiple columns that represent integer ID numbers (e.g. transactionId and accountId). These ID numbers are often 12 digits, which makes them too large to be stored as a 32-bit integer.
What is the best approach in this situation?
- Read the identifier as a character string.
- Read the identifier as integer64 using bit64.
- Read the identifier as numeric (i.e. double).
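For concreteness, here is a minimal sketch of the three options in base R. The inline CSV and the `amount` column are made up for illustration; only `transactionId` and `accountId` come from my data. The `integer64` step assumes the bit64 package is installed and is skipped otherwise.

```r
# Hypothetical sample data; only the two ID column names are real.
csv_text <- "transactionId,accountId,amount
123456789012,987654321098,10.50
123456789013,987654321098,3.25"

# Option 1: read the ID columns as character strings.
as_chr <- read.csv(
  text = csv_text,
  colClasses = c(transactionId = "character", accountId = "character")
)

# Option 3: read the ID columns as numeric (double).
as_dbl <- read.csv(
  text = csv_text,
  colClasses = c(transactionId = "numeric", accountId = "numeric")
)

# Option 2: convert to integer64, only if bit64 is available.
if (requireNamespace("bit64", quietly = TRUE)) {
  ids_i64 <- bit64::as.integer64(as_chr$transactionId)
  print(class(ids_i64))
}

str(as_chr)
str(as_dbl)
```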
I have been warned about the dangers of testing equality with doubles, but I'm not sure that this will be a problem when using them as identifiers: I may merge and filter on them, but I never do arithmetic on the ID numbers.
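A small check of the double concern, as a sketch rather than a proof. A double represents every integer up to 2^53 exactly, and a 12-digit ID is far below that, so comparing two such IDs is not subject to the rounding that affects fractional arithmetic. The specific ID values below are invented.

```r
# Largest value a 32-bit R integer can hold: a 12-digit ID does not fit.
int_max <- .Machine$integer.max

# The same (invented) 12-digit ID obtained two ways: as a literal and parsed from text.
id_a <- 123456789012
id_b <- as.numeric("123456789012")
same_id <- id_a == id_b

# Doubles are exact for whole numbers up to 2^53 (about 9.007e15);
# beyond that, neighbouring integers start to collide.
collides_above_limit <- (2^53 + 1) == 2^53

# The usual equality warning concerns fractional arithmetic like this.
fraction_equal <- (0.1 + 0.2) == 0.3

print(c(same_id = same_id,
        collides_above_limit = collides_above_limit,
        fraction_equal = fraction_equal))
```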
Character strings intuitively seem like they should be slower for testing equality and doing merges, but perhaps in practice it doesn't really matter.
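One way to measure this instead of guessing is to time the same lookup on both representations. This is only a rough sketch with randomly generated IDs; the sizes are arbitrary, `match()` stands in for the lookup step of a merge, and the timings will vary by machine and R version.

```r
set.seed(1)
n <- 1e5

# Randomly generated 12-digit IDs stored as doubles, plus a character copy.
ids_dbl <- 1e11 + sample.int(1e9, n)
ids_chr <- sprintf("%.0f", ids_dbl)

# A subset of IDs to look up, in both representations.
lookup_dbl <- sample(ids_dbl, 1000)
lookup_chr <- sprintf("%.0f", lookup_dbl)

# Time the lookup for each representation.
time_dbl <- system.time(pos_dbl <- match(lookup_dbl, ids_dbl))
time_chr <- system.time(pos_chr <- match(lookup_chr, ids_chr))

# Both representations should find the same rows.
same_result <- identical(pos_dbl, pos_chr)

print(time_dbl)
print(time_chr)
print(same_result)
```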
r
Rob Donnelly