80 3 rows of columns or 1 81 rows of columns

I have some data to store,

I have to store this data for each user,

Data more or less

key: 0 value: 107, key: 1 value 213 

There are about 80 sets of keys / values ​​per user

So my question is:

I have one line, which is basically

 user_id key1, key2... key80 

or i have 80 lines

 user_id, key, value 


The data I have to save:
You can see the actual data that I have to store here: http://219.89.41.169/ReachApi/?gt=The%20hailwood

I cannot just copy / paste as it is about 8 MB of data, so it may take some time to download.

But, as you can see, I have been given data in this format, so I have to store it that way.

+4
source share
5 answers

I usually compress at the thought of 81 columns in the table. However, IF:

  • The number of keys / fields is unlikely to change, AND
  • All fields relate directly to the user and apply to each user and / or
  • You will probably need to request more than one key at a time for a given user,

then it makes sense to have so many columns than store each user / key command as its own row. You get type safety, the ability to hold back and index your data is better, and you can request user statistics without requiring a dozen joins (or returning a bunch of rows that you must then collect in a data structure).

If the number of fields is constantly in the stream or the fields do not apply to all (that is: some users have different numbers of fields), or you never want to know more than the user value 3225 for field 53, then the user / key / value table has more meaning. But it would hurt in the ass that everything was in order, updates would be required forever (because the indexes will need to be redone, and you will need indexes), and the queries will be disgusting if you need more than one or two fields at a time.

+2
source

You should use your first approach - one line

 user_id | key | value 

This way you are more flexible when you need more / less keys for each user.

+2
source

I looked at the data. Why are you thinking about storing data in a database? I suppose you want to read the data, run some algorithm on it and calculate the result, right? Or is it really a requirement to store data on an ongoing basis, or is that even all you are trying to do?

You have a list of all that look like this:

 [KillsByEnemyTypeClass] => Array ( [0] => stdClass Object ( [Key] => 0 [Value] => 0 ) ... [MedalCountsByType] => Array ( [0] => stdClass Object ( [Key] => 0 [Value] => 0 ) 

As follows from the data format, lists are sequential arrays, not pairs of key values. Since the keys are all sequential, you can save the values ​​in one large array in your programming language.

This is your data format:

 struct Data { std::string reason; int status; struct AiStatistics { int aDeathsByDamageType[82]; int nHopperId; int aKillsByDamageType[82]; int nMapId; double fMedalChestCompletionPercentage; int aMedalCountsByType[128]; int nTotalMedals; int nVariantClass; ... } aaistatistics[9]; } 

It seems that arrays should have a static size, because 82 different types of damage probably mean something to you. If there are 83 types of damage, the program that generates this data will change, and you will also have to adapt your algorithms. This means that the data and your program are not independent, and the benefits of using a database are doubtful.

Update

Thanks for clarifying your requirements. I understand why you should now cache data for other clients.

But: is the data related to the data you have to store? Does this mean that you only cache the output from the web API, and if as a result of the change you overwrite the previously saved data? Or is there a temporary dimension and you want to keep the sequence of API outputs? In both cases, a thin C ++ API around binary data can be much more efficient than a database.

If it's just for caching data, I would still suggest modeling the database after the above object model. The AI ​​statistics table has one column per member variable, i.e. One column for the full array of DeathsByDamageType. Store the entire array as a single value. Clients cannot search for a single value using this model, but must receive a full array. If you do not have a specific use case for anything else, I would stick to this. Your database will be much simpler.

If this is really, really, really not enough for your purposes, your tables are likely to be:

 table Data { id, reason, status } table AiStatistics { id, data_id, ..., all integer members like nTotalMedals etc } table DeathByDamageType { aistat_type, index, value } table ... for every other array member of AiStatistics 

By default, storing DeathByDamageType in this way is really inefficient, the table is at least three times larger than the array value, because for each value you need to store the AiStatistics reference identifier and array index separately.

If you do this like that, at least use sparseness in arrays and don't store values ​​in DeathByDamageType that are 0. You could not do this with an array.

+2
source

I would stay away from 81 columns in the database table, especially for things where you are likely to add more “keys” in the future (in which case you will need to add more columns to the table).

Key-Value Tables

can lead to performance problems and make it difficult to record queries, so it would be useful to see if you can somehow group the keys into related parts that you could stick on smaller, more specific tables.

+1
source

Pro 80 colors

  • one insert to create a user and only updates for modifications
  • comparing the values ​​of two or more keys can be performed using simple SELECT statements (without using self-joins)
  • O / R mapping only works out of the box
  • the database can handle type checking

About 80 lines

  • Adding keys is just an insert, ALTER TABLE is not required (which implies a long idle time for updates if you have a huge number of users
  • plugins can easily add their own configuration settings (assuming some kind of prefix in the key names avoids conflicts)

Comments about 80 lines

  • If you use this approach, I suggest using "key_id" instead of "key" and have a second table for keys, so key names can be verified. This table may contain a third column for documentation.
+1
source

All Articles