Which approach is best for storing a list of words in MySQL that will later be used for statistics?

DETAILS

I have a quiz (let's call it quiz1). Quiz1 uses the same list of words every time it is generated. If they need to, users can skip words in order to finish the quiz. I'd like to store these missed words in MySQL and then run statistics on them.

At first, I was going to store the missed words in a single column of one row, with the words separated by commas.

| testid | missedwords                   | score | userid |
*********************************************************
| quiz1  | wordlist,missed,skipped,words | 59    | 1      |
| quiz2  | different,quiz,list           | 65    | 1      |

The problem with this approach is that, at the end of each quiz, I want to show statistics on which words were most often missed by users who took quiz1.
I assume that storing the missed words in one column, as shown above, is inefficient for this purpose, since I would need to extract the values and then count them (perhaps in PHP), which would not be necessary if the data were stored in a separate table.
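To make that drawback concrete, here is a minimal sketch of the application-side counting the comma-separated design forces on you. It is written in Python rather than PHP, and the list `rows` is a hypothetical stand-in for the result of a `SELECT missedwords ...` query; neither comes from the question.

```python
from collections import Counter

# Hypothetical cells, as if fetched with something like:
#   SELECT missedwords FROM results WHERE testid = 'quiz1'
rows = [
    "wordlist,missed,skipped,words",
    "missed,skipped",
    "skipped",
]

def count_missed(rows):
    """Split every comma-separated cell and tally the words in application code."""
    counts = Counter()
    for cell in rows:
        # Drop empty fragments and whitespace around the commas.
        counts.update(w.strip() for w in cell.split(",") if w.strip())
    return counts

print(count_missed(rows).most_common(2))  # [('skipped', 3), ('missed', 2)]
```

Every row has to be transferred and parsed before anything can be counted, which is exactly the work a separate table lets the database do for you.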

Then I thought that I should create a separate table for the missed words. The advantage of the table below is that counting the words in it should be easy.

| instance | missed word |
**************************
| 1        | wordlist    |
| 1        | missed      |
| 1        | skipped     |
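With one row per missed word, the counting moves into SQL. A minimal sketch, using Python's built-in `sqlite3` as a stand-in for MySQL; the table name `missed_words` and the extra sample rows are assumptions for illustration.

```python
import sqlite3

# sqlite3 stands in for MySQL; table/column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE missed_words (instance INTEGER NOT NULL, missed_word TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO missed_words (instance, missed_word) VALUES (?, ?)",
    [(1, "wordlist"), (1, "missed"), (1, "skipped"),
     (2, "missed"), (2, "skipped"), (3, "skipped")],
)

# One row per missed word, so the database can do the counting itself.
top = conn.execute(
    """SELECT missed_word, COUNT(*) AS misses
       FROM missed_words
       GROUP BY missed_word
       ORDER BY misses DESC, missed_word"""
).fetchall()
print(top)  # [('skipped', 3), ('missed', 2), ('wordlist', 1)]
```

The `SELECT ... GROUP BY ... ORDER BY` statement itself uses only standard SQL, so it should run unchanged in MySQL.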

Another approach: I could create a table of counts and update it every time a quiz is completed.

| testid | wordlist | missed | skipped | otherword |
***************************************************
| quiz1  | 1        | 1      | 1       | 0         |

The problem with this approach is that each quiz would need its own table, because each quiz uses different words. Information is also lost: only the tallies are stored, not which words a particular user missed.

Question

Which approach would you use, and why? Alternative approaches to this task are welcome. If you see flaws in my logic, feel free to point them out.

EDIT: Users will be able to retake the quiz as many times as they want. Their existing data will not be updated; instead, a new instance will be created every time they take the quiz.

+7
5 answers

The best way to do this is to fully normalize the schema. That way, the analysis will be simple and fast.

quiz_words: wordID, word
quiz_skipped_words: quizID, userID, wordID

To get all the words a user missed:

 SELECT wordID, word FROM quiz_words JOIN quiz_skipped_words USING (wordID) WHERE userID = ?; 

You can add a GROUP BY clause to get a count for each word.
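Here is a minimal runnable sketch of the two-table schema with the per-user query plus the GROUP BY suggested above. It uses Python's `sqlite3` in place of MySQL; the column types, the UNIQUE constraint on `word`, the helper name `missed_counts_for_user`, and the sample rows are assumptions, not part of the answer.

```python
import sqlite3

# sqlite3 stands in for MySQL; types, constraints and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quiz_words (
    wordID INTEGER PRIMARY KEY,
    word   TEXT NOT NULL UNIQUE
);
CREATE TABLE quiz_skipped_words (
    quizID INTEGER NOT NULL,
    userID INTEGER NOT NULL,
    wordID INTEGER NOT NULL REFERENCES quiz_words (wordID)
);
INSERT INTO quiz_words (wordID, word) VALUES (1, 'cat'), (2, 'dog'), (3, 'bird');
INSERT INTO quiz_skipped_words (quizID, userID, wordID) VALUES
    (1, 1, 1), (1, 1, 2), (2, 1, 1), (1, 2, 3);
""")

def missed_counts_for_user(user_id):
    """Each word the user has missed, with how many times they missed it."""
    return conn.execute(
        """SELECT wordID, word, COUNT(*) AS times_missed
           FROM quiz_words JOIN quiz_skipped_words USING (wordID)
           WHERE userID = ?
           GROUP BY wordID, word
           ORDER BY times_missed DESC, wordID""",
        (user_id,),
    ).fetchall()

print(missed_counts_for_user(1))  # [(1, 'cat', 2), (2, 'dog', 1)]
```

Dropping the `WHERE userID = ?` filter turns the same query into global "most missed words" statistics.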

To get the count for a specific word:

 SELECT COUNT(*) FROM quiz_skipped_words JOIN quiz_words USING (wordID) WHERE word = ?; 
+3

According to database normalization theory, the second approach is better, because ideally each cell of a relational table should store only one value, which is atomic and indivisible. Each word is an instance of an entity.

In addition, I would suggest that you do not create a separate Quiz-Word table for each quiz, but instead add another column to the Missed-Word table for the quiz in which the word appeared, and use that column as a foreign key to the Quiz table. That way you avoid generating tables on the fly (which is considered "bad practice" in database design).
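A minimal sketch of that suggestion: one shared missed-word table with a quiz column that is a foreign key to the quiz table, instead of one table per quiz. Python's `sqlite3` stands in for MySQL (SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`), and the names `quiz`, `missed_word` and `missed_in_quiz` are hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL; table, column and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE quiz (
    quiz_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE missed_word (
    quiz_id INTEGER NOT NULL REFERENCES quiz (quiz_id),
    word    TEXT NOT NULL
);
INSERT INTO quiz VALUES (1, 'quiz1'), (2, 'quiz2');
INSERT INTO missed_word VALUES (1, 'missed'), (1, 'skipped'), (1, 'missed'), (2, 'list');
""")

def missed_in_quiz(quiz_name):
    """Most-missed words for one quiz, all read from the single shared table."""
    return conn.execute(
        """SELECT m.word, COUNT(*) AS misses
           FROM missed_word AS m JOIN quiz AS q ON q.quiz_id = m.quiz_id
           WHERE q.name = ?
           GROUP BY m.word
           ORDER BY misses DESC, m.word""",
        (quiz_name,),
    ).fetchall()

print(missed_in_quiz("quiz1"))  # [('missed', 2), ('skipped', 1)]

# The foreign key rejects a missed word for a quiz that does not exist.
try:
    conn.execute("INSERT INTO missed_word VALUES (99, 'orphan')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Adding a new quiz is then just a new row in `quiz`, with no schema change.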

+1

Why not have a quiz table and a quiz_words table? The quiz_words table would store id, quizID and word as columns. Then, for each quiz instance, insert a row into quiz_words for every word the user missed.

Then you can let MySQL do the counting on the quiz_words table, grouping by quizID and word.
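A minimal sketch of that counting for every quiz at once, grouping on the (quizID, word) pair. Python's `sqlite3` stands in for MySQL; the `quiz` table's columns and the sample rows are assumptions for illustration.

```python
import sqlite3

# sqlite3 stands in for MySQL; the quiz table's columns and the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quiz (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE quiz_words (
    id     INTEGER PRIMARY KEY,   -- AUTO_INCREMENT in MySQL
    quizID INTEGER NOT NULL REFERENCES quiz (id),
    word   TEXT NOT NULL
);
INSERT INTO quiz (id, name) VALUES (1, 'quiz1'), (2, 'quiz2');
INSERT INTO quiz_words (quizID, word) VALUES
    (1, 'missed'), (1, 'skipped'), (1, 'missed'),
    (2, 'list'), (2, 'quiz'), (2, 'list');
""")

# Count misses per (quiz, word) pair for all quizzes in one query.
per_quiz = conn.execute(
    """SELECT quizID, word, COUNT(*) AS misses
       FROM quiz_words
       GROUP BY quizID, word
       ORDER BY quizID, misses DESC, word"""
).fetchall()
for row in per_quiz:
    print(row)
```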

+1

The best solution (from my point of view) for what you are trying to achieve is a normalized approach:

  • test, a table with a test_id column and any other columns you need
  • missed_words, which has id (auto-increment PK) and word (UNIQUE). Here you can also keep a hits column, incremented every time this word is associated with a test in the test_missed_words table, so the statistics you want are already compiled and don't have to be computed by a SELECT query
  • test_missed_words, a link table with test_id and missed_word_id (composite PK)

This way you don't store redundant data (the missed words themselves), and you can easily extract the statistics you want.
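A minimal sketch of this three-table design, including the hits counter kept in step with the link table. Python's `sqlite3` stands in for MySQL (`INSERT OR IGNORE` is SQLite syntax; MySQL's closest equivalent is `INSERT IGNORE`), and the helper `record_miss` is hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL; table names follow the answer, the helper is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (
    test_id INTEGER PRIMARY KEY
);
CREATE TABLE missed_words (
    id   INTEGER PRIMARY KEY,          -- AUTO_INCREMENT PK in MySQL
    word TEXT NOT NULL UNIQUE,
    hits INTEGER NOT NULL DEFAULT 0    -- precomputed counter
);
CREATE TABLE test_missed_words (
    test_id        INTEGER NOT NULL REFERENCES test (test_id),
    missed_word_id INTEGER NOT NULL REFERENCES missed_words (id),
    PRIMARY KEY (test_id, missed_word_id)
);
INSERT INTO test (test_id) VALUES (1), (2);
""")

def record_miss(test_id, word):
    """Link a missed word to a test and bump its hit counter in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        # Create the word's row only if it is not there yet.
        conn.execute("INSERT OR IGNORE INTO missed_words (word) VALUES (?)", (word,))
        (word_id,) = conn.execute(
            "SELECT id FROM missed_words WHERE word = ?", (word,)
        ).fetchone()
        already_linked = conn.execute(
            "SELECT 1 FROM test_missed_words WHERE test_id = ? AND missed_word_id = ?",
            (test_id, word_id),
        ).fetchone()
        # Only a new association increments the counter.
        if already_linked is None:
            conn.execute(
                "INSERT INTO test_missed_words (test_id, missed_word_id) VALUES (?, ?)",
                (test_id, word_id),
            )
            conn.execute("UPDATE missed_words SET hits = hits + 1 WHERE id = ?", (word_id,))

for test_id, word in [(1, "missed"), (1, "skipped"), (2, "missed"), (2, "missed")]:
    record_miss(test_id, word)

# Statistics come straight from the counter column, without an aggregate query.
print(conn.execute("SELECT word, hits FROM missed_words ORDER BY hits DESC, word").fetchall())
```

The hits column is a denormalization: the same numbers can always be recomputed with COUNT(*) over test_missed_words, so it only pays off if that query becomes too slow, and it must be updated in the same transaction as the link row to stay correct.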

+1

To store as much information as possible (so that you can later compile per-user statistics as well as general statistics), I would create a table structure similar to this:

Stats
| quizId | userId | type    | wordId |
**************************************
| 1      | 1      | missed  | 4      |
| 1      | 1      | skipped | 7      |

Here type can be either an int that identifies different kinds of action, or a string representation, depending on whether the set of types might ever grow. ^^

Then:

Quizzes
| quizId | quizName |
*********************
| 1      | Quiz 1   |

With a word list for each quiz:

WordList (PK: wordId)
| quizId | wordId | word |
**************************
| 1      | 1      | Cat  |
| 1      | 2      | Dog  |

You presumably already have a users table; this design just links to it through userId.

Note that none of the id fields in the Stats table are unique keys. When a user misses or skips a word, you add that word's id to the Stats table along with the corresponding quizId, userId and type. Storing statistics this way makes it easy to aggregate them per user, per word, or per type, or any combination of the three. It also keeps the word list for each quiz easily accessible. ^^
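A minimal sketch of those per-user, per-word and per-type breakdowns over the Stats/WordList layout above. Python's `sqlite3` stands in for MySQL; the helper `breakdown` and the sample rows are assumptions, and only whitelisted column names are interpolated into the SQL.

```python
import sqlite3

# sqlite3 stands in for MySQL; layout follows the answer, rows and helper are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Quizzes (quizId INTEGER PRIMARY KEY, quizName TEXT NOT NULL);
CREATE TABLE WordList (
    wordId INTEGER PRIMARY KEY,
    quizId INTEGER NOT NULL REFERENCES Quizzes (quizId),
    word   TEXT NOT NULL
);
CREATE TABLE Stats (               -- no unique key: repeated rows are expected
    quizId INTEGER NOT NULL,
    userId INTEGER NOT NULL,
    type   TEXT NOT NULL,          -- 'missed' or 'skipped' (could be an int code)
    wordId INTEGER NOT NULL REFERENCES WordList (wordId)
);
INSERT INTO Quizzes VALUES (1, 'Quiz 1');
INSERT INTO WordList VALUES (1, 1, 'Cat'), (2, 1, 'Dog');
INSERT INTO Stats VALUES
    (1, 1, 'missed', 1), (1, 1, 'skipped', 2),
    (1, 2, 'missed', 1), (1, 1, 'missed', 1);
""")

def breakdown(*columns):
    """Count Stats rows grouped by any combination of userId, word and type."""
    allowed = {"userId", "word", "type"}
    if not columns or not set(columns) <= allowed:
        raise ValueError(f"group by a non-empty subset of {sorted(allowed)}")
    cols = ", ".join(columns)  # safe: only whitelisted identifiers are interpolated
    return conn.execute(
        f"""SELECT {cols}, COUNT(*) AS n
            FROM Stats JOIN WordList USING (wordId)
            GROUP BY {cols}
            ORDER BY {cols}"""
    ).fetchall()

print(breakdown("word"))            # [('Cat', 3), ('Dog', 1)]
print(breakdown("userId", "type"))  # [(1, 'missed', 2), (1, 'skipped', 1), (2, 'missed', 1)]
```

Because a retake simply adds more Stats rows, repeated attempts show up naturally in the counts without updating anything.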

Hope this helps!

+1
