I have a movie database that I need to fill with data to make the application easier to test and develop. There are tables for storing movie ratings and user accounts; users rate movies.
I started writing a script to populate the database with fake, generic data, but I don’t know how to randomize the ratings. For each movie, I pick a random number of users: 100, 500, 1000, whatever. For each of these users, I pick a random rating from 1 to 10. But these ratings always lead to the same average of about 5, which means the distribution of ratings (from 1 to 10) is basically the same for every movie. This is not “realistic”: every movie ends up with the same average, so having different users and different numbers of users makes no real difference.
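To make the problem concrete, here is a minimal sketch of the approach described above. It assumes Python and uniformly random ratings; the function name `uniform_ratings` and the fixed seed are placeholders for illustration, not the actual script.

```python
import random

def uniform_ratings(num_users, rng):
    """Hypothetical helper: one uniformly random rating (1 to 10) per user."""
    return [rng.randint(1, 10) for _ in range(num_users)]

rng = random.Random(0)  # seeded only so that a run is repeatable
for num_users in (100, 500, 1000):
    ratings = uniform_ratings(num_users, rng)
    average = sum(ratings) / len(ratings)
    print(num_users, round(average, 2))
```

With uniform picks, every movie's average drifts toward 5.5 no matter how many users rate it, which is the sameness described above.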
I would like movie A to have an average of 7, movie B an average of 5, movie C an average of 8, etc. But I don’t just want the average to differ from movie to movie. I mean, it would be nice to generate rating distributions like these (for a given number of users):
http://www.imdb.com/title/tt1046173/ratings or http://www.imdb.com/title/tt0486640/ratings
You know, something random that could produce two different results like the ones above. I hit refresh and get the first graph, hit refresh again and get the second, hit it again and get something different or similar: something “random” and “realistic”.
I am also going to display graphs like these in my application, so it would be nice to have different distributions. But I don’t know how to generate all of this randomly with a simple script.
How can I solve this? Or is it too much work to be worth it?
Perhaps something simpler would work for me: for example, pick a point (from 1 to 10) and then generate a normal distribution of ratings whose peak is at that point.
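The simpler idea above could be sketched roughly as follows, again assuming Python. The names `rating_weights` and `generate_ratings`, the `spread` parameter, and its range of 1.0 to 3.0 are all assumptions made for illustration. Each rating from 1 to 10 gets a bell-shaped weight centred on the chosen peak, and ratings are then drawn according to those weights.

```python
import math
import random

def rating_weights(peak, spread):
    """Bell-shaped (Gaussian) weights for ratings 1..10, largest at `peak`."""
    return [math.exp(-((r - peak) ** 2) / (2 * spread ** 2))
            for r in range(1, 11)]

def generate_ratings(num_users, peak, spread, rng):
    """Draw one rating per user, weighted towards `peak`."""
    weights = rating_weights(peak, spread)
    return rng.choices(range(1, 11), weights=weights, k=num_users)

rng = random.Random()
for movie in ("A", "B", "C"):
    peak = rng.uniform(1, 10)       # where this movie's histogram peaks
    spread = rng.uniform(1.0, 3.0)  # assumed range; larger means flatter
    ratings = generate_ratings(rng.choice([100, 500, 1000]), peak, spread, rng)
    print(movie, round(peak, 1), round(sum(ratings) / len(ratings), 2))
```

Because the weights are cut off at 1 and 10, a peak near either end gives a lopsided histogram, so the resulting average is pulled slightly towards the middle rather than landing exactly on the peak.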