Programmatically determine the relative "popularity" of a list of items (books, songs, films, etc.)

Given a list of (say) songs, what is the best way to determine their relative "popularity"?

My first thought is to use Google Trends. This song list:

  • Subterranean Homesick Blues
  • Empire State of Mind
  • California Gurls

produces the following report on Google Trends (to find out which one is popular now, I limited the report to the last 30 days):

http://s3.amazonaws.com/instagal/original/image001.png?1275516612

Empire State of Mind is slightly more popular than California Gurls, and Subterranean Homesick Blues is much less popular than either.

So this works pretty well, but what happens when your list consists of 100 or 1000 songs? Google Trends only allows comparing 5 terms at a time, so short of some huge round-robin comparison scheme, what's the right approach?

Another option is to simply do a Google search for each song and see which one returns the most results, but I'm not sure that really measures the same thing.

+7
algorithm statistics
4 answers

Great question - a Britney Spears song can be phenomenally popular for 2 months and then (thankfully) forgotten, while an Elvis song may stay popular for 30 years. How do you compare the two? Intuitively we feel that sustained popularity matters more than a flash in the pan, but how do you capture that in a number?

First, I would normalize by release date: Subterranean Homesick Blues may be unpopular now (though not in my house), but normalizing against 1965 could give a different result.

Since most songs rise in popularity, level out, and then decline, let's pick the region where each one levels out. It is reasonable to assume that over that period the two series are stationary, uncorrelated, and normally distributed, so you can simply apply a two-sample t-test to determine whether the means are different.
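A minimal sketch of that last step, assuming you already have the two interest-over-time series and have picked out the plateau region of each by hand (the function name and the numbers below are made up for illustration):

```python
import numpy as np
from scipy import stats

def plateau_means_differ(series_a, series_b, alpha=0.05):
    """Welch's two-sample t-test on the (assumed stationary) plateau
    regions of two popularity time series."""
    t_stat, p_value = stats.ttest_ind(series_a, series_b, equal_var=False)
    return p_value < alpha, t_stat, p_value

# Made-up weekly interest values taken from the flat part of each curve.
empire_plateau = np.array([62, 60, 61, 64, 63, 59, 61, 62])
california_plateau = np.array([55, 57, 54, 56, 58, 55, 53, 56])

differ, t, p = plateau_means_differ(empire_plateau, california_plateau)
print(f"means differ: {differ} (t={t:.2f}, p={p:.4f})")
```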

There are probably less restrictive tests for quantifying the size of the difference between two time series, but I haven't come across them yet.

Anyone?

+3

You could search for the item on Twitter and see how many times it gets mentioned, or look it up on Amazon to see how many people have reviewed it and what ratings they gave. Both Twitter and Amazon have APIs.
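As a sketch of that approach, suppose you have some way of obtaining a mention count per item; count_mentions below is a hypothetical stub (not a real Twitter or Amazon call), and the ranking then reduces to a sort by score:

```python
def count_mentions(term: str) -> int:
    """Hypothetical stub: return how many times `term` is mentioned on the
    service of your choice (Twitter search, Amazon reviews, ...).
    Replace this with a real API call."""
    raise NotImplementedError

def rank_by_mentions(songs):
    """Sort songs from most-mentioned to least-mentioned."""
    scores = {song: count_mentions(song) for song in songs}
    return sorted(scores, key=scores.get, reverse=True)

# Usage (once count_mentions is implemented):
# rank_by_mentions(["Subterranean Homesick Blues",
#                   "Empire State of Mind",
#                   "California Gurls"])
```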

+2

There is an unofficial Google Trends API; see http://zoastertech.com/projects/googletrends/index.php?page=Getting+Started. I have not used it, but it might help.
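I haven't tried the wrapper linked above either; as an illustration of the same idea, here is a sketch using the third-party pytrends package instead (a different client from the one linked, so treat the exact calls as an assumption). It respects the 5-terms-per-request limit:

```python
# Sketch using the third-party pytrends package (pip install pytrends),
# not the wrapper linked above.
from pytrends.request import TrendReq

songs = ["Subterranean Homesick Blues", "Empire State of Mind", "California Gurls"]

pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(kw_list=songs, timeframe="today 1-m")  # max 5 terms per request
interest = pytrends.interest_over_time()  # pandas DataFrame, one column per term

# Average interest over the last month as a crude popularity score.
print(interest[songs].mean().sort_values(ascending=False))
```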

+2

Sure, the Google API is "limited" to 5 terms per call, but consider how much information each call gives you.

In general, the comparison functions used by sorting algorithms are very "binary":

  • input: 2 elements
  • output: true / false

Here you have:

  • input: 5 elements
  • output: relative weights of each element

Therefore, you only need a linear number of API calls (whereas sorting usually requires O(N log N) calls to the comparison function).

You will need ceil((N-1)/4) calls. These can be parallelized, though read the usage terms carefully, especially the limit on how many requests you are allowed to send.

Then, once they are all "rated", you can do a simple sort locally.

Intuitively, to rate them all on a common scale, you would:

  • Shuffle your list
  • Pop the first 5 elements
  • API call
  • Insert them into the result (keep it sorted by weight)
  • Pick the median
  • Pop the next 4 elements (or fewer if fewer remain)
  • Call the API with the median and those 4
  • Go back to the Insert step until you run out of items.

If your list is 1000 songs long, that's 250 API calls, which is no big deal.
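A minimal sketch of that loop, assuming a hypothetical relative_weights(terms) helper that wraps a single up-to-5-term API call and returns one score per term; the rescaling against the median's existing score is my reading of how the batches get tied onto one common scale:

```python
import random

def relative_weights(terms):
    """Hypothetical helper: one API call comparing up to 5 terms, returning
    {term: score} with scores comparable within this single call."""
    raise NotImplementedError

def rank_all(songs):
    """Rate every song on a roughly common scale using ceil((N - 1) / 4) API calls."""
    songs = list(songs)
    random.shuffle(songs)

    # First call: 5 fresh songs.
    rated = dict(relative_weights(songs[:5]))
    songs = songs[5:]

    while songs:
        # Re-use the median of everything rated so far as an anchor term,
        # so each new batch can be tied back to the common scale.
        ranked = sorted(rated, key=rated.get)
        anchor = ranked[len(ranked) // 2]

        batch, songs = songs[:4], songs[4:]
        scores = relative_weights([anchor] + batch)

        # Rescale the batch so the anchor keeps its existing rating.
        factor = rated[anchor] / scores[anchor] if scores[anchor] else 1.0
        for song in batch:
            rated[song] = scores[song] * factor

    # Simple local sort once everything has a weight.
    return sorted(rated, key=rated.get, reverse=True)

# For 1000 songs: 1 + ceil(995 / 4) = 250 calls, i.e. ceil((N - 1) / 4).
```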

+1
