This is a machine learning problem. You are trying to learn a model from labeled data. To do that you can run a simple algorithm along the lines of the Perceptron or SampleRank (pdf):
First you define features over the words in a tagline. Features can be shared between words; for example, the features of the word "peace" could be:
- "the word peace",
- "noun",
- "abstract noun",
- "short noun",
- "starts with 'p'",
- "ends with an 's' sound",
- ...
The first feature is unique to the word "peace": it fires only on that word, while the other features can also fire on other words.
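For illustration, here is a minimal sketch of such a feature function in Python. The concrete feature names and the crude character-level checks are my own assumptions; a real system might use a POS tagger or a pronunciation dictionary for the "noun" and "'s' sound" features.

```python
# A minimal sketch of a feature function (illustrative assumptions only).
def features(word):
    word = word.lower()
    feats = ["word=" + word]               # unique feature: fires only on this word
    if len(word) <= 5:
        feats.append("short word")
    feats.append("starts with " + word[0])
    feats.append("ends with " + word[-1])
    return feats

print(features("peace"))
# ['word=peace', 'short word', 'starts with p', 'ends with e']
```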
Each feature has a weight (higher is better), so you have a feature vector and a weight vector. That lets you assign a score (rating) to any slogan: just sum the weights of all features that fire on the words of the slogan. All weights are initialized to 0.0.
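As a sketch of the scoring step, reusing the features() function from above (storing the weights in a defaultdict is just one convenient way to have everything start at 0.0):

```python
from collections import defaultdict

weights = defaultdict(float)   # every feature weight starts at 0.0

def score(slogan):
    # Score of a slogan = sum of the weights of all features
    # that fire on its words.
    return sum(weights[f] for word in slogan.split() for f in features(word))
```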
Now you start training:
You iterate over all the pairs of slogans. For each pair you know the true ranking (from the votes you have), and you compute the ranking according to the features and their current weights. If the true ranking and the ranking according to your current weights (i.e. according to your current model) agree, you simply move on to the next pair. If your model ranked the pair wrongly, you correct the feature weights: you add 1.0 to the weights of the features that fire on the better slogan (the one that wins the people's votes) and subtract 1.0 from the weights of the features that fire on the worse slogan (its score was evidently too high, so you lower it). These weight updates affect the scores your model assigns to the following pairs, and so on. A sketch of this update loop follows the note on the learning rate below.
You run this loop several times, until your model gets most of the pairs right (or until some other convergence criterion is met).
In practice you do not really add or subtract 1.0 but eta times 1.0, where eta is the learning rate, which you can tune experimentally. It is usually higher at the start of training and gradually decreases as training progresses, as your weights move in the right direction. (See also stochastic gradient descent.) To get started you can simply fix it at 0.1.
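Putting the pieces together, here is a sketch of the training loop described above. It assumes `pairs` is a list of (better_slogan, worse_slogan) tuples derived from the votes, and it reuses the score(), features() and weights from the earlier sketches; the constant eta and the fixed number of epochs are just simple defaults.

```python
import random

def train(pairs, epochs=10, eta=0.1):
    # pairs: list of (better_slogan, worse_slogan) tuples taken from the votes.
    for _ in range(epochs):
        random.shuffle(pairs)
        for better, worse in pairs:
            # If the current model already ranks this pair correctly, move on.
            if score(better) > score(worse):
                continue
            # Otherwise correct the weights: reward the features that fire on
            # the better slogan, penalize those that fire on the worse one.
            for word in better.split():
                for f in features(word):
                    weights[f] += eta
            for word in worse.split():
                for f in features(word):
                    weights[f] -= eta
```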
This procedure takes care of stop words ("the", "of", ...) automatically, because they should occur about equally often in good and bad slogans (and if they really do not, you will learn that too).
After training, you can compute a score for each word from the learned feature weights.
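For example, again reusing features() and the trained weights from the sketches above:

```python
def word_score(word):
    # A word's score is the sum of the learned weights of its features.
    return sum(weights[f] for f in features(word))

print(word_score("peace"))
```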