How do I combine multiple NLP features for machine learning?

I am trying to train a KNN classifier using several NLP features. For example, I want to use bag-of-words and local POS tags.

Separately, I have some idea of how to compute similarity with a single feature: for example, cosine similarity between word-count vectors (for bag-of-words), or perhaps Hamming distance for POS tag sequences.
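The two per-feature similarities could be sketched like this in plain Python; the example documents and POS tag sequences are made up for illustration:

```python
from collections import Counter
import math

def cosine_similarity(bow_a, bow_b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(bow_a[w] * bow_b[w] for w in bow_a)
    norm_a = math.sqrt(sum(c * c for c in bow_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bow_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def hamming_distance(tags_a, tags_b):
    """Hamming distance between two equal-length POS tag sequences."""
    if len(tags_a) != len(tags_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(tags_a, tags_b))

# Toy documents (assumed, not from the question)
doc_a = Counter("the cat sat on the mat".split())
doc_b = Counter("the dog sat on the log".split())
print(cosine_similarity(doc_a, doc_b))                      # 0.75
print(hamming_distance(["DT", "NN", "VBD"],
                       ["DT", "NN", "VBZ"]))                # 1
```

Note that cosine is a similarity (higher is closer) while Hamming is a distance (lower is closer), which is one reason the answer below combines them via rankings rather than raw values.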

However, I do not know how to combine these two. How do people in this area usually do this? Can someone help me with this?

Thanks in advance.

1 answer

I would use a simple linear combination of both features. That is, compare the bag-of-words vectors with cosine similarity and the POS tags with Hamming distance separately, rank the candidate labels under each measure, and then average the two ranks. So, if the cosine and Hamming comparisons produce the following rankings:

 rank   cosine   Hamming
 -----------------------
 1      red      blue
 2      blue     yellow
 3      yellow   orange
 4      orange   red

Then the final score for each label, using the rank itself as the score (you could also reweight the ranks, e.g. on an exponential scale, if you want higher-ranked labels to count for more), would be as follows (lower is better):

 label    total score
 --------------------
 blue     3
 red      5
 yellow   5
 orange   7

So the output label would be blue. In this case the linear combination gives 50% weight to the cosine ranking and 50% to the Hamming ranking. You can run tests with different weights (e.g. 70% cosine, 30% Hamming) to find the optimal balance between the two measures.
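The weighted combination above can be sketched as follows, using the ranks from the example tables; the label names and the 50/50 weights are just the illustration from the answer:

```python
# Per-feature rankings from the example (1 = best match)
cosine_rank  = {"red": 1, "blue": 2, "yellow": 3, "orange": 4}
hamming_rank = {"blue": 1, "yellow": 2, "orange": 3, "red": 4}

def combined_scores(rank_a, rank_b, weight_a=0.5, weight_b=0.5):
    """Weighted linear combination of two rankings; lower total is better."""
    return {label: weight_a * rank_a[label] + weight_b * rank_b[label]
            for label in rank_a}

scores = combined_scores(cosine_rank, hamming_rank)
best = min(scores, key=scores.get)
print(scores)   # blue 1.5, red 2.5, yellow 2.5, orange 3.5
print(best)     # blue
```

With equal weights the totals are just half of the sums in the table above, so the ordering (and the winning label, blue) is the same; changing `weight_a`/`weight_b` lets you explore other balances such as 70/30.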

