In short, take the average cosine of the initial clustering or even all of the initial offers and accept or reject the clusters based on something like the following.
- , (1,5 (86- , ) 3 (99,9- ) outlier), . , , .
, .
average(cosine_similarities)+alpha*standard_deviation(cosine_similarities)
, , NLTK. , . , 1- . LSA LDA. 1,5 , 1 + Wu Palmer ( ), K, , .
, , . , 10000 - . , , - 15 000, 20 - 20 000 . , API - 20 . senti-wordnet.
, .
, , , . t / wu-palmer SOV , . Commons Math3 java/ scala , scipy python, R -.
Xbar +/- tsub(alpha/2)*sample_std/sqrt(sample_size)
. . , , . , , , , , , .