I want to infer characteristics of a user's personality from the words they enter into a search box. Here is an example:
Search query : "computers"
Traits / descriptors inferred : analytical, logical, systematic, methodical
I understand that this task is extremely non-trivial. I have used WordNet before, but I'm not sure whether it contains adjective clouds for each noun node. Part-of-speech tagging is its own beast, so I'm not sure that building my own corpus and searching for adjective term frequencies that co-occur with keywords is the best idea, but I will explain it below.
I am currently working with a Wikipedia dump, computing term frequencies for each article after removing stop words (and, or, from, in, etc.). My idea was to search for co-occurrences of adjectives (using WordNet for POS tags) and nouns throughout the corpus (for example, the adjective "logical" often occurs alongside the noun "computer") and, based on the adjective's relative frequency, judge whether or not it is semantically connected to the noun. The potential applications are huge.
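A minimal sketch of the co-occurrence idea, under several assumptions: the tiny `CORPUS` stands in for the Wikipedia dump, the hand-written `ADJECTIVES` set stands in for a WordNet POS lookup, co-occurrence is counted per sentence, and "relative frequency" is read as pointwise mutual information (PMI), which is one possible scoring choice and not the only one. All names here are hypothetical.

```python
import math
from collections import Counter

# Assumption: a toy stop list and adjective set replace real resources.
STOP_WORDS = {"a", "an", "and", "are", "from", "in", "is", "of", "or", "the", "to"}
ADJECTIVES = {"logical", "systematic", "athletic", "disciplined"}

# Assumption: six toy sentences stand in for the processed Wikipedia articles.
CORPUS = [
    "the computer is a logical and systematic machine",
    "a logical computer program",
    "the athletic runner is disciplined",
    "running requires a disciplined and athletic mindset",
    "systematic study of the computer",
    "a disciplined engineer tests the computer",
]


def tokenize(sentence):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]


def pmi_scores(sentences, noun):
    """Score each adjective that co-occurs with `noun` by sentence-level PMI.

    PMI = log2( P(adj, noun) / (P(adj) * P(noun)) ), with probabilities
    estimated as the fraction of sentences containing the word(s).
    """
    n = len(sentences)
    adj_count = Counter()   # sentences containing each adjective
    pair_count = Counter()  # sentences containing both adjective and noun
    noun_count = 0          # sentences containing the noun
    for sentence in sentences:
        words = set(tokenize(sentence))
        adjs = words & ADJECTIVES
        adj_count.update(adjs)
        if noun in words:
            noun_count += 1
            pair_count.update(adjs)
    return {
        adj: math.log2((joint * n) / (adj_count[adj] * noun_count))
        for adj, joint in pair_count.items()
    }


scores = pmi_scores(CORPUS, "computer")
ranked = sorted(scores, key=scores.get, reverse=True)
```

A positive score means the adjective appears with the noun more often than its overall frequency would predict; a negative score means less often. Adjectives that never co-occur with the noun are simply absent from the result. On a real corpus, rare adjectives would likely need a minimum-count threshold, since PMI tends to overrate low-frequency pairs.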
Another idea is to stem the noun, look for adjectives that start with that stem, and then look up synonyms for those adjectives. Example:
Search query : "computers"
Stem : "comput-"
Adjectives with the stem : computing
Synonyms :
The problem is that nouns do not always have adjective forms, and some noun stems will match terribly wrong adjectives. A BAD example:
Search query : "running" (technically a gerund, but still a noun)
Stem : "run-"
Adjectives with the stem : runny
Synonyms : NOT WHAT I WANT. I would like to find words such as "athletic", "motivated", "disciplined"
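The stem-based lookup and its failure mode can be sketched as follows. This assumes a deliberately crude suffix stripper (`crude_stem`, not the Porter algorithm or any library stemmer) and a small hand-written adjective list in place of a real lexicon; both are hypothetical and only meant to reproduce the two examples above.

```python
# Assumption: a few suffixes, checked in this order, are enough for the demo.
SUFFIXES = ("ers", "ing", "er", "s")

# Assumption: a hand-written adjective list stands in for a real lexicon.
ADJECTIVES = [
    "athletic", "computational", "computing",
    "disciplined", "motivated", "runny",
]


def crude_stem(word):
    """Strip one known suffix, then collapse a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]  # e.g. "runn" -> "run"
    return word


def adjectives_with_stem(word, adjectives):
    """Return the adjectives that start with the stem of `word`, sorted."""
    stem = crude_stem(word)
    return sorted(a for a in adjectives if a.startswith(stem))


good = adjectives_with_stem("computers", ADJECTIVES)
bad = adjectives_with_stem("running", ADJECTIVES)
```

With this list, `good` holds "computational" and "computing", while `bad` holds only "runny". A prefix match on the stem has no notion of meaning, so "athletic", "motivated", and "disciplined" can never be reached from "run-" this way.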
Is this something that has been done before? Do you have any suggestions on how I could approach this? It is almost as if I were trying to generate adjective clouds for “important” words in a document.
EDIT: I understand that there is no "right" answer to this problem. I will award the bounty to whoever proposes the method with the best theoretical potential.