Find adjectives related to the noun

I want to try to determine the characteristics of the user's personality based on the words that they entered in the search box. Here is an example:

Search query : "computers"

Identities / descriptors defined : analytical, logical, systematic, methodological


I understand that this task is extremely non-trivial. I have used WordNet before, but I'm not sure whether it contains adjective clouds for each noun node. Private speech is its own beast, so I'm not sure that building my own corpus and looking for adjective term frequencies that co-occur with keywords is the best idea, but I will explain it below.

I am currently working with a Wikipedia dump, computing term frequencies for each article after removing stop words (and, or, from, in, etc.). My idea is to look for co-occurrences of adjectives (using WordNet for POS tags) and nouns throughout the corpus (for example, the adjective "logical" often occurs together with the noun "computer") and, based on the relative noun-adjective frequency, judge whether the adjective is semantically connected to the noun. The potential applications are huge.
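To make the co-occurrence idea concrete, here is a minimal sketch in Python. It assumes a tiny hand-written stop-word list and adjective set (both hypothetical stand-ins for a full stop-word list and a WordNet-derived adjective list), and it treats "co-occurring" as "within a few tokens of each other", which is only one possible definition.

```python
from collections import Counter, defaultdict

# Hypothetical stand-ins: a real run would use a full stop-word list and
# adjectives taken from a lexicon such as WordNet.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "is", "are", "to"}
ADJECTIVES = {"logical", "analytical", "fast", "athletic", "disciplined"}

def cooccurrence_counts(sentences, window=3):
    """Count adjectives appearing within `window` tokens of each non-adjective."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t not in STOP_WORDS]
        for i, token in enumerate(tokens):
            if token in ADJECTIVES:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in ADJECTIVES:
                    counts[token][tokens[j]] += 1
    return counts

def relative_frequencies(counts, noun):
    """Normalise the adjective counts for one noun so they sum to 1."""
    total = sum(counts[noun].values())
    return {adj: n / total for adj, n in counts[noun].items()} if total else {}

# Toy corpus, invented purely for illustration.
corpus = [
    "Computers are logical and fast",
    "Analytical people like logical computers",
    "Runners are athletic and disciplined",
]
counts = cooccurrence_counts(corpus)
print(relative_frequencies(counts, "computers"))
```

On a real corpus the raw relative frequency would probably need correcting for how common each adjective is overall, since frequent adjectives co-occur with almost everything.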


Another idea is to stem the noun, look for adjectives that start with this stem, and then look for synonyms of those adjectives. Example:

Search query : "computers"

Stem : "comput-"

Adjectives with the stem : Computing

Synonyms :


The problem is that nouns do not always have adjective forms, and some stemmed nouns will match terribly wrong adjectives. A *BAD* example:

Search query : "running" (technically a gerund, but still a noun)

Stem : "run-"

Adjectives with the stem : runny

Synonyms : NOT WHAT I WANT. I would like to find words such as "athletic", "motivated", "disciplined"
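The failure mode is easy to reproduce with a small sketch. The suffix stripper below is a crude, hypothetical stand-in for a real stemmer, and the adjective list is invented for illustration; the point is only that prefix matching on a stem pairs "running" with "runny" and never reaches "athletic" or "disciplined".

```python
def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ning", "ing", "ers", "er", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def adjectives_sharing_stem(noun, adjectives):
    """Return the adjectives that start with the noun's stem, sorted."""
    stem = naive_stem(noun.lower())
    return sorted(a for a in adjectives if a.startswith(stem))

# Hypothetical adjective list, for illustration only.
ADJECTIVES = {"computational", "computable", "runny", "athletic", "disciplined"}

print(naive_stem("computers"), adjectives_sharing_stem("computers", ADJECTIVES))
print(naive_stem("running"), adjectives_sharing_stem("running", ADJECTIVES))
```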


Is this something that has been done before? Do you have any suggestions on how I could approach this? It is almost as if I were trying to generate adjective clouds for “important” words in a document.

EDIT: I understand that there is no “right” answer to this problem. I will award the bounty to whoever offers the method with the best theoretical potential.

2 answers

Assuming you have some serious computing resources to throw at this, I would suggest using something simple, such as Hyperspace Analogue to Language (HAL), to build a term-by-term matrix from your Wikipedia dump. Your algorithm might then look something like this:

  • Given the query word / term, find its vector in the HAL space.
  • In that vector, find the adjective components with the highest weights.
    • To do this efficiently, you probably want to use a dictionary (e.g. WordNet) to pre-process the list of terms extracted by HAL, so that you know before handling any queries which of them can be used as adjectives.
  • For each adjective, find the N most similar vectors in your HAL space.
    • Optional: you can narrow this list down by looking for words that appear in your search terms.

This approach mainly trades memory and computational efficiency for simplicity of code and data structures. However, it should do well at what I think you want. The first two steps give you the adjectives most often associated with the query term, while vector similarity in the HAL space (step 3) gives you words that are paradigmatically related (roughly speaking, they can be substituted for each other, so if you start with a certain kind of adjective, you should get more adjectives “like it” in terms of its relation to the query term), which should be a pretty good proxy for the “cloud” you are looking for.
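The steps can be sketched in a few dozen lines of Python. This is a simplified, dictionary-based approximation of HAL rather than a faithful implementation: weights fall off linearly with distance inside a sliding window, and each word's vector keeps separate "before" and "after" components. The function names, the window size, the adjective set, and the toy text are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def build_hal(tokens, window=4):
    """Term x term weights: closer neighbours get larger weights (HAL-style)."""
    vectors = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if i - dist < 0:
                break
            prev = tokens[i - dist]
            weight = window - dist + 1
            vectors[word][("before", prev)] += weight
            vectors[prev][("after", word)] += weight
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_adjectives(vectors, term, adjectives, k=3):
    """Step 2: adjective components of the term's vector with the highest weight."""
    totals = defaultdict(float)
    for (_, other), weight in vectors[term].items():
        if other in adjectives:
            totals[other] += weight
    return sorted(totals, key=lambda a: (-totals[a], a))[:k]

def nearest(vectors, word, candidates, n=3):
    """Step 3: the n candidates whose vectors are most similar to `word`."""
    scored = [(cosine(vectors[word], vectors[c]), c)
              for c in candidates if c != word and c in vectors]
    return [c for _, c in sorted(scored, key=lambda s: (-s[0], s[1]))[:n]]

# Hypothetical adjective list (would come from WordNet) and toy text.
ADJECTIVES = {"logical", "systematic", "athletic"}
text = "logical computers need systematic programmers and athletic runners need shoes"
hal = build_hal(text.split())

seeds = top_adjectives(hal, "computers", ADJECTIVES)
print(seeds)
print({adj: nearest(hal, adj, ADJECTIVES) for adj in seeds})
```

On a corpus the size of Wikipedia the nested dictionaries would likely have to be replaced by a sparse matrix, which is where the memory cost mentioned above comes from.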


WordNet does not have what you need: it contains (almost) no information about relationships between words that are not synonyms or hierarchically related (chair -> furniture), etc.

Just use OpenNLP (http://opennlp.apache.org) and analyze large amounts of text. The OpenNLP parser will detect verb-adjective / noun-adjective relations in sentences, which lets you build a database of relationships. All that remains at that stage is to filter the database against a predefined list of adjectives.
