Use Cases for Natural Language Parsing

I want to use a natural language parsing library for a simple chatbot. I can get part-of-speech (POS) tags, but I always wonder: what do you actually do with the POS tags? If I know the parts of speech, then what?

I guess it would help with generating responses. But what data structures and architecture could I use?

+4
4 answers

A part-of-speech tagger assigns labels to the words in the input text. For example, the popular Penn Treebank tagset contains about 40 labels, such as "plural noun", "comparative adjective", "past tense verb", etc. The tagger also resolves some ambiguity. For example, many English word forms can be either nouns or verbs, but in the context of the surrounding words their part of speech is unambiguous. So, once your text is annotated with POS tags, you can answer questions such as: how many nouns do I have, how many sentences contain a verb, etc.
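To make the counting questions concrete, here is a minimal sketch in Python. It assumes the tagger output is already available as `word/TAG` tokens with Penn Treebank tags; the three sentences are hand-tagged for illustration, and the helper names (`parse`, `is_noun`, `is_verb`) are made up for this example, not taken from any library.

```python
from collections import Counter

# Hand-tagged sample input in word/TAG form (Penn Treebank tags).
# In practice this would come from a POS tagger.
TAGGED = [
    "Mary/NNP likes/VBZ cats/NNS ./.",
    "A/DT small/JJ window/NN ./.",
    "Dogs/NNS bark/VBP loudly/RB ./.",
]

def parse(sentence):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in sentence.split()]

def is_noun(tag):
    return tag.startswith("NN")   # NN, NNS, NNP, NNPS

def is_verb(tag):
    return tag.startswith("VB")   # VB, VBD, VBG, VBN, VBP, VBZ

sentences = [parse(s) for s in TAGGED]

# "How many nouns do I have?"
noun_count = sum(is_noun(tag) for sent in sentences for _, tag in sent)

# "How many sentences contain a verb?"
sentences_with_verb = sum(
    any(is_verb(tag) for _, tag in sent) for sent in sentences
)

# Overall tag frequencies.
tag_counts = Counter(tag for sent in sentences for _, tag in sent)

print(noun_count, sentences_with_verb, tag_counts.most_common(3))
```

A list of `(word, tag)` pairs per sentence is about the simplest data structure that works here; everything else is filtering and counting over it.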

For a chatbot, you obviously need a lot more. You need to figure out the subjects and objects in the text and which verb (predicate) they attach to; you need to resolve anaphors (which individual does "he" or "she" refer to), determine the scope of negation and quantifiers (for example, every, more than 3), etc.

Ideally, you need to map the input text to some kind of logical representation (for example, first-order logic), which lets you bring in reasoning to determine whether two sentences are equivalent in meaning, or whether one entails the other, etc.

While a POS tagger would map the sentence

Mary likes no man who owns a cat. 

to such a structure

 Mary/NNP likes/VBZ no/DT man/NN who/WP owns/VBZ a/DT cat/NN ./. 

you need something like this:

 SubClassOf(
   ObjectIntersectionOf(
     Class(:man)
     ObjectSomeValuesFrom(
       ObjectProperty(:own)
       Class(:cat)
     )
   )
   ObjectComplementOf(
     ObjectSomeValuesFrom(
       ObjectInverseOf(ObjectProperty(:like))
       ObjectOneOf(
         NamedIndividual(:Mary)
       )
     )
   )
 )

Of course, while POS taggers achieve precision and recall close to 100%, more complex automatic processing will perform much worse.

A good Java library for NLP is LingPipe. However, it does not go beyond POS tagging, chunking, and named entity recognition.
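As a rough illustration of why a logical form is useful, the class expression above can be stored as a small tree of objects and checked against a toy world. This is only a sketch under strong assumptions: the Python class names (`Named`, `OneOf`, `Some`, `And`, `Not`) and the world data are invented for this example, and the check is a closed-world test over one fully known world, which is not how an OWL reasoner works.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Named:                 # Class(:man)
    name: str

@dataclass(frozen=True)
class OneOf:                 # ObjectOneOf(NamedIndividual(:Mary))
    individual: str

@dataclass(frozen=True)
class Some:                  # ObjectSomeValuesFrom(property, filler)
    prop: str
    filler: object
    inverse: bool = False    # True models ObjectInverseOf(property)

@dataclass(frozen=True)
class And:                   # ObjectIntersectionOf
    left: object
    right: object

@dataclass(frozen=True)
class Not:                   # ObjectComplementOf
    arg: object

def extension(expr, world):
    """Return the set of individuals in `world` that satisfy `expr`."""
    if isinstance(expr, Named):
        return set(world["classes"].get(expr.name, ()))
    if isinstance(expr, OneOf):
        return {expr.individual}
    if isinstance(expr, And):
        return extension(expr.left, world) & extension(expr.right, world)
    if isinstance(expr, Not):
        return set(world["individuals"]) - extension(expr.arg, world)
    if isinstance(expr, Some):
        pairs = world["relations"].get(expr.prop, ())
        if expr.inverse:
            pairs = {(b, a) for a, b in pairs}
        fillers = extension(expr.filler, world)
        return {a for a, b in pairs if b in fillers}
    raise TypeError(f"unknown expression: {expr!r}")

def subclass_of(sub, sup, world):
    """SubClassOf(sub, sup) holds in this world if every member of sub is in sup."""
    return extension(sub, world) <= extension(sup, world)

# "Mary likes no man who owns a cat."
LHS = And(Named("man"), Some("own", Named("cat")))
RHS = Not(Some("like", OneOf("Mary"), inverse=True))

# Made-up toy world: John owns the cat Tom, Mary likes Bob.
world = {
    "individuals": {"Mary", "John", "Bob", "Tom"},
    "classes": {"man": {"John", "Bob"}, "cat": {"Tom"}},
    "relations": {"own": {("John", "Tom")}, "like": {("Mary", "Bob")}},
}

holds = subclass_of(LHS, RHS, world)
print(holds)
```

In this world the statement holds, because the only cat-owning man is John and Mary does not like him. Adding `("Mary", "John")` to the `like` relation makes the same check fail. Nothing comparable can be computed from the flat sequence of POS tags alone.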

+6

Natural language processing is wide and deep, with roots going back at least to the 1960s. You could start by reading up on computational linguistics in general, natural language generation, generative grammars, Markov chains, chatterbots, and so on.

Wikipedia has a short list of libraries, which I assume you have already seen. Java does not have a long tradition in NLP, though I have not looked at the Stanford libraries.

I doubt you will get very impressive results without diving fairly deeply into linguistics and grammar. That is not everybody's favorite school subject (or so I have heard; I loved it myself!).

+5

I will skip a lot of details and keep it simple. Part-of-speech tags help you build a parse tree from a sentence. Once you have the tree, you try to work out the meaning as unambiguously as possible. The result of this parsing step helps you frame a suitable response for your chatterbot.
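As a very rough picture of the step from tags to a tree, here is a sketch of a shallow chunker in Python. It is not a full parser: it only groups an optional determiner, any adjectives, and one or more nouns into a noun phrase (NP) node and leaves everything else as a leaf. The function names and the tree shape (nested tuples) are assumptions made for this example.

```python
# Tagged input as (word, tag) pairs, using Penn Treebank tags.
TAGGED = [
    ("Mary", "NNP"), ("likes", "VBZ"), ("no", "DT"), ("man", "NN"),
    ("who", "WP"), ("owns", "VBZ"), ("a", "DT"), ("cat", "NN"), (".", "."),
]

def np_end(tagged, start):
    """Return the index just past a noun phrase starting at `start`,
    or `start` itself if no noun phrase begins there."""
    i = start
    if i < len(tagged) and tagged[i][1] == "DT":           # optional determiner
        i += 1
    while i < len(tagged) and tagged[i][1].startswith("JJ"):   # adjectives
        i += 1
    first_noun = i
    while i < len(tagged) and tagged[i][1].startswith("NN"):   # nouns
        i += 1
    return i if i > first_noun else start

def chunk(tagged):
    """Build a shallow tree: ("S", [children]), where each child is either
    a (word, tag) leaf or an ("NP", [leaves]) node."""
    children, i = [], 0
    while i < len(tagged):
        end = np_end(tagged, i)
        if end > i:
            children.append(("NP", tagged[i:end]))
            i = end
        else:
            children.append(tagged[i])
            i += 1
    return ("S", children)

tree = chunk(TAGGED)

# Collect the text of every NP node (NP nodes carry a list as second element).
noun_phrases = [
    " ".join(word for word, _ in node[1])
    for node in tree[1]
    if isinstance(node[1], list)
]
print(noun_phrases)
```

A real parser would also attach the relative clause "who owns a cat" to "man" and identify subject and object; this sketch only shows the kind of tree structure a bot could walk when building a response.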

+3

Once you have the part-of-speech tags, you can extract, for example, all the nouns, so you know roughly what things or objects someone is talking about.

To give an example:

Someone says, "You can open a new window." Once you have the POS tags, you know they are not talking about a can (as in a container, a tin can, etc., which would even make sense in the context of "open"), but about a window. You also know that "open" is a verb.

With this information, your chatbot can generate a much better reply, one that has nothing to do with can openers, etc.

Note: You do not need a parser to get POS tags; a simple POS tagger is enough. A parser will give you even more information (for example, what is the subject and what is the object of the sentence?).
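Here is a small sketch of that idea in Python. The tags are written by hand using Penn Treebank labels (a tagger would normally supply them), and the `reply` function with its template is purely hypothetical.

```python
def parse(sentence):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in sentence.split()]

def words_with(pairs, tag_prefix):
    """Lower-cased words whose tag starts with `tag_prefix`."""
    return [word.lower() for word, tag in pairs if tag.startswith(tag_prefix)]

def reply(pairs):
    """Hypothetical reply template driven by the last noun in the input."""
    nouns = words_with(pairs, "NN")
    if nouns:
        return f"Tell me more about the {nouns[-1]}."
    return "Tell me more."

# "can" is tagged MD (modal verb) here, so it is neither a noun nor a main verb.
modal = parse("You/PRP can/MD open/VB a/DT new/JJ window/NN ./.")

# Same surface word, but tagged NN (noun) in a different context.
container = parse("I/PRP bought/VBD a/DT can/NN ./.")

print(words_with(modal, "NN"))      # nouns in the first sentence
print(words_with(modal, "VB"))      # verbs in the first sentence
print(words_with(container, "NN"))  # nouns in the second sentence
print(reply(modal))
```

The bot never has to guess whether "can" is a container: the tag decides, and the reply is built only from words tagged as nouns.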

+2
