Can someone give a simple explanation of the elements of natural language processing?

I am new to natural language processing and I am confused about the terms used.

What is tokenization? POS tagging? Entity recognition?

Does tokenization only divide the text into parts that may carry meaning, or does it also assign meaning to those parts? And what is it called when I determine that something is a noun, a verb, or an adjective? And what if I want to pick out dates, names, and currency?

I need a simple explanation of the areas / terms used in NLP.

+3
3 answers

To add to dmn's explanation:

In general, there are two distinctions you should pay attention to in NLP:

  1. Statistical vs. rule-based analysis

  2. Light vs. heavy analysis

Statistical analysis draws on counts collected from large amounts of text, for example n-gram counts. Google Translate is a well-known system of this kind.

A widely used tool is the Stanford Parser. For example, given the sentence:

My cat's name is Pat.

POS tagging gives:

My/PRP$ cat/NN 's/POS name/NN is/VBZ Pat/NNP ./.

Using the POS tags, a parse tree can be built:

(ROOT
  (S
    (NP
      (NP (PRP$ My) (NN cat) (POS 's))
      (NN name))
    (VP (VBZ is)
      (NP (NNP Pat)))
    (. .)))
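
To make the bracket notation concrete, here is a minimal sketch that reads such a tree into nested tuples and lists its words. The function names `parse_tree` and `leaves` are made up for this illustration; this is not how the Stanford Parser itself works, only a way to read its printed output.

```python
import re

def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]           # e.g. ROOT, NP, NN
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())        # a nested subtree
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()

def leaves(node):
    """Collect the words of a tree from left to right."""
    _, children = node
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words

TREE = """(ROOT (S (NP (NP (PRP$ My) (NN cat) (POS 's)) (NN name))
  (VP (VBZ is) (NP (NNP Pat))) (. .)))"""
tree = parse_tree(TREE)
print(leaves(tree))
```

Reading the leaves back gives the tokenized sentence, which shows that the tree only adds structure on top of the tokens.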

And from that, the dependencies:

poss(cat-2, My-1)
poss(name-4, cat-2)
possessive(cat-2, 's-3)
nsubj(Pat-6, name-4)
cop(Pat-6, is-5)
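
Each dependency line has the form relation(head-index, dependent-index). As a rough sketch, assuming exactly that printed format, the lines can be read into triples and grouped by head word. The regular expression and the names `DEP_RE` and `parse_deps` are assumptions made for this example.

```python
import re
from collections import defaultdict

DEPS = """poss(cat-2, My-1)
poss(name-4, cat-2)
possessive(cat-2, 's-3)
nsubj(Pat-6, name-4)
cop(Pat-6, is-5)"""

# relation(head-position, dependent-position)
DEP_RE = re.compile(r"(\w+)\((.+)-(\d+), (.+)-(\d+)\)")

def parse_deps(text):
    """Return (relation, head, dependent) triples, dropping the positions."""
    triples = []
    for line in text.splitlines():
        rel, head, _, dep, _ = DEP_RE.fullmatch(line.strip()).groups()
        triples.append((rel, head, dep))
    return triples

# Group the dependents under the word they attach to.
children = defaultdict(list)
for rel, head, dep in parse_deps(DEPS):
    children[head].append((rel, dep))
print(dict(children))
```

Grouped this way, "Pat" turns out to be the head of the sentence, with "name" as its subject and "is" as the copula.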

N-grams are sequences of n consecutive tokens. Google has published large collections of n-gram counts.
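
A small sketch of what extracting n-grams from a token list looks like. The helper name `ngrams` is chosen for this example, and the token list is the second example sentence with the final period left out.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["He", "likes", "to", "sit", "on", "the", "mat"]
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

No grammar is involved here: an n-gram is just a sliding window over the tokens.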

For word meaning, there are lexical resources such as WordNet and FrameNet.

+7

For example, take the text:

My cat's name is Pat.  He likes to sit on the mat.

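Tokenization breaks this text into its parts, usually words. Here the tokens are: my, cat's, name, is, pat, he, likes, to, sit, on, the, mat. (Some tokenizers split cat's into two tokens, lol.)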
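
Tokenizing the example text can be sketched with a single regular expression. The pattern below is an assumption made for this illustration: it splits off a possessive 's and treats each punctuation mark as its own token. Real tokenizers handle many more cases.

```python
import re

# Hypothetical pattern: a possessive 's, a run of word characters,
# or a single punctuation mark.
TOKEN_RE = re.compile(r"'s|\w+|[^\w\s]")

def tokenize(text):
    """Split text into word, clitic, and punctuation tokens."""
    return TOKEN_RE.findall(text)

tokens = tokenize("My cat's name is Pat. He likes to sit on the mat.")
print(tokens)
```

The tokenizer attaches no meaning to the pieces; it only decides where one token ends and the next begins.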

POS stands for Part-Of-Speech. A POS tagger labels each token with its part of speech. For the text above, the output looks like this:

My_PRP$ cat_NN 's_POS name_NN is_VBZ Pat_NNP ._.
He_PRP likes_VBZ to_TO sit_VB on_IN the_DT mat_NN ._.

(Here cat's has been split into two tokens.)
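
To show the shape of the tagging step, here is a toy tagger that looks each token up in a small hand-written lexicon. The lexicon is copied from the tagged output above, and the fallback rule (capitalized unknown word means proper noun) is an assumption made for this sketch. Real taggers learn from annotated text and use the surrounding words to decide.

```python
# Toy lexicon taken from the tagged example; not a real tagging model.
LEXICON = {
    "my": "PRP$", "cat": "NN", "'s": "POS", "name": "NN", "is": "VBZ",
    "he": "PRP", "likes": "VBZ", "to": "TO", "sit": "VB", "on": "IN",
    "the": "DT", "mat": "NN", ".": ".",
}

def tag(tokens):
    """Pair each token with a tag from the lexicon, guessing if unknown."""
    tagged = []
    for tok in tokens:
        # Assumed fallback: unknown capitalized words are proper nouns.
        guess = "NNP" if tok[:1].isupper() else "NN"
        tagged.append((tok, LEXICON.get(tok.lower(), guess)))
    return tagged

sentence = ["My", "cat", "'s", "name", "is", "Pat", "."]
line = " ".join(f"{word}_{pos}" for word, pos in tag(sentence))
print(line)
```

A lookup table like this cannot resolve words that have more than one possible tag, which is exactly the part a real tagger is for.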

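Named entity recognition means finding the names, places, organizations, and similar items that a text mentions. In the example above it would find: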

<NAME>Pat</NAME>

Another example:

Pat is a part-time consultant for IBM in Yorktown Heights, New York.

<NAME>Pat</NAME>
<ORGANIZATION>IBM</ORGANIZATION>
<LOCATION>Yorktown Heights, New York</LOCATION>
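
The simplest way to produce output like this is a lookup list of known names (a gazetteer). The sketch below assumes such a hand-made list; the names `GAZETTEER` and `find_entities` are invented for the example, and real NER systems also use context to recognize entities they have never seen.

```python
# Hand-made list of known entities; an assumption for this sketch.
GAZETTEER = {
    "Pat": "NAME",
    "IBM": "ORGANIZATION",
    "Yorktown Heights, New York": "LOCATION",
}

def find_entities(text):
    """Return (phrase, label) pairs in the order they appear in text."""
    found = []
    for phrase, label in GAZETTEER.items():
        # Plain substring search: it would also match "Pat" in "Patrick".
        start = text.find(phrase)
        if start != -1:
            found.append((start, phrase, label))
    return [(phrase, label) for _, phrase, label in sorted(found)]

text = "Pat is a part-time consultant for IBM in Yorktown Heights, New York."
entities = find_entities(text)
for phrase, label in entities:
    print(f"<{label}>{phrase}</{label}>")
```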

+8

To answer the more specific part of your question: tokenization breaks the text into parts (usually words) without worrying too much about their meaning. POS tagging disambiguates between the possible parts of speech (noun, verb, etc.) for each token. It happens after tokenization. Recognizing dates, names, and so on is called Named Entity Recognition (NER).
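
For strictly formatted items such as dates and currency amounts, a rule-based recognizer can be sketched with regular expressions. The two patterns below (ISO-style dates and dollar amounts) and the sample sentence are assumptions invented for this illustration; real text needs many more patterns.

```python
import re

# Hypothetical patterns: YYYY-MM-DD dates and dollar amounts.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("MONEY", re.compile(r"\$\d+(?:\.\d{2})?")),
]

def tag_spans(text):
    """Return (matched text, label) pairs in order of appearance."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.group(), label))
    return [(matched, label) for _, matched, label in sorted(spans)]

# Invented sample sentence, not taken from the answers above.
spans = tag_spans("Pat was paid $1500.50 on 2011-03-04.")
print(spans)
```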

+3