Horizontal Markovization

I need to do horizontal Markovization (an NLP concept), and I'm at a bit of a loss as to what the resulting trees will look like. I've been reading the Klein and Manning paper, but they don't explain what trees with horizontal Markovization of order 2 or order 3 look like. Could someone shed some light on the algorithm and on what the trees are SUPPOSED to look like? I'm relatively new to NLP.

2 answers

So, let's say you have a bunch of flat rules, such as:

    NP
        NNP
        NNP
        NNP
        NNP

or

    VP
        V
        Det
        NP

When you binarize them, you want to keep the context (i.e. that this isn't just a Det, but specifically a Det following a verb as part of a VP). To do that, you would normally use annotations like this:

    NP
        NNP
        NP->NNP
            NNP
            NP->NNP->NNP
                NNP
                NP->NNP->NNP->NNP
                    NNP

or

    VP
        V
        VP->V
            Det
            VP->V->Det
                NP

You need to binarize the tree, but these annotations are not always very meaningful. They may be somewhat meaningful for the verb phrase example, but in the other one all you really care about is that a noun phrase can be a fairly long string of proper nouns (like "Peter B. Lewis Building" or "Hope Memorial Bridge Project Anniversary"). So with horizontal Markovization you collapse some of the annotations slightly, throwing away part of the context. The order of Markovization is the amount of context you are going to retain. With the ordinary annotations you are basically at infinite order: you choose to retain all of the context and collapse nothing.

Order 0 means that you drop all of the context, and you get a tree without the fancy annotations, for example:

    NP
        NNP
        NNP
            NNP
            NNP
                NNP
                NNP
                    NNP

Order 1 means that you retain only one term of context, and you get a tree like this:

    NP
        NNP
        NP->...NNP    (one term: NP->)
            NNP
            NP->...NNP    (one term: NP->)
                NNP
                NP->...NNP    (one term: NP->)
                    NNP

Order 2 means that you retain two terms of context, and you get a tree like this:

    NP
        NNP
        NP->NNP    (two terms: NP->NNP)
            NNP
            NP->NNP->...NNP    (two terms: NP->NNP->)
                NNP
                NP->NNP->...NNP    (two terms: NP->NNP->)
                    NNP
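
To make the role of the order concrete, here is a minimal Python sketch of right-binarizing a flat rule with a limited amount of sibling context. The function names (`intermediate_label`, `binarize`, `render`) and the tuple representation of trees are my own assumptions for illustration. The sketch follows one common convention, in which each intermediate node remembers the most recently generated siblings and anything older is replaced by `...`, so its labels will not match the drawings above character for character.

```python
def intermediate_label(parent, seen, order=None):
    """Label for an intermediate node, given the siblings already generated.

    order=None keeps the full history (infinite order); order=k keeps only
    the k most recent siblings and marks the dropped ones with '...'.
    """
    if order is None:
        kept = list(seen)
    else:
        # Explicit start index avoids the seen[-0:] pitfall when order is 0.
        kept = seen[len(seen) - min(order, len(seen)):]
    elided = "..." if len(kept) < len(seen) else ""
    return parent + "->" + elided + "->".join(kept)


def binarize(parent, children, order=None):
    """Right-binarize parent -> children into nested (label, [kids]) tuples.

    Leaves are plain strings. As in the trees drawn in this answer, the last
    intermediate node has a single child.
    """
    def build(i, label):
        kids = [children[i]]
        if i + 1 < len(children):
            next_label = intermediate_label(parent, children[:i + 1], order)
            kids.append(build(i + 1, next_label))
        return (label, kids)

    return build(0, parent)


def render(tree, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    label, kids = tree
    lines = ["    " * depth + label]
    for kid in kids:
        if isinstance(kid, tuple):
            lines.extend(render(kid, depth + 1))
        else:
            lines.append("    " * (depth + 1) + kid)
    return lines


def intermediate_labels(tree):
    """Collect the labels of all intermediate nodes, top to bottom."""
    _, kids = tree
    out = []
    for kid in kids:
        if isinstance(kid, tuple):
            out.append(kid[0])
            out.extend(intermediate_labels(kid))
    return out


if __name__ == "__main__":
    for h in (None, 0, 1, 2):
        print("order =", h)
        print("\n".join(render(binarize("NP", ["NNP"] * 4, order=h))))
```

The point to notice is that with a small order, long runs of siblings end up sharing the same intermediate label, so the grammar needs far fewer distinct symbols.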

I believe the idea is to take the parent nodes into account for vertical Markovization, and the sibling nodes for horizontal Markovization, when estimating the rule probabilities; the order says how many of them are included. There's a good picture of parent annotation here.

Also, a quote from http://www.timothytliu.com/files/NLPAssignment5.pdf :

To approximate lexicalization, additional information is added to the parent nodes of each tree. This correctly distinguishes between different attachments and whether they should go to the left or to the right. Horizontal Markovization is done by keeping track of sibling nodes as the tree is binarized. Vertical Markovization is done by keeping track of the parents of a node in the tree. These create new dependencies, because the rules are now a combination of both depth and breadth.
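
As a rough illustration of the vertical side, here is a small Python sketch of parent annotation (vertical order 2), where every non-root node label is extended with its parent's label. The `^` separator, the function name `parent_annotate`, and the `(label, [kids])` tuple representation are assumptions made for this example, not something taken from the assignment quoted above.

```python
def parent_annotate(tree, parent=None):
    """Return a copy of the tree with each label extended by its parent's label.

    Trees are (label, [kids]) tuples; leaves (words) are plain strings and are
    left untouched. The root keeps its original label.
    """
    label, kids = tree
    new_label = label if parent is None else label + "^" + parent
    new_kids = [
        # Pass the ORIGINAL label down, so only one level of history is kept.
        parent_annotate(kid, label) if isinstance(kid, tuple) else kid
        for kid in kids
    ]
    return (new_label, new_kids)


if __name__ == "__main__":
    sentence = ("S", [("NP", ["she"]),
                      ("VP", [("V", ["saw"]), ("NP", ["stars"])])])
    print(parent_annotate(sentence))
```

After annotation the subject noun phrase and the object noun phrase carry different labels (one records S as its parent, the other VP), so a grammar read off such trees can give them different expansion probabilities.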


Source: https://habr.com/ru/post/927685/

