Parsey McParseface misidentifies root on questions

It seems to me that Parsey has serious problems producing the correct tags for any sentence containing "is".


Text: Is Barack Obama from Hawaii?

GCloud tokens (correct):

  • Is - [root] VERB
  • Barack - [nn] NOUN
  • Obama - [nsubj] NOUN
  • from - [prep] ADP
  • Hawaii - [pobj] NOUN

Parsey tokens (wrong):

  • Is - [cop] VERB
  • Barack - [nsubj] NOUN
  • Obama - [root] NOUN
  • from - [prep] ADP
  • Hawaii - [pobj] NOUN

Parsey decides to make a noun (!), Obama, the root, which throws off everything else.


Text: My name is Philip.

GCloud Icons (correct):

  • My [poss] PRON
  • name [nsubj] NOUN
  • is [root] VERB
  • Philip [attr] NOUN

Parsey tokens (wrong):

  • My [poss] PRON
  • name [nsubj] NOUN
  • is [cop] VERB
  • Philip [root] NOUN

Again, Parsey chooses the NOUN as root and tags "is" as cop.


Any ideas why this is happening and how I can fix it?

Thanks Phil

3 answers

As for the first example, it seems that Parsey's training data is quite old and does not even contain the word "Barack". If you replace Barack Obama with Bill Clinton, you get the right analysis.

Input: Is Bill Clinton from Hawaii ?
Parse:
Is VBZ ROOT
 +-- Clinton NNP nsubj
 |   +-- Bill NNP nn
 +-- from IN prep
 |   +-- Hawaii NNP pobj
 +-- ? . punct

The second example, on the other hand, is parsed correctly according to Stanford dependencies (see the section on copula verbs in http://nlp.stanford.edu/software/dependencies_manual.pdf ).

Input: My name is Philip
Parse:
Philip NNP ROOT
 +-- name NN nsubj
 |   +-- My PRP$ poss
 +-- is VBZ cop
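To make the difference between the two conventions concrete, here is a minimal sketch in plain Python. The tuple format and the promote_copula helper are my own invention for illustration, not the output format of either parser; it rewrites the Stanford-style copula analysis ("Philip" as root, "is" as cop) into the verb-rooted analysis the question expected ("is" as root, "Philip" as attr):

```python
# Each token is (index, word, head_index, label); head_index 0 means ROOT.
# This toy converter promotes a "cop" dependent to root and reattaches
# the old root and its subject, mirroring the two analyses above.

def promote_copula(tokens):
    cop = next((t for t in tokens if t[3] == "cop"), None)
    if cop is None:
        return tokens  # no copula: nothing to do
    old_root = next(t for t in tokens if t[2] == 0)
    out = []
    for idx, word, head, label in tokens:
        if idx == cop[0]:
            out.append((idx, word, 0, "root"))       # "is" becomes root
        elif idx == old_root[0]:
            out.append((idx, word, cop[0], "attr"))  # old root becomes attr
        elif head == old_root[0] and label == "nsubj":
            out.append((idx, word, cop[0], label))   # subject moves to the verb
        else:
            out.append((idx, word, head, label))
    return out

# Stanford-style parse of "My name is Philip":
parse = [(1, "My", 2, "poss"), (2, "name", 4, "nsubj"),
         (3, "is", 4, "cop"), (4, "Philip", 0, "root")]
```

Running `promote_copula(parse)` yields "is" as root with "Philip" as its attr and "name" as its nsubj, i.e. the GCloud-style analysis from the question.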


I'm answering my own question here, and I should say up front that I have limited knowledge of Parsey McParseface. However, since no one else has answered, I hope I can add some value.

I think the main problem with most machine learning models is the lack of interpretability. This bears on your first question, "Why is this happening?": it is very difficult to say, because this tool is based on a black-box model, namely a neural network. I will say that it seems extremely surprising, given the strong accuracy claims made for Parsey, that an ordinary word like "is" fools it consistently. Perhaps you made a mistake? It's hard to say without seeing your code.

Assuming you did not make a mistake, I think you could solve this problem (or at least mitigate it) using your observation that the word "is" seems to trip up the model. You can simply check the sentence for the word "is" and use GCloud (or another parser) in that case. Conveniently, once you have both parsers wired up, you can also use GCloud as a fallback for any other cases where Parsey seems to fail, if you find them in the future.
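The routing idea above can be sketched in a few lines, assuming you wrap each parser in a plain callable. The function names, the "gcloud"/"parsey" backend names, and the dispatch scheme are my own placeholders, not real library APIs:

```python
# Hedged sketch: route sentences containing the token "is" away from
# Parsey, per the observation that "is" trips up its root attachment.

def choose_backend(sentence: str) -> str:
    """Return "gcloud" for sentences containing "is", else "parsey"."""
    tokens = sentence.lower().rstrip(".?!").split()
    return "gcloud" if "is" in tokens else "parsey"

def parse(sentence: str, parsers: dict):
    """parsers maps a backend name to a callable(sentence) -> parse result."""
    return parsers[choose_backend(sentence)](sentence)
```

For example, `parse("My name is Philip.", {"gcloud": gcloud_fn, "parsey": parsey_fn})` would invoke `gcloud_fn`, where the two functions are whatever wrappers you write around the respective parsers.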

As for improving the underlying model, if that's an option for you, you could retrain it following the original paper and perhaps tune the training setup.


Since it correctly tagged "Barack Obama" as two nouns, I do not think the problem is that it doesn't know the name. I think Parsey has a bias against using "is" as the root.

In theoretical dependency grammar, a noun is never the root of a complete sentence. However, Parsey does not follow the theory; it has a strong preference for making content words the heads. I think it decided that when you say "X is Y", the head of the sentence should be "Y" and not "is", because "is" is not an informative word.

... Except for the Bill Clinton example, which may prove me wrong! I haven't gotten Parsey running on my own machine yet, so I'm not sure.

