Replacing the pronoun with its antecedent using python2.7 and nltk

As the title implies, I'm trying to find pronouns in a string and replace them with their antecedent, like:

[in]: "the princess looked from the palace, she was happy".
[out]: "the princess looked from the palace, the princess was happy".

I use pos_tag to get the pronouns and nouns. I need to know how to do the replacement without knowing the sentence in advance, that is, how to identify the subject of the sentence so that I can replace the pronoun with it. Any suggestions?

1 answer

I don't know the nltk package (never used it), but it seems to give you the answer right away. If you look at the example parse tree on nltk.org, it shows that the subject was successfully tagged with an NP-SBJ tag. Isn't that what you are looking for?

(I initially missed the "nltk" part in the title and wrote the part below. I think it may be interesting as a general introduction on how to solve such problems, especially if you do not have the package available, so I will leave it here :)

This is more of a "natural language" (like English) question than a Python question. Could you clarify what kind of sentences you are expecting? Should it work on all possible English sentences? I think that would be very difficult.

If the sentences are sufficiently simple, it may be enough to assume that everything up to the first verb is the subject. This works for your example, but it does not work for the following sentences:

 yesterday the princess looked from the palace, she was happy.
 the princess who drank tea looked from the palace, she was happy.

(note that in the last sentence the subject is “the princess who drank tea”; the part “who drank tea” is an “adjective phrase”).
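A minimal sketch of this first-verb heuristic, to make its limits concrete. It assumes the sentence has already been tagged with Penn Treebank style tags (the kind of output pos_tag gives); the tags below are written by hand so the example runs without NLTK, the comma is left out of the token list for simplicity, and the function name is hypothetical.

```python
def replace_pronouns(tagged):
    """Replace personal pronouns with everything before the first verb.

    `tagged` is a list of (word, tag) pairs using Penn Treebank tags,
    e.g. 'VBD' for a past-tense verb and 'PRP' for a personal pronoun.
    """
    # Naive assumption: the subject is every word before the first verb.
    subject = []
    for word, tag in tagged:
        if tag.startswith("VB"):
            break
        subject.append(word)

    # Copy the words, substituting the assumed subject for each pronoun.
    out = []
    for word, tag in tagged:
        if tag == "PRP" and subject:
            out.extend(subject)
        else:
            out.append(word)
    return " ".join(out)


# Hand-written tags for the example sentence from the question.
simple = [("the", "DT"), ("princess", "NN"), ("looked", "VBD"),
          ("from", "IN"), ("the", "DT"), ("palace", "NN"),
          ("she", "PRP"), ("was", "VBD"), ("happy", "JJ")]
print(replace_pronouns(simple))
# the princess looked from the palace the princess was happy

# The heuristic breaks as soon as something precedes the subject.
with_adverb = [("yesterday", "NN")] + simple
print(replace_pronouns(with_adverb))
# yesterday the princess looked from the palace yesterday the princess was happy
```

The second call shows the failure described above: "yesterday" is swallowed into the assumed subject.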

Also specify what should happen if the pronoun does not refer to the subject (but, for example, to the object):

 the princess looked at the prince, he was happy. 

To solve your problem in the most general case, you must find (or draw up) a formal specification of English (or whatever language you target) that states precisely which part of the sentence is the subject, the verb, the object, etc. As an example, many simple English sentences have the following form (parts between brackets [] are optional, parts between parentheses () are choices, i.e. (a|b) means you must choose either "a" or "b"):

 sentence := subject verb [object] 

Each part on the right-hand side of the specification should then be specified in more detail, for example:

 subject, object := (noun_part_of_sentence|noun_part_of_sentence_plural)
 noun_part_of_sentence := article [adjectivelist] [noun_modifier] noun # I guess there is a formal name for this...
 noun_part_of_sentence_plural := [adjectivelist] [noun_modifier] noun_plural # note: no article
 adjectivelist := adjective [adjectivelist] # i.e., one or more adjectives

For more complex sentences, such as the one above with an adjective phrase, this specification is not enough, and it should be something like:

 noun_part_of_sentence := (the|a) [adjectivelist] [noun_modifier] [noun] [adjective_phrase]
 adjective_phrase := relative_pronoun verb [object]
 relative_pronoun := (who|which|that)

Please note that the above specification is already quite powerful: if you can correctly determine the type of each word (verb, noun, article, etc.), it can successfully parse the following sentences:

 The princess drank the tea.
 The beautiful princess drank the tea.
 The beautiful princess drank the delicious tea.
 A beautiful princess drank the delicious lemon tea.
 The beautiful princess who saw the handsome prince drank the refreshing tea.
 The beautiful princess who saw the handsome prince who made the tea drank the refreshing tea.
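The specification can be turned into a small recursive recognizer. What follows is only a sketch under stated assumptions: the word categories come from a tiny hypothetical lexicon instead of a real tagger, the noun is treated as required, plural noun parts are left out, and all function names are made up for illustration.

```python
# Hypothetical toy lexicon; in practice the categories would come from a tagger.
LEXICON = {
    "the": "article", "a": "article",
    "beautiful": "adjective", "handsome": "adjective",
    "delicious": "adjective", "refreshing": "adjective",
    "lemon": "noun_modifier",
    "princess": "noun", "prince": "noun", "tea": "noun",
    "drank": "verb", "saw": "verb", "made": "verb",
    "who": "relative_pronoun", "which": "relative_pronoun",
    "that": "relative_pronoun",
}


def category(words, i):
    """Category of words[i], or None past the end or for unknown words."""
    return LEXICON.get(words[i]) if i < len(words) else None


def parse_noun_part(words, i):
    """Return the index just after a noun part starting at i, or None."""
    if category(words, i) != "article":
        return None
    i += 1
    while category(words, i) == "adjective":      # [adjectivelist]
        i += 1
    if category(words, i) == "noun_modifier":     # [noun_modifier]
        i += 1
    if category(words, i) != "noun":
        return None
    i += 1
    # [adjective_phrase] := relative_pronoun verb [object]
    if (category(words, i) == "relative_pronoun"
            and category(words, i + 1) == "verb"):
        i += 2
        after_object = parse_noun_part(words, i)
        if after_object is not None:
            i = after_object
    return i


def parse_sentence(text):
    """Return (subject, verb, object) or None if the sentence does not fit."""
    words = text.lower().rstrip(".").split()
    subject_end = parse_noun_part(words, 0)
    if subject_end is None or category(words, subject_end) != "verb":
        return None
    object_start = subject_end + 1
    object_end = parse_noun_part(words, object_start)
    if object_end is None:
        object_end = object_start                 # no object
    if object_end != len(words):
        return None                               # trailing words left over
    return (" ".join(words[:subject_end]), words[subject_end],
            " ".join(words[object_start:object_end]))


print(parse_sentence("The princess drank the tea."))
print(parse_sentence("The beautiful princess who saw the handsome prince "
                     "drank the refreshing tea."))
print(parse_sentence("The princess drank tea."))  # None: no article before "tea"
```

The first element of the returned tuple is the subject, which is the text you would substitute for a later pronoun.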

However, it does not (yet) allow for sentences such as “the princess was looking at the palace”, “the princess was drinking tea” (note: no “the” before “tea”) and endless others. The trick is to extend the formal specification to a level that is appropriate for the kind of sentences you expect.

Once you have successfully parsed your sentence, you know what the subject is and where any pronouns are, and you can make the replacement. Please note, however, that English is not unambiguous, for example:

 The princess looked at her mother, she was happy. 

Does "she" refer to the princess or to her mother?
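One way to make the ambiguity concrete is to list every noun phrase that could be the antecedent. The sketch below assumes the noun phrases have already been extracted by a parser and uses a tiny hypothetical gender lexicon; filtering by gender is an assumption added here for illustration, and it settles the prince example but not the mother example.

```python
# Hypothetical gender lexicon, only large enough for the examples above.
NOUN_GENDER = {"princess": "f", "mother": "f", "prince": "m"}
PRONOUN_GENDER = {"she": "f", "he": "m"}


def candidate_antecedents(noun_phrases, pronoun):
    """Return the noun phrases whose head noun agrees with the pronoun.

    Assumes the head noun is the last word of each phrase.
    """
    wanted = PRONOUN_GENDER[pronoun]
    return [phrase for phrase in noun_phrases
            if NOUN_GENDER.get(phrase.split()[-1]) == wanted]


# "the princess looked at the prince, he was happy"
print(candidate_antecedents(["the princess", "the prince"], "he"))
# ['the prince']

# "The princess looked at her mother, she was happy."
print(candidate_antecedents(["the princess", "her mother"], "she"))
# ['the princess', 'her mother']
```

When more than one candidate is left, as in the second call, the sentence alone does not tell you which one to pick.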

Good luck

PS English is not my native language, so I hope that I have used the right terms for everything!
