Coreference Resolution Using Stanford CoreNLP

Question

I am new to the Stanford CoreNLP toolkit and am trying to use it in a project to resolve coreferences in news texts. To use the Stanford CoreNLP coreference system, we would usually create a pipeline that requires tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition, and parsing. For instance:

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "As competition heats up in Spain's crowded bank market, Banco Exterior de Espana is seeking to shed its image of a state-owned bank and move into new activities.";

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

Then we can easily get the sentence annotations with:

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

However, I use other preprocessing tools and just need a standalone coreference resolution system. It is quite easy to create token and parse tree annotations and set them on the annotation:

    // create new annotation
    Annotation annotation = new Annotation();

    // create token annotations for each sentence from the input file
    List<CoreLabel> tokens = new ArrayList<>();
    for (int tokenCount = 0; tokenCount < parsedSentence.size(); tokenCount++) {
        ArrayList<String> parsedLine = parsedSentence.get(tokenCount);
        String word = parsedLine.get(1);
        String lemma = parsedLine.get(2);
        String posTag = parsedLine.get(3);
        String namedEntity = parsedLine.get(4);
        String partOfParseTree = parsedLine.get(6);

        CoreLabel token = new CoreLabel();
        token.setWord(word);
        token.setLemma(lemma); // setLemma, not a second setWord, which would overwrite the word
        token.setTag(posTag);
        token.setNER(namedEntity);
        tokens.add(token);
    }

    // set tokens annotations to annotation
    annotation.set(TokensAnnotation.class, tokens);

    // set parse tree annotations to annotation
    Tree stanfordParseTree = Tree.valueOf(inputParseTree);
    annotation.set(TreeAnnotation.class, stanfordParseTree);

However, creating the sentence annotations is quite difficult because, as far as I know, there is no documentation that explains it in detail. I can create the data structure for the sentence annotations and set it on the annotation:

    List<CoreMap> sentences = new ArrayList<CoreMap>();
    annotation.set(SentencesAnnotation.class, sentences);

I am sure it cannot be that difficult, but there is no documentation on how to create a sentence annotation from token annotations, i.e. how to populate the ArrayList with actual sentence annotations.

Any ideas?

Btw, if I use the token and parse annotations provided by my own processing tools, take only the sentence annotations from the StanfordCoreNLP pipeline, and then apply the standalone StanfordCoreNLP coreference resolution system, I get the right results. Thus, the only part missing for a complete standalone coreference resolution system is the ability to create sentence annotations from token annotations.

java nlp stanford-nlp

tradt Jun 20 '15 at 13:43

1 answer

Sebastian Schuster · Accepted Answer · 2015-06-20T18:11:15+0000

There is an Annotation constructor that takes a List<CoreMap> sentences argument; it sets up the document if you already have a list of tokenized sentences.

For each sentence, you want to create a CoreMap object as follows. (Note that I also added a sentence index and a token index to each sentence and token object, respectively.)

    int sentenceIdx = 1;
    List<CoreMap> sentences = new ArrayList<CoreMap>();

    // parsedSentences is assumed to hold one list of token rows per sentence
    for (List<ArrayList<String>> parsedSentence : parsedSentences) {
        CoreMap sentence = new CoreLabel();
        List<CoreLabel> tokens = new ArrayList<>();

        for (int tokenCount = 0; tokenCount < parsedSentence.size(); tokenCount++) {
            ArrayList<String> parsedLine = parsedSentence.get(tokenCount);
            String word = parsedLine.get(1);
            String lemma = parsedLine.get(2);
            String posTag = parsedLine.get(3);
            String namedEntity = parsedLine.get(4);
            String partOfParseTree = parsedLine.get(6);

            CoreLabel token = new CoreLabel();
            token.setWord(word);
            token.setLemma(lemma);
            token.setTag(posTag);
            token.setNER(namedEntity);
            token.setIndex(tokenCount + 1); // token index within the sentence, starting at 1
            tokens.add(token);
        }

        // set tokens annotations and id of sentence
        sentence.set(TokensAnnotation.class, tokens);
        sentence.set(SentenceIndexAnnotation.class, sentenceIdx++);

        // set parse tree annotations to the sentence
        Tree stanfordParseTree = Tree.valueOf(inputParseTree);
        sentence.set(TreeAnnotation.class, stanfordParseTree);

        // add sentence to list of sentences
        sentences.add(sentence);
    }

Then you can create an instance of Annotation using the sentences list:

    Annotation annotation = new Annotation(sentences);
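
The snippets here read from a `parsedSentences` structure whose construction is not shown. Below is a minimal sketch of one way it might be built, using only the Java standard library and no CoreNLP calls. It assumes a CoNLL-style input with one token per line, tab-separated columns, and a blank line between sentences. The class name `ConllReader`, the method `readSentences`, and the column layout (index 1 = word, 2 = lemma, 3 = POS tag, 4 = NER tag, 6 = parse fragment) are hypothetical, chosen only to match the indices used in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConllReader {

    /**
     * Groups tab-separated token lines into sentences.
     * A blank line ends the current sentence; repeated blank lines are ignored.
     * (Assumed input format, not something CoreNLP prescribes.)
     */
    public static List<List<ArrayList<String>>> readSentences(List<String> lines) {
        List<List<ArrayList<String>>> parsedSentences = new ArrayList<>();
        List<ArrayList<String>> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    parsedSentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            // limit -1 keeps trailing empty columns, so column indices stay stable
            current.add(new ArrayList<>(Arrays.asList(line.split("\t", -1))));
        }
        // flush the last sentence if the input does not end with a blank line
        if (!current.isEmpty()) {
            parsedSentences.add(current);
        }
        return parsedSentences;
    }

    public static void main(String[] args) {
        // made-up sample rows: id, word, lemma, POS, NER, unused, parse fragment
        List<String> lines = Arrays.asList(
            "1\tBanks\tbank\tNNS\tO\t_\t(S(NP*)",
            "2\tcompete\tcompete\tVBP\tO\t_\t(VP*))",
            "",
            "1\tThey\tthey\tPRP\tO\t_\t(S(NP*)",
            "2\tgrow\tgrow\tVBP\tO\t_\t(VP*))");
        for (List<ArrayList<String>> sentence : readSentences(lines)) {
            for (ArrayList<String> row : sentence) {
                System.out.println(row.get(1) + "/" + row.get(3));
            }
            System.out.println();
        }
    }
}
```

Each inner list returned by `readSentences` can then play the role of `parsedSentence` in the loop above, with `parsedLine.get(1)`, `parsedLine.get(2)`, and so on picking out the columns.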
