I am new to the Stanford CoreNLP toolkit and am trying to use it for a project to resolve coreferences in news texts. To use the Stanford CoreNLP coreference system, one would usually create a pipeline, which requires tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition, and parsing. For instance:
    import java.util.Properties;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "As competition heats up in Spain's crowded bank market, Banco Exterior de Espana is seeking to shed its image of a state-owned bank and move into new activities.";

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);
Then we can easily get the sentence annotations with:
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
However, I use other preprocessing tools and just need a standalone coreference resolution system. It is quite easy to create the token and parse tree annotations and set them on the annotation:
    // create new annotation
    Annotation annotation = new Annotation();

    // create token annotations for each sentence from the input file
    List<CoreLabel> tokens = new ArrayList<>();
    for (int tokenCount = 0; tokenCount < parsedSentence.size(); tokenCount++) {
        // one row of the preprocessed input per token
        ArrayList<String> parsedLine = parsedSentence.get(tokenCount);
        String word = parsedLine.get(1);
        String lemma = parsedLine.get(2);
        String posTag = parsedLine.get(3);
        String namedEntity = parsedLine.get(4);
        String partOfParseTree = parsedLine.get(6);

        CoreLabel token = new CoreLabel();
        token.setWord(word);
        token.setLemma(lemma);
        token.setTag(posTag);
        token.setNER(namedEntity);
        tokens.add(token);
    }

    // set tokens annotations to annotation
    annotation.set(TokensAnnotation.class, tokens);

    // set parse tree annotations to annotation
    Tree stanfordParseTree = Tree.valueOf(inputParseTree);
    annotation.set(TreeAnnotation.class, stanfordParseTree);
However, creating the sentence annotations is quite difficult because, as far as I know, there is no documentation that explains it in detail. I can create the data structure for the sentence annotations and set it on the annotation:
    List<CoreMap> sentences = new ArrayList<CoreMap>();
    annotation.set(SentencesAnnotation.class, sentences);
I am sure it cannot be that difficult, but there is no documentation on how to create sentence annotations from token annotations, i.e. how to populate the ArrayList with actual sentence annotations.
Any ideas?
Btw, if I use the token and parse tree annotations provided by my own processing tools, take only the sentence annotations from the StanfordCoreNLP pipeline, and then apply the standalone StanfordCoreNLP coreference resolution system, I get the correct results. Thus, the only part missing for a completely standalone coreference resolution system is the ability to create sentence annotations from token annotations.
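To make the missing piece concrete, here is a minimal, library-free sketch of the bookkeeping I assume each sentence annotation would need: its index in the document and the begin/end positions of its tokens in the document-level token list. The class `SentenceSpans`, the type `SentenceSpan`, and the method `buildSpans` are hypothetical names used only for illustration and are not part of CoreNLP. My guess is that these values would correspond to `SentenceIndexAnnotation`, `TokenBeginAnnotation`, and `TokenEndAnnotation` set on one `CoreMap` per sentence, but I have not confirmed that this is what the coreference system actually reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceSpans {

    /** Hypothetical stand-in for one sentence-level annotation (not a CoreNLP type). */
    public static final class SentenceSpan {
        public final int sentenceIndex;   // position of the sentence in the document
        public final int tokenBegin;      // index of the first token, inclusive
        public final int tokenEnd;        // index one past the last token, exclusive
        public final List<String> tokens; // the tokens belonging to this sentence

        SentenceSpan(int sentenceIndex, int tokenBegin, int tokenEnd, List<String> tokens) {
            this.sentenceIndex = sentenceIndex;
            this.tokenBegin = tokenBegin;
            this.tokenEnd = tokenEnd;
            this.tokens = tokens;
        }
    }

    /** Computes document-level token offsets for already-split sentences. */
    public static List<SentenceSpan> buildSpans(List<List<String>> sentences) {
        List<SentenceSpan> spans = new ArrayList<>();
        int offset = 0; // running count of tokens seen in earlier sentences
        for (int i = 0; i < sentences.size(); i++) {
            List<String> tokens = sentences.get(i);
            spans.add(new SentenceSpan(i, offset, offset + tokens.size(), tokens));
            offset += tokens.size();
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assumed input: sentences already tokenized by an external tool.
        List<List<String>> doc = Arrays.asList(
                Arrays.asList("Banco", "Exterior", "is", "a", "bank", "."),
                Arrays.asList("It", "is", "expanding", "."));
        for (SentenceSpan s : buildSpans(doc)) {
            System.out.println(s.sentenceIndex + ": [" + s.tokenBegin + ", " + s.tokenEnd + ") " + s.tokens);
        }
    }
}
```

If this assumption holds, the remaining work would be to copy these values, together with the per-sentence token list and parse tree, into each sentence `CoreMap` before adding it to the `sentences` list.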