Stanford Dependency Parser - how to get spans?

I'm doing dependency parsing with the Stanford library in Java. Is there a way to get back the character indices of each dependency's tokens in my original source string? I tried calling the getSpan() method, but it returns null for every token:

    LexicalizedParser lp = LexicalizedParser.loadModel(
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
            "-maxLength", "80", "-retainTmpSubcategories");
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    Tree parse = lp.apply(text);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> tdl = gs.typedDependenciesCollapsedTree();
    for (TypedDependency td : tdl) {
        td.gov().getSpan(); // returns null!
        td.dep().getSpan(); // returns null!
    }

Any idea?

2 answers

I ended up writing my own helper function to recover the spans from the original string:

    // TokenSpan is my own small holder class for a (start, end) pair; it is not shown here.
    public HashMap<Integer, TokenSpan> getTokenSpans(String text, Tree parse) {
        List<String> tokens = new ArrayList<String>();
        traverse(tokens, parse, parse.getChildrenAsList());
        return extractTokenSpans(text, tokens);
    }

    // Collects the leaf values (the tokens) in left-to-right order.
    private void traverse(List<String> tokens, Tree parse, List<Tree> children) {
        if (children == null)
            return;
        for (Tree child : children) {
            if (child.isLeaf()) {
                tokens.add(child.value());
            }
            traverse(tokens, parse, child.getChildrenAsList());
        }
    }

    // Walks the text and the token list in parallel; keys are 1-based token indices.
    private HashMap<Integer, TokenSpan> extractTokenSpans(String text, List<String> tokens) {
        HashMap<Integer, TokenSpan> result = new HashMap<Integer, TokenSpan>();
        int spanStart, spanEnd;
        int actCharIndex = 0;
        int actTokenIndex = 0;
        char actChar;
        while (actCharIndex < text.length()) {
            actChar = text.charAt(actCharIndex);
            if (actChar == ' ') {
                // Only plain spaces are skipped; other whitespace is not handled.
                actCharIndex++;
            } else {
                spanStart = actCharIndex;
                String actToken = tokens.get(actTokenIndex);
                int tokenCharIndex = 0;
                while (tokenCharIndex < actToken.length()
                        && text.charAt(actCharIndex) == actToken.charAt(tokenCharIndex)) {
                    tokenCharIndex++;
                    actCharIndex++;
                }
                if (tokenCharIndex != actToken.length()) {
                    // The token does not match the text here (e.g. the parser rewrote it).
                    throw new IllegalStateException(
                            "Token '" + actToken + "' does not match text at offset " + spanStart);
                }
                actTokenIndex++;
                spanEnd = actCharIndex;
                result.put(actTokenIndex, new TokenSpan(spanStart, spanEnd));
            }
        }
        return result;
    }

Then I call:

  getTokenSpans(originalString, parse) 

This gives me a map from each token index to the corresponding character span in the original string. It is not an elegant solution, but at least it works.
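For anyone who wants to try the alignment idea without the parser on the classpath, here is a minimal, library-free sketch. It is not the code from this answer: the class name `TokenSpanAligner`, the `align` method and the `int[] {start, end}` representation are all hypothetical, and it searches with `String.indexOf` instead of comparing character by character, so any kind of whitespace between tokens is skipped. It assumes every token appears verbatim in the text, which will not hold if the tokenizer rewrites tokens (for example, brackets).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Stanford library.
final class TokenSpanAligner {

    /**
     * Maps 1-based token indices to {start, end} character offsets
     * (end exclusive) in the original text.
     * Assumes each token occurs verbatim in the text, in order.
     */
    static Map<Integer, int[]> align(String text, List<String> tokens) {
        Map<Integer, int[]> spans = new LinkedHashMap<>();
        int cursor = 0; // never search before the end of the previous token
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            int start = text.indexOf(token, cursor);
            if (start < 0) {
                throw new IllegalArgumentException(
                        "Token '" + token + "' not found at or after offset " + cursor);
            }
            int end = start + token.length();
            spans.put(i + 1, new int[] {start, end});
            cursor = end;
        }
        return spans;
    }
}
```

Calling `TokenSpanAligner.align(originalString, tokens)` with the leaf values collected from the parse tree yields the same kind of map as above; the 1-based keys are chosen so that they can be looked up with a dependency's token index, assuming that index is also 1-based.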


Even though you have already answered your own question, and this is an old thread: I stumbled upon the same problem today, but with the (Stanford) LexicalizedParser rather than the dependency parser. I have not tested it for the dependency parser, but the following solved my problem in the lexParser scenario:

    List<Word> wl = tree.yieldWords();
    int begin = wl.get(0).beginPosition();
    int end = wl.get(wl.size() - 1).endPosition();
    Span sp = new Span(begin, end);

The Span then holds the character indices of the (sub)tree. (And if you go all the way down to the terminals, I guess the same should work at the token level.)
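The "first token's begin, last token's end" idea can be sketched on its own, independent of the parser classes. The class `SubtreeSpan` and its `cover` method are hypothetical names, and each token is represented as a plain `int[] {begin, end}`. The check for negative offsets is an assumption on my part: it guards against tokens whose positions were never set, which I believe are reported as negative values.

```java
import java.util.List;

// Hypothetical helper, not part of the Stanford library.
final class SubtreeSpan {

    /**
     * Returns {begin, end} covering all tokens of a subtree, given the
     * tokens' {begin, end} character offsets in left-to-right order.
     */
    static int[] cover(List<int[]> tokenOffsets) {
        if (tokenOffsets.isEmpty()) {
            throw new IllegalArgumentException("subtree has no tokens");
        }
        int begin = tokenOffsets.get(0)[0];
        int end = tokenOffsets.get(tokenOffsets.size() - 1)[1];
        // Assumption: a negative offset means the position was never set.
        if (begin < 0 || end < 0) {
            throw new IllegalStateException("token offsets are not set");
        }
        return new int[] {begin, end};
    }
}
```

With the offsets of a two-token subtree such as `{4, 7}` and `{9, 12}`, `cover` returns `{4, 12}`.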

Hope this helps someone else running into the same problem!
