Stanford NLP tokenizer

How can I tokenize a string in a Java class using the Stanford parser?

I can only find examples of DocumentPreprocessor and PTBTokenizer that read their text from an external file:

    // option #1: By sentence
    DocumentPreprocessor dp = new DocumentPreprocessor("hello.txt");
    for (List<HasWord> sentence : dp) {
        System.out.println(sentence);
    }

    // option #2: By token
    PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader("hello.txt"),
            new CoreLabelTokenFactory(), "");
    while (ptbt.hasNext()) {
        CoreLabel label = ptbt.next();
        System.out.println(label);
    }

Thanks.

1 answer

The PTBTokenizer constructor accepts any java.io.Reader, not just a FileReader, so you can wrap your string in a java.io.StringReader and tokenize it directly.
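A minimal sketch of that idea, assuming Stanford CoreNLP (or the standalone Stanford tokenizer/parser jar) is on the classpath. The class name StringTokenizeDemo and the helper methods tokenize and sentences are hypothetical names for illustration. The token version mirrors option #2 from the question with a StringReader in place of the FileReader; the sentence version relies on DocumentPreprocessor also having a constructor that takes a Reader.

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: tokenizes in-memory strings instead of files.
public class StringTokenizeDemo {

    // Token by token: wrap the string in a StringReader and hand it to PTBTokenizer.
    static List<String> tokenize(String text) {
        PTBTokenizer<CoreLabel> tokenizer = new PTBTokenizer<>(
                new StringReader(text), new CoreLabelTokenFactory(), "");
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasNext()) {
            CoreLabel label = tokenizer.next();
            tokens.add(label.word()); // keep only the token text
        }
        return tokens;
    }

    // Sentence by sentence: DocumentPreprocessor accepts a Reader as well.
    static List<List<String>> sentences(String text) {
        DocumentPreprocessor dp = new DocumentPreprocessor(new StringReader(text));
        List<List<String>> result = new ArrayList<>();
        for (List<HasWord> sentence : dp) {
            List<String> words = new ArrayList<>();
            for (HasWord w : sentence) {
                words.add(w.word());
            }
            result.add(words);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world."));
        System.out.println(sentences("This is one. This is two."));
    }
}
```

Nothing else changes compared with the file-based examples: the tokenizer only sees a Reader, so the text can come from a file, a string, or any other character stream.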


Source: https://habr.com/ru/post/927503/

