I trained a CRF using GenericAcrfTui, which writes an ACRF to a file. I'm not quite sure how to load and use the trained CRF, but

    import java.nio.file.Paths;

    import cc.mallet.grmm.learning.ACRF;
    import cc.mallet.util.FileUtils;

    ACRF c = (ACRF) FileUtils.readObject(Paths.get("acrf.ser.gz").toFile());

seems to work. However, the labeling seems to be incorrect and seems to depend on the labels that I pass as input. How do I label new input using the loaded ACRF?
Here is how I do my labeling:

    GenericAcrfData2TokenSequence instanceMaker = new GenericAcrfData2TokenSequence();
    instanceMaker.setDataAlphabet(c.getInputAlphabet());
    instanceMaker.setIncludeTokenText(true);
    instanceMaker.setFeaturesIncludeToken(true);
    instanceMaker.setLabelsAtEnd(false);

    Pipe pipe = new SerialPipes(new Pipe[] {
        instanceMaker,
        new TokenSequence2FeatureVectorSequence(c.getInputAlphabet(), true, false),
    });

    InstanceList testing = new InstanceList(pipe);
    Iterator<Instance> testSource = new LineGroupIterator(
        // initialize the labels to O
        new StringReader("OO ---- what W=the@1 W=hell@2 \n"
                       + "OO ---- the W=what@-1 W=hell@1 \n"
                       + "OO ---- hell W=what@-2 W=the@-1 "),
        Pattern.compile("^\\s*$"), true);
    testing.addThruPipe(testSource);

    System.out.println(c.getBestLabels(testing.get(0)));

I got this by looking at GenericAcrfTui. Some things I tried:
- When I gave different initial labels (other than "O"), the resulting labeling changed, but this does not help: I cannot guess which labels to give initially, otherwise I would not need a tagger.
- I tried not giving any initial labels, but that just threw an exception; it seems that Mallet really wants these labels.
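
For reference, the placeholder-labelled input used in the labeling code can be generated for an arbitrary token sequence along the following lines. This is only a sketch: `PlaceholderInput`, `build`, and the `window` parameter are hypothetical names, not part of MALLET, and the line layout (`<label> ---- <token> W=<neighbour>@<offset>`) is simply copied from the test string in the question. It does not address why the predicted labels depend on the placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a MALLET class) that builds placeholder-labelled input lines. */
final class PlaceholderInput {

    /**
     * Returns one line per token in the form
     * "<placeholderLabel> ---- <token> W=<neighbour>@<offset> ...",
     * listing neighbours within +/- window positions in ascending offset order.
     */
    static String build(List<String> tokens, String placeholderLabel, int window) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder line = new StringBuilder();
            line.append(placeholderLabel).append(" ---- ").append(tokens.get(i));
            for (int offset = -window; offset <= window; offset++) {
                int j = i + offset;
                // Skip the token itself and positions outside the sentence.
                if (offset == 0 || j < 0 || j >= tokens.size()) {
                    continue;
                }
                line.append(" W=").append(tokens.get(j)).append('@').append(offset);
            }
            lines.add(line.toString());
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // Same sentence and placeholder label as in the question's test string.
        System.out.println(build(Arrays.asList("what", "the", "hell"), "OO", 2));
    }
}
```

The resulting string could be wrapped in a `StringReader` and passed to `LineGroupIterator` exactly as in the labeling code above.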
I noticed that there is also a SimpleTagger that can be used to train a CRF, but I think I would still have the same problem when labeling new input.
Any help with labeling using a CRF from SimpleTagger or GenericAcrfTui would be appreciated.
By the way, I usually use CRF++, but for this task I want to build my own graph, because I am using dependency parse features.