How to load and use a CRF trained with Mallet?

I trained a CRF using GenericAcrfTui, which writes an ACRF to a file. I'm not quite sure how to load and use the trained CRF, but

    import java.nio.file.Paths;

    import cc.mallet.grmm.learning.ACRF;
    import cc.mallet.util.FileUtils;

    // Deserialize the ACRF that GenericAcrfTui wrote to disk.
    ACRF c = (ACRF) FileUtils.readObject(Paths.get("acrf.ser.gz").toFile());

seems to work. However, the labeling seems to be incorrect and seems to depend on the labels that I pass as input. How do I label with a loaded ACRF?

Here is how I do my labeling:

    GenericAcrfData2TokenSequence instanceMaker = new GenericAcrfData2TokenSequence();
    instanceMaker.setDataAlphabet(c.getInputAlphabet());
    instanceMaker.setIncludeTokenText(true);
    instanceMaker.setFeaturesIncludeToken(true);
    instanceMaker.setLabelsAtEnd(false);
    Pipe pipe = new SerialPipes(new Pipe[] {
        instanceMaker,
        new TokenSequence2FeatureVectorSequence(c.getInputAlphabet(), true, false),
    });
    InstanceList testing = new InstanceList(pipe);
    Iterator<Instance> testSource = new LineGroupIterator(
        // initialize the labels to O
        new StringReader("OO ---- what W=the@1 W=hell@2 \n"
            + "OO ---- the W=what@-1 W=hell@1 \n"
            + "OO ---- hell W=what@-2 W=the@-1 "),
        Pattern.compile("^\\s*$"), true);
    testing.addThruPipe(testSource);
    System.out.println(c.getBestLabels(testing.get(0)));

I got this by looking at GenericAcrfTui. Some things I tried:

  • When I gave different initial labels (other than "O"), the resulting labeling changed. That does not help, because I cannot guess which labels to give initially; if I could, I would not need a tagger.
  • I tried not giving any initial labels at all, but that just threw an exception. It seems that Mallet really wants those labels.

I noticed that there is also a SimpleTagger that can be used to train a CRF, but I think I would still have the same problem when labeling new input.

Any help with labeling using a CRF from SimpleTagger or GenericAcrfTui would be appreciated.

By the way, I usually use CRF++, but for this task I want to build my own graph, because I use dependency-parse features.

1 answer

I figured it out!

The problem was that the pipe did not know about the target alphabet. The solution is to use the CRF's own Pipe, for example:

 Pipe pipe = crf.getInputPipe(); 

instead of the crazy business of building my own Pipe.

Now, if anyone knows a nicer way to make a new Instance from a query, that would be good too; I just copied what the trainer does.
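
For the query side, one option is to generate the input text programmatically rather than writing it by hand. The sketch below is plain Java with no Mallet calls; `AcrfQueryFormatter` and its `format` method are hypothetical names, and the `W=word@offset` feature naming and the `----` separator are simply copied from the question, so they are assumptions about how the model was trained. The label column is only a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Mallet): builds the text block that the
// question feeds to LineGroupIterator, one line per token.
final class AcrfQueryFormatter {

    // Emits "<label> ---- <token> W=<neighbor>@<offset> ..." for each token.
    // Assumption: the model was trained with context features named
    // W=word@offset over a symmetric window, as in the question's example.
    static String format(List<String> tokens, String placeholderLabel, int window) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder line = new StringBuilder();
            line.append(placeholderLabel).append(" ---- ").append(tokens.get(i));
            for (int offset = -window; offset <= window; offset++) {
                int j = i + offset;
                // Skip the token itself and positions outside the sentence.
                if (offset == 0 || j < 0 || j >= tokens.size()) {
                    continue;
                }
                line.append(" W=").append(tokens.get(j)).append('@').append(offset);
            }
            lines.add(line.toString());
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(format(Arrays.asList("what", "the", "hell"), "O", 2));
    }
}
```

The resulting string could then be wrapped in a StringReader and passed to LineGroupIterator as in the question, with the InstanceList built on the CRF's own input pipe.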
