Adding Weka instances after classification, but before evaluation?

Question

Assume that X is a raw, labeled (i.e. with training labels) dataset, and that Process(X) returns a set of instances Y that have been encoded with attributes and converted to a Weka-friendly file such as Y.arff.

Also suppose Process() has some "leakage": some instances, Leak = X - Y, cannot be encoded consistently and need to be given the default classification FOO. The training labels are known for the Leak set as well.

My question is: how can I best introduce the instances from Leak into the Weka evaluation stream AFTER some classifier has been applied to the subset Y, adding the Leak instances with their default classification before evaluating over the entire set X? In code:

    DataSource LeakSrc = new DataSource("leak.arff");
    Instances Leak = LeakSrc.getDataSet();
    DataSource Ysrc = new DataSource("Y.arff");
    Instances Y = Ysrc.getDataSet();
    classfr.buildClassifier(Y);
    // YunionLeak = ??
    eval.crossValidateModel(classfr, YunionLeak);

Perhaps this is a special case of combining the results from several classifiers?

+7

weka

rikb Oct 28 '15 at 9:35

2 answers

Depending on your classifier, this can be very simple! Weka has an interface called UpdateableClassifier; any classifier that implements it can be updated after it has been built. The following classes implement this interface:

HoeffdingTree
IBk
KStar
LWL
MultiClassClassifierUpdateable
NaiveBayesMultinomialText
NaiveBayesMultinomialUpdateable
NaiveBayesUpdateable
SGD
SGDText

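The classifier can then be updated roughly as follows:

    import java.io.File;
    import weka.classifiers.bayes.NaiveBayesUpdateable;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ArffLoader;

    // Read only the header first, so instances can be streamed one at a time
    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("/data/data.arff"));
    Instances structure = loader.getStructure();
    structure.setClassIndex(structure.numAttributes() - 1);

    // Initialise the classifier on the empty structure, then update it per instance
    NaiveBayesUpdateable nb = new NaiveBayesUpdateable();
    nb.buildClassifier(structure);
    Instance current;
    while ((current = loader.getNextInstance(structure)) != null) {
        nb.updateClassifier(current);
    }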
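Applied to the question, the same idea might look like the sketch below: a classifier that has already been built on Y is fed the Leak instances afterwards, but only if it implements UpdateableClassifier. The class name IncrementalUpdate and the method updateIfPossible are hypothetical names used for illustration, and the sketch assumes that the extra instances share the ARFF header of the training data and that the class index is already set.

```java
import weka.classifiers.Classifier;
import weka.classifiers.UpdateableClassifier;
import weka.core.Instances;

public class IncrementalUpdate {

    /**
     * Feeds every instance of `extra` to an already built classifier.
     * Returns false, leaving the model untouched, when the classifier
     * does not implement UpdateableClassifier.
     * Assumption: `extra` has the same header as the training data.
     */
    public static boolean updateIfPossible(Classifier model, Instances extra)
            throws Exception {
        if (!(model instanceof UpdateableClassifier)) {
            return false;
        }
        UpdateableClassifier updateable = (UpdateableClassifier) model;
        for (int i = 0; i < extra.numInstances(); i++) {
            updateable.updateClassifier(extra.instance(i));
        }
        return true;
    }
}
```

Note that this trains on the Leak instances; it does not by itself assign them the default classification FOO during evaluation.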

-1

Sjb Oct 29 '15 at 9:19

rikb · Accepted Answer · 2015-11-07T00:05:07+0000

The bounty is closing, but Mark Hall, on another forum ( http://list.waikato.ac.nz/pipermail/wekalist/2015-November/065348.html ), deserves the credit for what should be considered the correct answer:

You need to implement the cross-validation and the classifier building in your own code. You can still use an Evaluation object to compute statistics for your modified test folds, because the statistics that it computes are all additive. Instances.trainCV() and Instances.testCV() can be used to create the folds:

http://weka.sourceforge.net/doc.stable/weka/core/Instances.html#trainCV(int,%20int,%20java.util.Random)

You can then call buildClassifier() to process each training fold, modify the test fold to your heart's content, and then iterate over the instances in the test fold using either Evaluation.evaluateModelOnce() or Evaluation.evaluateModelOnceAndRecordPrediction(). The latter is useful if you need the area-under-the-curve summary metrics (since these require the predictions to be retained).

http://weka.sourceforge.net/doc.stable/weka/classifiers/Evaluation.html#evaluateModelOnce(weka.classifiers.Classifier,%20weka.core.Instance)

http://weka.sourceforge.net/doc.stable/weka/classifiers/Evaluation.html#evaluateModelOnceAndRecordPrediction(weka.classifiers.Classifier,%20weka.core.Instance)
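One way to read this advice is sketched below; it is an illustration under stated assumptions, not Mark Hall's code. The classifier is cross-validated on Y by hand, and because the statistics are additive, every Leak instance is then scored once with the fixed default label through the evaluateModelOnce(double, Instance) overload instead of being passed to the classifier. The class name LeakAwareCrossValidation, the method evaluate and its parameters are hypothetical. The sketch assumes Weka 3.7 or later, a nominal class attribute, that Y and Leak share one header with the class index set, and that scoring Leak once after the folds (rather than spreading it across them) is acceptable.

```java
import java.util.Random;

import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;

public class LeakAwareCrossValidation {

    /**
     * Manual k-fold cross-validation on y, followed by scoring every
     * instance of leak with the fixed label defaultLabel (e.g. "FOO").
     * Assumption: y and leak share a header and have their class index set.
     */
    public static Evaluation evaluate(Classifier base, Instances y, Instances leak,
                                      String defaultLabel, int folds, long seed)
            throws Exception {
        Random random = new Random(seed);
        Instances data = new Instances(y);   // copy, so the caller's data keeps its order
        data.randomize(random);
        data.stratify(folds);

        Evaluation eval = new Evaluation(data);
        for (int fold = 0; fold < folds; fold++) {
            Instances train = data.trainCV(folds, fold, random);
            Instances test = data.testCV(folds, fold);

            // Fresh copy per fold, so no state carries over between folds
            Classifier copy = AbstractClassifier.makeCopy(base);
            copy.buildClassifier(train);

            for (int i = 0; i < test.numInstances(); i++) {
                eval.evaluateModelOnce(copy, test.instance(i));
            }
        }

        // Leak instances bypass the classifier and receive the default label
        int defaultIndex = data.classAttribute().indexOfValue(defaultLabel);
        if (defaultIndex < 0) {
            throw new IllegalArgumentException("Unknown class label: " + defaultLabel);
        }
        for (int i = 0; i < leak.numInstances(); i++) {
            eval.evaluateModelOnce((double) defaultIndex, leak.instance(i));
        }
        return eval;
    }
}
```

With Y and Leak loaded through DataSource as in the question, evaluate(classfr, Y, Leak, "FOO", 10, 1).toSummaryString() would then report statistics over all of X; the fold count and seed here are arbitrary choices.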
