Assume that X is a raw, labeled (i.e. with training labels) dataset, and that Process(X) returns a set of instances Y that have been encoded with attributes and converted to a weka-friendly file such as Y.arff.
Also suppose Process() has some "leakage": some instances, Leak = X - Y, cannot be sequentially encoded and need to receive the default classification FOO. Training labels are also known for the Leak set.
My question is: how can I best introduce the examples from Leak into the weka evaluation stream AFTER some classifier has been applied to the subset Y, so that the Leak instances are added with their default value before evaluating over the entire set X? In code:
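    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    DataSource LeakSrc = new DataSource("leak.arff");
    Instances Leak = LeakSrc.getDataSet();
    DataSource Ysrc = new DataSource("Y.arff");
    Instances Y = Ysrc.getDataSet();
    Y.setClassIndex(Y.numAttributes() - 1); // assumes the class is the last attribute
    classfr.buildClassifier(Y);             // classfr: some weka Classifier created elsewhere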
Perhaps this is a specific instance of combining results from several classifiers?
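To make the bookkeeping concrete, here is a minimal, library-independent sketch of what I mean by adding the Leak instances with their default value. It does not use weka's evaluation classes at all: `LeakFold`, `countCorrect` and `combinedAccuracy` are hypothetical names of my own, the labels are made-up toy data, and it assumes the classifier's predictions on Y are already available as plain strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of weka: folds default-labelled Leak
// instances into the counts obtained from the classifier on Y.
class LeakFold {

    /** Number of positions at which the predicted label equals the actual label. */
    static int countCorrect(List<String> actual, List<String> predicted) {
        if (actual.size() != predicted.size()) {
            throw new IllegalArgumentException("label lists differ in length");
        }
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return correct;
    }

    /**
     * Accuracy over all of X (Y plus Leak), where every Leak instance
     * is treated as if it had been predicted as defaultLabel.
     */
    static double combinedAccuracy(List<String> yActual, List<String> yPredicted,
                                   List<String> leakActual, String defaultLabel) {
        int correct = countCorrect(yActual, yPredicted);
        for (String label : leakActual) {
            if (label.equals(defaultLabel)) {
                correct++; // the default happens to be right for this Leak instance
            }
        }
        int total = yActual.size() + leakActual.size();
        if (total == 0) {
            throw new IllegalArgumentException("no instances to evaluate");
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        // Toy data: three encoded instances (Y) and two leaked ones (Leak).
        List<String> yActual = Arrays.asList("A", "B", "A");
        List<String> yPredicted = Arrays.asList("A", "B", "B");
        List<String> leakActual = Arrays.asList("FOO", "A");
        System.out.println(combinedAccuracy(yActual, yPredicted, leakActual, "FOO"));
    }
}
```

This only covers plain accuracy. What I am really after is the equivalent inside weka's own evaluation, so that the confusion matrix and the other statistics also cover all of X.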
weka
rikb