Use each attribute only once in a scikit-learn decision tree in Python

I use the scikit-learn Python module to create decision trees, and it works like a charm. I would like to achieve one more thing: that the tree splits on each attribute only once.

The reason is my rather strange dataset. It is noisy, and I'm really interested in the noise. My class labels are binary, say [+, -], and I have a bunch of numeric attributes, mostly in the range (0, 1).

When scikit-learn builds a tree, it splits on the same attribute several times to make the tree "better". I understand that this makes the leaf nodes purer, but that is not what I want to achieve here.
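A minimal sketch of what I mean, on toy data (the dataset and parameters below are just placeholders, not my real data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.RandomState(0)
X = rng.rand(200, 3)                                    # attributes in (0, 1)
y = (X[:, 0] + 0.3 * rng.randn(200) > 0.5).astype(int)  # noisy binary class

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
# export_text prints which feature each node splits on, so repeated
# splits on the same attribute are easy to spot
print(export_text(tree, feature_names=["f0", "f1", "f2"]))
```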

What I did instead was to determine a single cutoff for each attribute by computing the information gain at different candidate thresholds and choosing the maximum. Validated with leave-one-out and 1/3-2/3 cross-validation, this gives me better results than the original tree.
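Here is a sketch of that cutoff search (the function names are my own, and the entropy/gain computation is spelled out explicitly):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a binary 0/1 label array."""
    p = np.bincount(y, minlength=2) / len(y)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def best_cutoff(x, y):
    """Threshold on one attribute that maximizes information gain."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    base = entropy(y)
    best_t, best_gain = None, -np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:        # only cut between distinct values
            continue
        t = (x[i] + x[i - 1]) / 2   # midpoint as candidate threshold
        gain = base - (i * entropy(y[:i]) + (len(y) - i) * entropy(y[i:])) / len(y)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```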

The problem is that when I try to automate this, I run into trouble near the lower and upper bounds (around 0 and 1): almost all elements fall on one side of the cut, so I get a very high information gain because one of the two sets is pure, even though it contains only 1-2% of the total data.
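One guard I have been considering (my own idea, not something I have validated): ignore thresholds that leave less than some minimum fraction of the samples on either side, much like min_samples_leaf does in scikit-learn trees. A sketch, reusing entropy() from above; min_frac is an assumed tuning knob:

```python
import numpy as np

def best_cutoff_guarded(x, y, min_frac=0.05):
    """Like best_cutoff, but each side of the cut must hold >= min_frac of the data."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    base = entropy(y)
    lo = max(1, int(min_frac * len(x)))   # smallest admissible side
    best_t, best_gain = None, -np.inf
    for i in range(lo, len(x) - lo + 1):  # skip cuts too close to the bounds
        if x[i] == x[i - 1]:
            continue
        t = (x[i] + x[i - 1]) / 2
        gain = base - (i * entropy(y[:i]) + (len(y) - i) * entropy(y[i:])) / len(y)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```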

In short, I would like to make scikit-learn split on each attribute only once.
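As far as I can tell, DecisionTreeClassifier has no option for this. One workaround sketch (again my own idea): binarize every attribute at its precomputed cutoff and fit the tree on the 0/1 features. A binary feature can then be split at most once along any root-to-leaf path, though it may still appear in different branches:

```python
# Assumes X, y and best_cutoff_guarded from the sketches above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

cutoffs = np.array([best_cutoff_guarded(X[:, j], y)[0] for j in range(X.shape[1])])
X_bin = (X > cutoffs).astype(int)   # one fixed threshold per attribute

bin_tree = DecisionTreeClassifier(random_state=0).fit(X_bin, y)
```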

If this is not possible, do you have any tips on how to generate these cutoffs in a clean way?

Thanks a lot!
