How does a decision tree compute the splitting attribute?

When we use the decision tree algorithm on a dataset consisting of numerical values, I found that the program splits nodes at values that do not even exist in the dataset. Example classification results:

  • attrib2 <= 3.761791861252009: groupA
  • attrib2 > 3.761791861252009: groupB

Since my dataset contains no such value of attrib2 as 3.76179, why is this so?


+4
3 answers

There are several ways to choose a split value, and not all of them pick values that occur in the dataset.

A common approach (albeit slightly simplified) is to take an average value. It is possible that 3.76179... is an average of attribute values in your dataset.

For example, if your dataset is one-dimensional and consists of the values -10, -9, ..., -2, -1, 1, 2, ..., 9, 10, then a good split value would be 0, even though it does not appear in your dataset.
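As a minimal Python sketch of that idea (my own illustration, not any particular library's code), candidate thresholds taken halfway between consecutive sorted values need not occur in the data:

```python
# Candidate splits as midpoints between consecutive sorted values.
values = sorted([-10, -9, -2, -1, 1, 2, 9, 10])

candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
print(candidates)  # includes 0.0 (midpoint of -1 and 1), which is not in the data
```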

Another possibility, especially if you are dealing with random forests (ensembles of decision trees), is that the split value is chosen randomly from a probability distribution centered on the median. Some algorithms draw the split from a Gaussian centered on the mean/median, with a standard deviation equal to that of the dataset.
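A minimal sketch of that randomized variant, assuming a Gaussian centered on the median with the dataset's standard deviation (`random_split` is a hypothetical helper, not an API of any specific random-forest library):

```python
import random
import statistics

def random_split(values, rng=None):
    """Draw a split point from a Gaussian centered on the median,
    with the sample standard deviation of the data as its spread."""
    rng = rng or random.Random(42)  # fixed seed so the sketch is reproducible
    center = statistics.median(values)
    spread = statistics.stdev(values)
    return rng.gauss(center, spread)

data = [1.2, 3.4, 3.9, 4.1, 5.0, 7.8]
print(random_split(data))  # almost surely a value not present in the data
```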

+7

Most decision tree building algorithms (J48, C4.5, CART, ID3) work as follows:

  • Sort the values of the attribute you want to split on.
  • Find all the "breakpoints" where the associated class labels change.
  • Consider the split points where the labels change, and choose the one that minimizes the impurity measure. Note that information gain depends only on the ordering, not on the actual values.

Once you find the best split point, the algorithms disagree on how to represent it. Example: suppose you have -4 (Yes), -3 (Yes), -3 (Yes), -2 (No), -1 (No). Any threshold between -3 and -2 yields the same purity. Some algorithms (C4.5) will say val <= -3; others, e.g. Weka, take the midpoint and give val <= -2.5.
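A minimal Python sketch of this procedure (an illustration, not Weka's actual source): sort the values, look only at boundaries where the class label changes, score each by information gain, and report the midpoint the way Weka does:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Return the best threshold for 'value <= threshold' by information gain.
    Candidates are the positions where the sorted class labels change."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(ys)
    base = entropy(ys)
    best_gain, best_thr = -1.0, None
    for i in range(1, n):
        if ys[i] == ys[i - 1] or xs[i] == xs[i - 1]:
            continue  # only consider boundaries where the label changes
        left, right = ys[:i], ys[i:]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            # Weka-style: split halfway between the two neighbouring values.
            # A C4.5-style representation would use xs[i - 1] itself instead.
            best_gain, best_thr = gain, (xs[i - 1] + xs[i]) / 2
    return best_thr, best_gain

vals = [-4, -3, -3, -2, -1]
labs = ["Yes", "Yes", "Yes", "No", "No"]
print(best_split(vals, labs))  # (-2.5, ...): the midpoint of -3 and -2
```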

+8

First, look at how numerical attributes are discretized. These algorithms divide the numerical range into intervals so that each split has a high information gain. For example, you can step through candidate positions (say, in increments of 0.1), check the information gain at each one, select the best position, and then continue within the resulting intervals.
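And a hypothetical sketch of such a fixed-step scan, with information gain as the score (the 0.1 step and the `scan_split` helper are illustrative assumptions, not taken from any specific implementation):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def scan_split(values, labels, step=0.1):
    """Try thresholds lo + step, lo + 2*step, ... and keep the one
    with the highest information gain."""
    n = len(labels)
    base = entropy(labels)
    lo, hi = min(values), max(values)
    best_gain, best_thr = -1.0, None
    thr = lo + step
    while thr < hi:
        left = [y for v, y in zip(values, labels) if v <= thr]
        right = [y for v, y in zip(values, labels) if v > thr]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
        thr += step
    return best_thr, best_gain

vals = [1.0, 1.5, 2.0, 3.0, 3.5]
labs = ["A", "A", "A", "B", "B"]
print(scan_split(vals, labs))  # a threshold between 2.0 and 3.0 separates the classes
```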

0
