Help needed Creating a binary tree based on the truth table

First, to ensure full disclosure, I want to point out that this is related to homework in a machine learning class. This question is not a homework question, and instead I need to find out to complete the big problem of creating an ID3 decision tree algorithm.

I need to generate a tree like the one below when defining a truth table

let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0))) 

learnTree is of type BinaryTree, which I defined as follows:

 type BinaryTree = | Leaf of int | Node of int * string * BinaryTree * BinaryTree 

ID3 algorithms take into account various equations to determine where to divide the tree, and I did it, I just had a problem creating the learned tree from my truth table. For example, if I have the following table

 A1 | A2 | A3 | Class 1 0 0 1 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 1 1 0 

And I decided to divide by attribute A1, I would get the following:

  (A1 = 1) A1 (A1 = 0) A2 | A3 | Class A2 | A3 | Class 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 1 1 

Then I would split the left side and split the right side and continue the recursive pattern until the leaf nodes are clean and I end up with a tree similar to the next based on splitting.

 let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0))) 

Here is what I have so far โ€œcrackedโ€ together, but I think I could be aloof:

 let rec createTree (listToSplit : list<list<float>>) index = let leftSideSplit = listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None) let rightSideSplit = listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None) if leftSideSplit.Length > 0 then let pureCheck = isListPure leftSideSplit if pureCheck = 0 then printfn "%s" "Pure left node class 0" createTree leftSideSplit (index + 1) else if pureCheck = 1 then printfn "%s" "Pure left node class 1" createTree leftSideSplit (index + 1) else printfn "%s - %A" "Recursing Left" leftSideSplit createTree leftSideSplit (index + 1) else printfn "%s" "Pure left node class 0" 

Should I use pattern matching? Any tips / ideas / help? Thanks a bunch!

+6
f #
source share
3 answers

Edit: I have since published an ID3 implementation on my blog at: http://blogs.msdn.com/chrsmith

Hey Jim, I have long wanted to write a blog post about implementing ID3 in F #. Thanks for giving me the fulfillment. Although this code does not implement the full algorithm (or correctly), it should be enough to get started.

In general, you have the right approach - presenting each branch as a case with a discriminatory union is good. And, as Brian said, List.partition is definitely a convenient feature. The trick for proper operation is all to determine the optimal attribute / value pair for separation - and for this you need to calculate the information gain through entropy, etc.

 type Attribute = string type Value = string type Record = { Weather : string Temperature : string PlayTennis : bool } override this.ToString() = sprintf "{Weather = %s, Temp = %s, PlayTennis = %b}" this.Weather this.Temperature this.PlayTennis type Decision = Attribute * Value type DecisionTreeNode = | Branch of Decision * DecisionTreeNode * DecisionTreeNode | Leaf of Record list // ------------------------------------ // Splits a record list into an optimal split and the left / right branches. // (This is where you use the entropy function to maxamize information gain.) // Record list -> Decision * Record list * Record list let bestSplit data = // Just group by weather, then by temperature let uniqueWeathers = List.fold (fun acc item -> Set.add item.Weather acc) Set.empty data let uniqueTemperatures = List.fold (fun acc item -> Set.add item.Temperature acc) Set.empty data if uniqueWeathers.Count = 1 then let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement) let left, right = List.partition (fun item -> item.Temperature = uniqueTemperatures.MinimumElement) data (bestSplit, left, right) else let bestSplit = ("Weather", uniqueWeathers.MinimumElement) let left, right = List.partition (fun item -> item.Weather = uniqueWeathers.MinimumElement) data (bestSplit, left, right) let rec determineBranch data = if List.length data < 4 then Leaf(data) else // Use the entropy function to break the dataset on // the category / value that best splits the data let bestDecision, leftBranch, rightBranch = bestSplit data Branch( bestDecision, determineBranch leftBranch, determineBranch rightBranch) // ------------------------------------ let rec printID3Result indent branch = let padding = new System.String(' ', indent) match branch with | Leaf(data) -> data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString()) | Branch(decision, lhs, rhs) -> printfn "%sBranch predicate [%A]" padding decision printfn "%sWhere predicate is true:" padding printID3Result (indent + 4) lhs printfn "%sWhere predicate is false:" padding printID3Result (indent + 4) rhs // ------------------------------------ let dataset = [ { Weather = "windy"; Temperature = "hot"; PlayTennis = false } { Weather = "windy"; Temperature = "cool"; PlayTennis = false } { Weather = "nice"; Temperature = "cool"; PlayTennis = true } { Weather = "nice"; Temperature = "cold"; PlayTennis = true } { Weather = "humid"; Temperature = "hot"; PlayTennis = false } ] printfn "Given input list:" dataset |> List.iter (printfn "%A") printfn "ID3 split resulted in:" let id3Result = determineBranch dataset printID3Result 0 id3Result 
+6
source share

You can use List.partition instead of two calls to List.choose.

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.List.html

(or now http://msdn.microsoft.com/en-us/library/ee353738(VS.100).aspx )

It is not clear to me that you will buy a lot of template comparisons here; the type of input (list of lists) and processing (separation and purity checking) are not really amenable to this.

And of course, when you finally get the โ€œendโ€ (clean list), you need to create a tree, and then, presumably, this function will create a leaf when the input has only one โ€œsideโ€ and โ€œcleanโ€, but create a Node from left and right side results for each other input. May be. I did not quite understand the algorithm completely.

Hope this helps you do some work. It may be useful to compose several smaller inputs and outputs of the samples to help sort out various cases of body function.

+5
source share

Thanks Brian and Chris! I really could figure it out, and I ended up with the following. This calculates the information gain to determine the best place to divide. I'm sure there are probably better ways for me to come to this decision, especially around the selected data structures, but this is the beginning. I plan to clarify things later.

 #light open System let trainList = [ [1.;0.;0.;1.;]; [0.;1.;0.;1.;]; [0.;0.;0.;0.;]; [1.;0.;1.;0.;]; [0.;0.;0.;0.;]; [1.;1.;0.;1.;]; [0.;1.;1.;0.;]; [1.;0.;0.;1.;]; [0.;0.;0.;0.;]; [1.;0.;0.;1.;]; ] type BinaryTree = | Leaf of int | Node of int * string * BinaryTree * BinaryTree let entropyList nums = let sumOfnums = nums |> Seq.sum nums |> Seq.map (fun x -> if x=0.00 then x else (-((x/sumOfnums) * Math.Log(x/sumOfnums, 2.)))) |> Seq.sum let entropyBinaryList (dataListOfLists:list<list<float>>) = let classList = dataListOfLists |> List.map (fun x -> x.Item(x.Length - 1)) let ListOfNo = classList |> List.choose (fun x -> if x = 0. then Some(x) else None) let ListOfYes = classList |> List.choose (fun x -> if x = 1. then Some(x) else None) let numberOfYes : float = float ListOfYes.Length let numberOfNo : float = float ListOfNo.Length let ListOfNumYesAndSumNo = [numberOfYes; numberOfNo] entropyList ListOfNumYesAndSumNo let conditionalEntropy (dataListOfLists:list<list<float>>) attributeNumber = let NoAttributeList = dataListOfLists |> List.choose (fun x -> if x.Item(attributeNumber) = 0. then Some(x) else None) let YesAttributeList = dataListOfLists |> List.choose (fun x -> if x.Item(attributeNumber) = 1. then Some(x) else None) let numberOfYes : float = float YesAttributeList.Length let numberOfNo : float = float NoAttributeList.Length let noConditionalEntropy = (entropyBinaryList NoAttributeList) * (numberOfNo/(numberOfNo + numberOfYes)) let yesConditionalEntropy = (entropyBinaryList YesAttributeList) * (numberOfYes/(numberOfNo + numberOfYes)) [noConditionalEntropy; yesConditionalEntropy] let findBestSplitIndex(listOfInstances : list<list<float>>) = let IGList = [0..(listOfInstances.Item(0).Length - 2)] |> List.mapi (fun ix -> (i, (entropyBinaryList listOfInstances) - (List.sum (conditionalEntropy listOfInstances x)))) IGList |> List.maxBy snd |> fst let isListPure (listToCheck : list<list<float>>) = let splitList = listToCheck |> List.choose (fun x -> if x.Item(x.Length - 1) = 1. then Some(x) else None) if splitList.Length = listToCheck.Length then 1 else if splitList.Length = 0 then 0 else -1 let rec createTree (listToSplit : list<list<float>>) = let pureCheck = isListPure listToSplit if pureCheck = 0 then printfn "%s" "Pure - Leaf(0)" else if pureCheck = 1 then printfn "%s" "Pure - Leaf(1)" else printfn "%A - is not pure" listToSplit if listToSplit.Length > 1 then // There are attributes we can split on // Chose best place to split list let splitIndex = findBestSplitIndex(listToSplit) printfn "spliting at index %A" splitIndex let leftSideSplit = listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 1. then Some(x) else None) let rightSideSplit = listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 0. then Some(x) else None) createTree leftSideSplit createTree rightSideSplit else printfn "%s" "Not Pure, but can't split choose based on heuristics - Leaf(0 or 1)" 
+1
source share

All Articles