How would you express this in Haskell?

Would you use if/else to write this algorithm in Haskell? Is there a way to express it without them? It is hard to extract functions from the middle of it that make sense, because this is just the output of a machine learning system.

I am implementing an algorithm for classifying segments of HTML content as Content or Boilerplate, described here. The weights are already hardcoded.

    curr_linkDensity <= 0.333333
    |   prev_linkDensity <= 0.555556
    |   |   curr_numWords <= 16
    |   |   |   next_numWords <= 15
    |   |   |   |   prev_numWords <= 4: BOILERPLATE
    |   |   |   |   prev_numWords > 4: CONTENT
    |   |   |   next_numWords > 15: CONTENT
    |   |   curr_numWords > 16: CONTENT
    |   prev_linkDensity > 0.555556
    |   |   curr_numWords <= 40
    |   |   |   next_numWords <= 17: BOILERPLATE
    |   |   |   next_numWords > 17: CONTENT
    |   |   curr_numWords > 40: CONTENT
    curr_linkDensity > 0.333333: BOILERPLATE
2 answers

Since only three paths in this decision tree lead to the BOILERPLATE state, I just went through them and simplified the conditions:

    isBoilerplate =
        prev_linkDensity <= 0.555556 && curr_numWords <= 16 && next_numWords <= 15 && prev_numWords <= 4
        || prev_linkDensity > 0.555556 && curr_numWords <= 40 && next_numWords <= 17
        || curr_linkDensity > 0.333333
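One way to gain confidence in a hand-simplified condition like this is to compare it with a literal transcription of the tree on a few sample inputs. The sketch below assumes a hypothetical `Stats` record with `Double` fields named after the tree's features; `treeSaysBoilerplate`, `samples`, and `agree` are illustrative names, not part of the original answer.

```haskell
-- Hypothetical record; field names follow the decision tree in the question.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

-- Flattened condition: the three paths that end in BOILERPLATE.
isBoilerplate :: Stats -> Bool
isBoilerplate s =
     prev_linkDensity s <= 0.555556 && curr_numWords s <= 16
       && next_numWords s <= 15 && prev_numWords s <= 4
  || prev_linkDensity s > 0.555556 && curr_numWords s <= 40
       && next_numWords s <= 17
  || curr_linkDensity s > 0.333333

-- Literal if/else transcription of the tree, used only as a reference.
treeSaysBoilerplate :: Stats -> Bool
treeSaysBoilerplate s =
  if curr_linkDensity s <= 0.333333
    then if prev_linkDensity s <= 0.555556
      then if curr_numWords s <= 16
        then if next_numWords s <= 15
          then prev_numWords s <= 4
          else False
        else False
      else if curr_numWords s <= 40
        then next_numWords s <= 17
        else False
    else True

-- A small grid of inputs that lands on both sides of every threshold.
samples :: [Stats]
samples =
  [ Stats cl pl cw pw nw
  | cl <- [0.1, 0.5], pl <- [0.2, 0.9]
  , cw <- [10, 30, 50], pw <- [2, 8], nw <- [10, 16, 20] ]

-- True when both formulations agree on every sample.
agree :: Bool
agree = all (\s -> isBoilerplate s == treeSaysBoilerplate s) samples
```

Evaluating `agree` in GHCi is a quick sanity check, not a proof; a property-based test over random inputs would be a natural next step.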

Without simplifying the logic manually (assuming you can generate this code automatically), I think using MultiWayIf is pretty clean and straightforward.

    {-# LANGUAGE MultiWayIf #-}

    data Stats = Stats { curr_linkDensity :: Double, prev_linkDensity :: Double, ... }

    data Classification = Content | Boilerplate

    classify :: Stats -> Classification
    classify s =
      if | curr_linkDensity s <= 0.333333 ->
             if | prev_linkDensity s <= 0.555556 ->
                    if | curr_numWords s <= 16 ->
                           if | next_numWords s <= 15 ->
                                  if | prev_numWords s <= 4 -> Boilerplate
                                     | prev_numWords s > 4  -> Content
                              | next_numWords s > 15 -> Content
                       ...

etc.
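For reference, here is one way the whole tree from the question might look when written out in this style. It is a sketch under assumptions: the `Stats` fields are all taken to be `Double`, and each second guard is written as `otherwise` rather than the explicit complementary comparison, so that no input (for example a NaN) can fall through every guard and fail at runtime.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Hypothetical record; field names follow the decision tree in the question.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

-- Each multi-way if opens a layout block, so the guards of one
-- level must start in the same column.
classify :: Stats -> Classification
classify s =
  if | curr_linkDensity s <= 0.333333 ->
         if | prev_linkDensity s <= 0.555556 ->
                if | curr_numWords s <= 16 ->
                       if | next_numWords s <= 15 ->
                              if | prev_numWords s <= 4 -> Boilerplate
                                 | otherwise            -> Content
                          | otherwise -> Content
                   | otherwise -> Content
            | otherwise ->
                if | curr_numWords s <= 40 ->
                       if | next_numWords s <= 17 -> Boilerplate
                          | otherwise             -> Content
                   | otherwise -> Content
     | otherwise -> Boilerplate
```

Because the shape mirrors the generated tree line for line, a code generator could emit it mechanically.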

However, since it is so structured (just an if/else tree of comparisons), you might also consider creating a decision tree data structure and writing an interpreter for it. That would let you perform transformations, manipulations, and checks on the tree. Maybe it will buy you something; defining miniature languages for your specifications can be unexpectedly beneficial.

    data DecisionTree i o
      = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
      | Leaf o

    runDecisionTree :: DecisionTree i o -> i -> o
    runDecisionTree (Comparison f v ifLess ifGreater) i
      | f i <= v  = runDecisionTree ifLess i
      | otherwise = runDecisionTree ifGreater i
    runDecisionTree (Leaf o) _ = o

    -- DecisionTree is an encoding of a function, and you can write
    -- Functor, Applicative, and Monad instances!

Then

    classifier :: DecisionTree Stats Classification
    classifier =
      Comparison curr_linkDensity 0.333333
        (Comparison prev_linkDensity 0.555556
          (Comparison curr_numWords 16
            (Comparison next_numWords 15
              (Comparison prev_numWords 4 (Leaf Boilerplate) (Leaf Content))
              (Leaf Content)
      ...
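To illustrate the kind of checks and transformations this representation allows, here is a self-contained sketch with the full tree from the question filled in. The `Stats` record with all-`Double` fields, the `Functor` instance, and the helpers `leaves` and `depth` are assumptions added for illustration; they are not part of the original answer.

```haskell
-- Hypothetical record; field names follow the decision tree in the question.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

-- Interpreter: go left when the feature is <= the threshold, else right.
runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- Transformation: relabel the leaves without touching the comparisons.
instance Functor (DecisionTree i) where
  fmap g (Leaf o)             = Leaf (g o)
  fmap g (Comparison f v l r) = Comparison f v (fmap g l) (fmap g r)

-- Check: every outcome the tree can produce, left to right.
leaves :: DecisionTree i o -> [o]
leaves (Leaf o)             = [o]
leaves (Comparison _ _ l r) = leaves l ++ leaves r

-- Check: the largest number of comparisons on any path.
depth :: DecisionTree i o -> Int
depth (Leaf _)             = 0
depth (Comparison _ _ l r) = 1 + max (depth l) (depth r)

classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4 (Leaf Boilerplate) (Leaf Content))
          (Leaf Content))
        (Leaf Content))
      (Comparison curr_numWords 40
        (Comparison next_numWords 17 (Leaf Boilerplate) (Leaf Content))
        (Leaf Content)))
    (Leaf Boilerplate)
```

With these definitions, `length (filter (== Boilerplate) (leaves classifier))` counts the paths that end in Boilerplate, and `fmap (== Boilerplate) classifier` turns the classifier into a tree that answers a yes/no question directly.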