How can you approximate Python or the operator for comparison in Scala?

After listening to the last Stack Overflow podcast, Peter Norvig, the compact Pellon spell-checker intrigued me, so I decided to implement it in Scala if I could express it well in the functional Scala idiom, and also see how many lines of code it will take .

That is the whole problem. (Do not compare lines of code yet.)

(Two notes: you can run this in the Scala interpreter if you want. If you need a copy of the big.txt file or the entire project, on GitHub .)

import scala.io.Source val alphabet = "abcdefghijklmnopqrstuvwxyz" def train(text:String) = { "[az]+".r.findAllIn(text).foldLeft(Map[String, Int]() withDefaultValue 1) {(a, b) => a(b) = a(b) + 1} } val NWORDS = train(Source.fromFile("big.txt").getLines.mkString.toLowerCase) def known(words:Set[String]) = {Set.empty ++ (for(w <- words if NWORDS contains w) yield w)} def edits1(word:String) = { Set.empty ++ (for (i <- 0 until word.length) // Deletes yield (word take i) + (word drop (i + 1))) ++ (for (i <- 0 until word.length - 1) // Transposes yield (word take i) + word(i + 1) + word(i) + (word drop (i + 2))) ++ (for (i <- 0 until word.length; j <- alphabet) // Replaces yield (word take i) + j + (word drop (i+1))) ++ (for (i <- 0 until word.length; j <- alphabet) // Inserts yield (word take i) + j + (word drop i)) } def known_edits2(word:String) = {Set.empty ++ (for (e1 <- edits1(word); e2 <- edits1(e1) if NWORDS contains e2) yield e2)} def correct(word:String) = { val options = Seq(() => known(Set(word)), () => known(edits1(word)), () => known_edits2(word), () => Set(word)) val candidates = options.foldLeft(Set[String]()) {(a, b) => if (a.isEmpty) b() else a} candidates.foldLeft("") {(a, b) => if (NWORDS(a) > NWORDS(b)) a else b} } 

In particular, I am wondering if it is possible to do something cleaner using the correct function. In the original Python implementation, the implementation is a bit clean:

 def correct(word): candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word] return max(candidates, key=NWORDS.get) 

Apparently, in Python, an empty set will be evaluated as Boolean False , so only the first candidate will be evaluated, returning a non-empty set, preserving the potentially costly calls edits1 and known_edits2 .

The only solution I came up with is the version you see here, where anonymous Seq functions are called until a non-empty Set is returned, which is guaranteed to be the last.

So experienced Scala-heads, is there a more syntactically concise or better way to do this? Thanks in advance!

+6
python scala anonymous-function
source share
5 answers

Iterators are also lazy (although not very functional, since you can iterate over them only once.) So, you can do it this way:

  def correct(word: String) = { val sets = List[String => Set[String]]( x => known(Set(x)), x => known(edits1(x)), known_edits2 ).elements.map(_(word)) sets find { !_.isEmpty } match { case Some(candidates: Set[String]) => candidates.reduceLeft { (res, n) => if (NWORDS(res) > NWORDS(n)) res else n } case None => word } } 

As a bonus, the Iterator find () method does not evaluate the next element.

+2
source share

I'm not sure why you are trying to use a lazy rating for known , and not just use a stream, as illustrated by oxbow_lakes. The best way to do what he did:

 def correct(word: String) = { import Stream._ val str = cons(known(Set(word)), cons(known(edits1(word)), cons(known_edits2(word), cons(Set(word), empty)))) str find { !_.isEmpty } match { case Some(candidates) => candidates.foldLeft(Set[String]()) { (res, n) => if (NWORDS(res) > NWORDS(n)) res else n } case None => Set() } } 

It uses the fact that Stream.cons already lazy and therefore we don’t need to wrap everything in thunk.

If you're really tuned for good syntax, we can add syntactic sugar to all of these requirements:

 implicit def streamSyntax[A](tail: =>Stream[A]) = new { def #::(hd: A) = Stream.cons(hd, tail) } 

Now our previously ugly definition of str refers to the following:

 def correct(word: String) = { val str = known(Set(word)) #:: known(edits1(word)) #:: known_edits2(word) #:: Set(word) #:: Stream.empty ... } 
+6
source share

Will this work? The _ syntax is a partially applied function and using (lazy) Stream , I guarantee that the evaluations in reduceLeft (which I think is more appropriate than foldLeft here) are performed only when necessary!

 def correct(word:String) = { Stream(known(Set(word)) _, known(edits1(word)) _, known_edits2(word) _, Set(word) _ ).find( !_().isEmpty ) match { case Some(candidates) => candidates.reduceLeft {(res, n) => if (NWORDS(res) > NWORDS(n)) res else n} case _ => "" //or some other value 

}

I probably made some syntax errors here, but I think the Stream approach is valid

+4
source share

Scala 2.7-ready (including implicit performance):

 class Or[A](one: Set[A]) { def or(other: => Set[A]): Set[A] = if (one.isEmpty) other else one } implicit def toOr[A](one: Set[A]) = new Or(one) def correct(word: String) = { candidates = known(Set(word)) or known(edits1(word)) or known_edits2(word) or Set(word) candidates.foldLeft("") {(a, b) => if (NWORDS(a) > NWORDS(b)) a else b} } 

Scala 2.8-goodness:

 implicit def toOr[A](one: Set[A]) = new AnyRef { def or(other: => Set[A]): Set[A] = if (one.isEmpty) other else one } def correct(word: String) = { candidates = known(Set(word)) or known(edits1(word)) or known_edits2(word) or Set(word) candidates.max(Ordering.fromLessThan[String](NWORDS(_) < NWORDS(_))) } 

However, I pretty much supported everyone else. I did not consider a Stream .

EDIT

It seems that Ordering.fromLessThan could lead to a doubling of the necessary comparisons. Here is an alternate version for this line:

  candidates.max(new Ordering[String] { def compare(x: String, y: String) = NWORDS(x) - NWORDS(y) }) 
+3
source share

I tried to implement a short Scala spelling corrector implementation . These are 15 lines without import. The shortest replacement for Python or a simple call to the name parameter:

 def or[T](candidates : Seq[T], other : => Seq[T]) = if(candidates.isEmpty) other else candidates def candidates(word: String) = or(known(List(word)), or(known(edits1(word)), known(edits2(word)))) 

In a real-world scenario, I would use the implicit conversion proposed by Daniel.

0
source share

All Articles