How to perform an operation once only at the end of scalding?

I read in scalding groupAlldocs:

   /**
    * Group all tuples down to one reducer.
    * (due to cascading limitation).
    * This is probably only useful just before setting a tail such as Database
    * tail, so that only one reducer talks to the DB.  Kind of a hack.
    */
    def groupAll: Pipe = groupAll { _.pass }

This gave me good reason to believe that if pipemy end writeleads to the fact that in the pipe statusUpdater, which just updates some database, that my task is completed successfully, it will be executed once after completion of work, however I tried it in

The following code example:

import Dsl._
somepipe
  .addCount
  .toPipe(outputSchema)
  .write(Tsv(outputPath, outputSchema, writeHeader = true))(flowDef, mode)
  .groupAll.updateResultStatus

  implicit class StatusResultsUpdater(pipe: Pipe) {
    def updateResultStatus: Pipe = {
      println("DO THIS ONCE AFTER JOB COMPLETES!") // was printed even before the job ended! how to have it print only when job ends!?
      pipe
    }
  }

according to the documents, when I used it groupAll, then it updateResultStatusshould start only after the end of the work and only once, why do I see that it prints the expression before the end of the work? Am I missing something? What should I do to make it work?

+4
1

Scalding :

  • Job ( Pipes, Taps ..).
  • . .
  • . "Hadoop jobs" "" .
  • .

println 1.

+4

All Articles