For tasks like this I usually use scala-arm, which handles resource management (AutoCloseable, Closeable, etc.) inside for comprehensions.
Most Scala tutorials use for { s <- Source.fromFile(...).getLines() }, but that is an easy way to leak resources, because the source is never closed automatically.
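To make the leak concrete, here is a minimal sketch of closing the source by hand, using only the standard library. The object and method names (`ManualClose`, `readLines`) are made up for illustration; this is the boilerplate a resource-management library is meant to remove, not a replacement for it.

```scala
import scala.io.Source

object ManualClose {
  // Hypothetical helper: reads every line, then always closes the file handle.
  def readLines(path: String): List[String] = {
    val source = Source.fromFile(path)
    // getLines() is lazy, so force it with toList before the source is closed.
    try source.getLines().toList
    finally source.close()
  }
}
```

Note that the whole file ends up in memory here; processing lines lazily while still guaranteeing the close is exactly where a managed block is more convenient.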
With scala-arm, it looks like this:
    import java.nio.file.Files
    import scala.io.Source
    import resource._

    // Both resources are closed when the outer block finishes, even on failure.
    for {
      source <- managed(Source.fromFile(...))
      target <- managed(Files.newBufferedWriter(...))
    } {
      for {
        rawLine <- source.getLines
        line = rawLine.trim()
        if !rawLine.startsWith("#")
        (url, html) <- parseString(line)
        json <- toJsonOpt(html)
      } {
        // actual action
        target.write(s"$url\t$json\n")
      }
    }
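The inner comprehension only works as written if parseString and toJsonOpt return an Option, so that lines failing either step are skipped silently. Neither helper is shown in this answer, so the versions below are hypothetical stand-ins that assume a "url<TAB>html" input format. The sketch runs the same generators over an in-memory iterator and leaves out the file handling:

```scala
object LineFilterSketch {
  // Hypothetical stand-in: assumes each line is "<url><TAB><html>".
  def parseString(line: String): Option[(String, String)] =
    line.split("\t", 2) match {
      case Array(url, html) if url.nonEmpty => Some((url, html))
      case _                                => None
    }

  // Hypothetical stand-in: wraps non-empty HTML in a tiny JSON object,
  // escaping backslashes and double quotes only.
  def toJsonOpt(html: String): Option[String] =
    if (html.isEmpty) None
    else Some("{\"html\":\"" + html.replace("\\", "\\\\").replace("\"", "\\\"") + "\"}")

  // Same generators as the inner loop above, but yielding the output lines
  // instead of writing them to a file.
  def convert(lines: Iterator[String]): List[String] =
    (for {
      rawLine <- lines
      line = rawLine.trim()
      if !rawLine.startsWith("#")
      (url, html) <- parseString(line)
      json <- toJsonOpt(html)
    } yield s"$url\t$json").toList
}
```

Given a comment line, a well-formed line, and a line with no tab, only the well-formed line survives; the other two are dropped without an error.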
If you need a more sophisticated pipeline, you can use scalaz-stream, Storm, Spark, or another library to define the pipeline as a DAG and then execute it.