I just finished watching week 6 of Martin Odersky’s Scala course on Coursera. In lecture 5, he says that
"... the translation of for is not limited to lists or sequences, or even collections;
it is based solely on the presence of the methods map, flatMap and withFilter.
This lets you use the for syntax for your own types as well: you only need to define map, flatMap and withFilter for these types."
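To illustrate the quoted point, here is a minimal sketch with a hypothetical wrapper type `Rows` (not from the course or any library). It defines only `map`, `flatMap` and `withFilter`, and that is enough for the compiler to accept a `for` expression with a guard over it.

```scala
// Hypothetical wrapper type: not a standard collection, it only defines the
// three methods the for-comprehension translation relies on.
final case class Rows[A](items: Vector[A]) {
  def map[B](f: A => B): Rows[B] = Rows(items.map(f))
  def flatMap[B](f: A => Rows[B]): Rows[B] = Rows(items.flatMap(a => f(a).items))
  def withFilter(p: A => Boolean): Rows[A] = Rows(items.filter(p))
}

object ForDemo {
  val xs = Rows(Vector(1, 2, 3))
  val ys = Rows(Vector("a", "b"))

  // The compiler rewrites this to:
  // xs.withFilter(x => x % 2 == 1).flatMap(x => ys.map(y => s"$x$y"))
  val pairs: Rows[String] = for {
    x <- xs
    if x % 2 == 1
    y <- ys
  } yield s"$x$y"

  def main(args: Array[String]): Unit =
    println(pairs.items) // Vector(1a, 1b, 3a, 3b)
}
```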
The problem I'm trying to solve: we have a batch process that loads data from several databases, combines it, and exports the results in some form. The data is small enough to fit in memory (a few hundred thousand records from each source system), but large enough that performance matters.
I could use a traditional in-memory database (e.g. H2) and access it through ScalaQuery or something similar, but all I really need is a way to efficiently search and combine data from the different source systems, i.e. the equivalent of SQL JOINs and indexes. Using a full-blown relational database plus an ORM from Scala feels very awkward for something that could be solved easily and efficiently with a data structure native to Scala.
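For reference, a minimal sketch of what "JOIN plus index" boils down to with only the standard library (Scala 2.13), assuming hypothetical `Customer` and `Order` record types: build a hash map from join key to rows once with `groupBy`, then probe it while iterating the other side. This is a plain hash join, not a claim about how any particular database implements joins.

```scala
object HashJoin {
  // Hypothetical record types standing in for rows from two source systems.
  final case class Customer(id: Int, name: String)
  final case class Order(id: Int, customerId: Int, amount: BigDecimal)

  // Inner join on Order.customerId == Customer.id.
  def join(customers: Seq[Customer], orders: Seq[Order]): Seq[(Customer, Order)] = {
    // Build phase: one pass over the orders, grouping them by the join key.
    val ordersByCustomer: Map[Int, Seq[Order]] = orders.groupBy(_.customerId)
    // Probe phase: an average O(1) hash lookup per customer instead of
    // rescanning every order for each customer (a nested-loop join).
    for {
      c <- customers
      o <- ordersByCustomer.getOrElse(c.id, Seq.empty)
    } yield (c, o)
  }

  // Hypothetical sample data.
  val sampleCustomers = Vector(Customer(1, "Ann"), Customer(2, "Bob"), Customer(3, "Cy"))
  val sampleOrders = Vector(
    Order(10, 1, BigDecimal(5)),
    Order(11, 2, BigDecimal(7)),
    Order(12, 1, BigDecimal(3))
  )

  def main(args: Array[String]): Unit =
    join(sampleCustomers, sampleOrders).foreach { case (c, o) =>
      println(s"${c.name} -> order ${o.id}")
    }
}
```

Customers without orders (here `Cy`) simply drop out, as in an SQL inner join.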
My first naive approach is a vector data structure (for fast direct access) combined with one or more “indexes” (which could be implemented as B-trees, as in database systems). The map, flatMap and withFilter methods of this combined data structure could be smart enough to use an index if one exists for the requested field(s), or they could accept a “hint” telling them to use it.
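One possible shape for the structure described above, sketched under assumptions: `IndexedTable` is a hypothetical name, and `scala.collection.immutable.TreeMap` (a red-black tree) stands in for a B-tree, since both keep keys ordered and give logarithmic lookups. It assumes Scala 2.13 or later.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical sketch: a Vector of rows plus one sorted secondary index.
// TreeMap (a red-black tree) stands in for the B-tree from the question.
final class IndexedTable[A, K] private (
    val rows: Vector[A],
    index: TreeMap[K, Vector[Int]] // key -> positions in `rows`
) {
  // Index-backed equality lookup: O(log n) plus the number of matches.
  def lookup(key: K): Vector[A] =
    index.getOrElse(key, Vector.empty).map(i => rows(i))

  // Index-backed range query over keys in [from, until).
  def range(from: K, until: K): Vector[A] =
    index.range(from, until).valuesIterator.flatten.map(i => rows(i)).toVector

  // Plain for-comprehension support. The predicate given to withFilter is an
  // opaque function, so it cannot be inspected to choose an index; these
  // methods just scan the Vector.
  def map[B](f: A => B): Vector[B] = rows.map(f)
  def flatMap[B](f: A => IterableOnce[B]): Vector[B] = rows.flatMap(f)
  def withFilter(p: A => Boolean): Vector[A] = rows.filter(p)
}

object IndexedTable {
  // Builds the index in one pass over the rows.
  def apply[A, K: Ordering](rows: Vector[A])(key: A => K): IndexedTable[A, K] = {
    val idx = rows.indices.foldLeft(TreeMap.empty[K, Vector[Int]]) { (m, i) =>
      val k = key(rows(i))
      m.updated(k, m.getOrElse(k, Vector.empty) :+ i)
    }
    new IndexedTable(rows, idx)
  }
}

object IndexedDemo {
  // Hypothetical record type.
  final case class Order(id: Int, customerId: Int)

  val orders = IndexedTable(
    Vector(Order(10, 1), Order(11, 2), Order(12, 1), Order(13, 5))
  )(_.customerId)
  val customerIds = Vector(1, 2, 3)

  // The "hint" is explicit: use lookup as a generator instead of a filter.
  val joined = for {
    cid <- customerIds
    o <- orders.lookup(cid)
  } yield (cid, o.id)

  // Range query served by the sorted index.
  val inRange = orders.range(2, 6).map(_.id)

  // Ordinary for over the table itself: a full scan via withFilter.
  val scanned = for (o <- orders if o.customerId == 1) yield o.id

  def main(args: Array[String]): Unit = {
    println(joined)
    println(inRange)
    println(scanned)
  }
}
```

The design choice worth noting: since `withFilter` only receives a function, it cannot discover on its own that the predicate is "customerId == 1"; making that automatic would require passing a structured predicate (an expression tree) instead of a plain lambda.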
I'm wondering whether such data structures already exist and are available, or whether I need to implement them myself. Is there a library or collection type for Scala that solves this problem?
egbokul