Let's say I have a legacy application that, according to its previous developers, needed an arbitrarily flexible schema for various reasons, so they reinvented the Entity-Attribute-Value model. What they were really trying to build was a document store, for which tools such as Mongo or Couch would be a better fit today, but those tools were either not available or not known to the earlier teams.
Say that, to remain competitive, we need more powerful ways to query and analyze the information in our system. Given the large number and variety of attributes, map/reduce seems a better fit for our set of problems than gradually reorganizing the system into a more relational schema.
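To make "map/reduce over documents" concrete, here is a minimal in-memory sketch in plain Python. It is not Mongo's or Couch's API. The document shape (a dict with a `type` key) and the helper names `map_reduce`, `count_by_type` and `sum_counts` are hypothetical, chosen only to show the map, group-by-key, and reduce phases.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Doc = dict[str, Any]

def map_reduce(
    docs: Iterable[Doc],
    mapper: Callable[[Doc], Iterable[tuple[Any, Any]]],
    reducer: Callable[[Any, list[Any]], Any],
) -> dict[Any, Any]:
    """Toy map/reduce: map each doc to (key, value) pairs, group by key, reduce."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    for doc in docs:
        # Map phase: a mapper may emit zero or more pairs per document.
        for key, value in mapper(doc):
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

def count_by_type(doc: Doc) -> Iterable[tuple[Any, Any]]:
    # Hypothetical mapper: emit one count per document, keyed by its type.
    yield doc["type"], 1

def sum_counts(key: Any, values: list[Any]) -> Any:
    # Hypothetical reducer: total the counts emitted for a key.
    return sum(values)
```

For example, `map_reduce(docs, count_by_type, sum_counts)` counts documents per type. A real store distributes the map phase across documents and shards, but the mapper/reducer contract is the same idea.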
The source database contains millions of documents, but only a small number of distinct document types, and the different types have some features in common.
What is an effective strategy for migrating from a massive EAV implementation, say in MySQL, to a document-oriented store like Mongo or Couch?
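The core of any such migration is pivoting EAV rows into one document per entity. The sketch below assumes the EAV tables have already been joined into rows of `(entity_id, entity_type, attribute, value)`. That row shape and the function name `eav_to_documents` are hypothetical, and a real schema usually spreads these columns across entity, attribute, and typed-value tables. Attributes are nested under an `attributes` key so that an attribute named `type` or `_id` cannot clobber the document's own fields.

```python
from typing import Any, Iterable

# Hypothetical pre-joined EAV row: (entity_id, entity_type, attribute, value).
EavRow = tuple[int, str, str, Any]

def eav_to_documents(rows: Iterable[EavRow]) -> list[dict[str, Any]]:
    """Pivot EAV rows into one document per entity, in first-seen order."""
    docs: dict[int, dict[str, Any]] = {}
    for entity_id, entity_type, attribute, value in rows:
        doc = docs.setdefault(
            entity_id, {"_id": entity_id, "type": entity_type, "attributes": {}}
        )
        attrs = doc["attributes"]
        if attribute in attrs:
            # The attribute was seen before for this entity: treat it as
            # multi-valued and collect the values into a list.
            if not isinstance(attrs[attribute], list):
                attrs[attribute] = [attrs[attribute]]
            attrs[attribute].append(value)
        else:
            attrs[attribute] = value
    return list(docs.values())
```

In practice you would stream rows ordered by `entity_id` and write documents in batches rather than holding millions of them in memory; the dict here only keeps the sketch short.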
I can, of course, imagine ways to attack this, but I would really like to see a case study or war story from someone who has already tackled this kind of problem.
Which strategies for this kind of conversion worked well? What lessons did you learn? What pitfalls should be avoided? How did you deal with legacy applications that still expect to talk to the existing database?