There are several possible ways to approach this when writing code in your favorite language of choice using the appropriate APIs to select data, transform it, and then paste it into MongoDB.
You can also do this using the SQL query language, MongoDB, and the shell. One simple way is to select flat data through SQL, upload it to a CSV file, import it into MongoDB and use the aggregation structure to convert it to the desired format.
If you are fortunate enough to use a database that supports arrays or other ways of grouping strings into separate list types, you can make one choice and include it in the JSON or MongoDB insert statement.
In these examples, I assume that you need a format equivalent to the document for each city:
{ State:"Orissa", City:{ Name:"Phulbani", Area:[ "Phulbani","Phulbani2","Pokali","Madira" ] } }
Examples of data in the DBMS:
asya=
A simple way with arrays:
SELECT 'db.cities.insert({ state:"' || states.name || '", city: { name: "' || cities.name || '", areas : [ ' || array_to_string(array_agg('"' || cityarea.name || '"'),',') || ']}});' FROM states JOIN cities ON (states.id=cities.states_id) LEFT OUTER JOIN cityarea ON (cities.id=cityarea.city_id) GROUP BY states.name, cities.name;
gives you a result that can go directly to the MongoDB shell:
db.cities.insert({ state:"California", city: { name: "Los Angeles", areas : [ "Financial District","Fashion District","Skid Row","Bunker Hill"]}}); db.cities.insert({ state:"California", city: { name: "San Diego", areas : [ ]}}); db.cities.insert({ state:"California", city: { name: "San Francisco", areas : [ "Haight Ashbury","South Beach","SoMa","Cole Valley"]}}); db.cities.insert({ state:"Massachusetts", city: { name: "Boston", areas : [ "Beacon Hill","Brookline","Backbay"]}}); db.cities.insert({ state:"New York", city: { name: "Brooklyn", areas : [ "Park Slope"]}}); db.cities.insert({ state:"New York", city: { name: "Buffalo", areas : [ ]}}); db.cities.insert({ state:"New York", city: { name: "New York", areas : [ "Little Italy","West Village","Harlem","SOHO"]}});
The longer way, if you do not have support for array or list types, is to select the combined data:
asya=# SELECT states.name as state, cities.name as city, cityarea.name as area FROM states JOIN cities ON (states.id=cities.states_id) LEFT OUTER JOIN cityarea ON (cities.id=cityarea.city_id); state | city | area ---------------+---------------+-------------------- California | Los Angeles | Financial District California | Los Angeles | Fashion District California | Los Angeles | Skid Row California | Los Angeles | Bunker Hill California | San Francisco | Cole Valley California | San Francisco | Haight Ashbury California | San Francisco | South Beach California | San Francisco | SoMa California | San Diego | New York | New York | West Village New York | New York | Harlem New York | New York | SOHO New York | New York | Little Italy New York | Brooklyn | Park Slope New York | Buffalo | Massachusetts | Boston | Brookline Massachusetts | Boston | Backbay Massachusetts | Boston | Beacon Hill (18 rows)
I used the left outer join in cityarea because in my model data I had a city without any areas, but I wanted to get all pairs of state, cities, even if there was no area for this.
You can download this interactively or via the command line (use the appropriate syntax for your RDBMS). I will do it interactively:
asya=# \a Output format is unaligned. asya=# \f Field separator is "|". asya=# \f , Field separator is ",". asya=# \t Showing only tuples. asya=# \o dump.txt asya=# SELECT states.name as state, cities.name as city, cityarea.name as area FROM states JOIN cities ON (states.id=cities.states_id) LEFT OUTER JOIN cityarea ON (cities.id=cityarea.city_id); asya=# \q
Now I have a comma-separated file with state, city and region in the form of three fields. I can load it into MongoDB using the mongoimport utility:
asya$ mongoimport -d sample -c tmpcities --type csv --fields state,city,area < dump.txt connected to: 127.0.0.1 2014-08-05T07:41:36.744-0700 check 9 18 2014-08-05T07:41:36.744-0700 imported 18 objects
Now, to convert to the format I want, I use aggregation:
mongo sample MongoDB shell version: 2.6.4 connecting to: sample1 > db.tmpcities.aggregate( {$group:{_id:"$city", state:{$first:"$state"}, areas:{$push:"$area"}}}, {$project:{state:1,_id:0,city:{name:"$_id", areas:"$areas"}}}, {$out:'cities'}) > db.cities.find({},{_id:0}) { "_id" : "Boston", "state" : "Massachusetts", "areas" : [ "Brookline", "Backbay", "Beacon Hill" ] } { "_id" : "New York", "state" : "New York", "areas" : [ "West Village", "Harlem", "SOHO", "Little Italy" ] } { "_id" : "Buffalo", "state" : "New York", "areas" : [ "" ] } { "_id" : "Brooklyn", "state" : "New York", "areas" : [ "Park Slope" ] } { "_id" : "San Diego", "state" : "California", "areas" : [ "" ] } { "_id" : "San Francisco", "state" : "California", "areas" : [ "Cole Valley", "Haight Ashbury", "South Beach", "SoMa" ] } { "_id" : "Los Angeles", "state" : "California", "areas" : [ "Financial District", "Fashion District", "Skid Row", "Bunker Hill" ] }