Run extended MongoDB queries in R using rmongodb

While MySQL turns me on, I'm trying to get to know my first NoSQL DBMS, and that turned out to be MongoDB . I connect to it through rmongodb .

The more I play with rmongodb , the more questions / problems arise when running advanced queries.

First, I present some examples of data before I go into detail about the various types of queries that I cannot correctly specify.

Data examples

The example is taken from the MongoDB website and is a bit simplified.

pkg <- "rmongodb" if (!require(pkg, character.only=TRUE)) { install.packages(pkg) require(pkg, character.only=TRUE) } # Connect to DB db <- "test" ns <- "posts" mongo <- mongo.create(db=db) # Insert document to collection 'test.users' b <- mongo.bson.from.list(list( "_id"="alex", name=list(first="Alex", last="Benisson"), karma=1.0, age=30, test=c("a", "b") )) mongo.insert(mongo, "test.users", b) # Insert document to collection 'test.posts' b <- mongo.bson.from.list(list( "_id"="abcd", when=mongo.timestamp.create(strptime("2011-09-19 02:00:00", "%Y-%m-%d %H:%M:%s"), increment=1), author="alex", title="Some title", text="Some text.", tags=c("tag.1", "tag.2"), votes=5, voters=c("jane", "joe", "spencer", "phyllis", "li"), comments=list( list( who="jane", when=mongo.timestamp.create(strptime("2011-09-19 04:00:00", "%Y-%m-%d %H:%M:%s"), increment=1), comment="Some comment." ), list( who="meghan", when=mongo.timestamp.create(strptime("2011-09-20 13:00:00", "%Y-%m-%d %H:%M:%s"), increment=1), comment="Some comment." ) ) ) ) b mongo.insert(mongo, "test.posts", b) 

Two questions related to inserting JSON / BSON objects:

  • Test.posts document, voters field: is it correct to use c() in this case?
  • Test.posts document, comments field: what is the right way to indicate this, c() or list() ?

Top level queries: they work with cure

Top level queries work very well:

 # Get all posts by 'alex' (only titles) res <- mongo.find(mongo, "test.posts", query=list(author="alex"), fields=list(title=1L)) out <- NULL while (mongo.cursor.next(res)) out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res)))) > out [[1]] _id title "abcd" "No Free Lunch" 

Question 1: Basic sub-level queries

How can I run simple “level-level queries” (as opposed to top-level queries) that should fall into the arbitrarily deep JSON / BSON sublevels of the MongoDB object style? These sub level queries use MongoDB dot notation , and I can't figure out how to map this to a valid rmongodb request

In simple MongoDB syntax, something like

 > db.posts.find( { comments.who : "meghan" } ) 

will work. But I can not figure out how to do this with rmongodb functions

Here is what I have tried so far

 # Get all comments by 'meghan' from 'test.posts' #-------------------- # Approach 1) #-------------------- res <- mongo.find(mongo, "test.posts", query=list(comments=list(who="meghan"))) out <- NULL while (mongo.cursor.next(res)) out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res)))) > out NULL # Does not work #-------------------- # Approach 2) #-------------------- buf <- mongo.bson.buffer.create() mongo.bson.buffer.start.object(buf, "comments") mongo.bson.buffer.append(buf, "who", "meghan") mongo.bson.buffer.finish.object(buf) query <- mongo.bson.from.buffer(buf) res <- mongo.find(mongo, "test.posts", query=query) out <- NULL while (mongo.cursor.next(res)) out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res)))) > out NULL # Does not work 

Question 2: Queries with $ Operators

These works

Request 1

 buf <- mongo.bson.buffer.create() mongo.bson.buffer.start.object(buf, "age") mongo.bson.buffer.append(buf, "$lte", 30) mongo.bson.buffer.finish.object(buf) criteria <- mongo.bson.from.buffer(buf) criteria > mongo.find.one(mongo, "test.users", query=criteria) _id : 2 alex name : 3 first : 2 Alex last : 2 Benisson karma : 1 1.000000 age : 1 30.000000 test : 4 0 : 2 a 1 : 2 b 

Request 2

 buf <- mongo.bson.buffer.create() mongo.bson.buffer.start.object(buf, "test") mongo.bson.buffer.append(buf, "$in", c("a", "z")) mongo.bson.buffer.finish.object(buf) criteria <- mongo.bson.from.buffer(buf) criteria mongo.find.one(mongo, "test.users", query=criteria) 

However, note that atomic dialing will result in a NULL return value

 mongo.bson.buffer.append(buf, "$in", "a") # Instead of 'mongo.bson.buffer.append(buf, "$in", c("a", "z"))' 

An attempt of the same with sub level queries is again lost

 buf <- mongo.bson.buffer.create() mongo.bson.buffer.start.object(buf, "name") mongo.bson.buffer.start.object(buf, "first") mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst")) mongo.bson.buffer.finish.object(buf) mongo.bson.buffer.finish.object(buf) criteria <- mongo.bson.from.buffer(buf) criteria <- mongo.bson.from.buffer(buf) > criteria name : 3 first : 3 $in : 4 0 : 2 Alex 1 : 2 Horst > mongo.find.one(mongo, "test.users", query=criteria) NULL 
+5
source share
3 answers

Either c () or list () may be fine. Depends on whether the components are specified and whether all of them are of the same type (for the list). It’s best to look at the generated BSON and see if you get what you want. For better control over the created object, use mongo.bson.buffer and the functions that work on it. This is actually why subqueries fail. 'comments' is created as a subobject, not an array. mongo.bson.from.list () is convenient, but it does not give you the same control, and sometimes it mistakenly assumes what to generate from complex structures.

The request for another data set can be adjusted as follows:

 buf <- mongo.bson.buffer.create() mongo.bson.buffer.start.object(buf, "name.first") mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst")) mongo.bson.buffer.finish.object(buf) criteria <- mongo.bson.from.buffer(buf) 

Note that you definitely need to use a buffer here, since R will choke on a dotted name.

I hope this straightens your problem. Let me know if you have any further questions.

+7
source

I still don’t really understand how best to progress here on SO after the question has been published, but you want to talk a little more about it, perhaps by adding additional questions and answers.

As I was often told not to blow up my original question with future changes, in this “answer” I just take Gerald Lindsley's suggestions and try to put it in real code (because it still does not work for me):

Training

 pkg <- "rmongodb" if (!require(pkg, character.only=TRUE)) { install.packages(pkg) require(pkg, character.only=TRUE) } # Connect to DB db <- "test" ns <- "posts" mongo <- mongo.create(db=db) # Make sure we start with an empty collection mongo.drop(mongo, paste(db, ns, sep=".")) 

Insert document

As Gerald noted in his answer, mongo.bson.from.list() sometimes makes the wrong guesses about the resulting BSON structure, so I tried to explicitly create the BSON array objects:

 buf <- mongo.bson.buffer.create() # 'REGULAR' APPENDING mongo.bson.buffer.append(buf, "_id", "abcd") mongo.bson.buffer.append(buf, "when", mongo.timestamp.create(strptime("2011-09-19 02:00:00", "%Y-%m-%d %H:%M:%s"), increment=1)) mongo.bson.buffer.append(buf, "author", "alex") mongo.bson.buffer.append(buf, "title", "Some title") mongo.bson.buffer.append(buf, "text", "Some text.") mongo.bson.buffer.append(buf, "tags", c("tag.1", "tag.2")) mongo.bson.buffer.append(buf, "votes", 5) # / # VOTERS ARRAY mongo.bson.buffer.start.array(buf, "voters") voters <- c("jane", "joe", "spencer", "phyllis", "li") i=1 for (i in seq(along=voters)) { mongo.bson.buffer.append(buf, as.character(i), voters[i]) } mongo.bson.buffer.finish.object(buf) # / # COMMENTS ARRAY mongo.bson.buffer.start.array(buf, "comments") mongo.bson.buffer.start.object(buf, "1") mongo.bson.buffer.append(buf, "who", "jane") mongo.bson.buffer.append(buf, "when", mongo.timestamp.create(strptime("2011-09-19 04:00:00", "%Y-%m-%d %H:%M:%s"), increment=1)) mongo.bson.buffer.append(buf, "comment", "some comment.") mongo.bson.buffer.finish.object(buf) mongo.bson.buffer.start.object(buf, "2") mongo.bson.buffer.append(buf, "who", "meghan") mongo.bson.buffer.append(buf, "when", mongo.timestamp.create(strptime("2011-09-20 13:00:00", "%Y-%m-%d %H:%M:%s"), increment=1)) mongo.bson.buffer.append(buf, "comment", "some comment.") mongo.bson.buffer.finish.object(buf) # / # FINALIZE mongo.bson.buffer.finish.object(buf) b <- mongo.bson.from.buffer(buf) > b _id : 2 abcd when : 17 i: 1, t: 1316390400 author : 2 alex title : 2 Some title text : 2 Some text. tags : 4 0 : 2 tag.1 1 : 2 tag.2 votes : 1 5.000000 voters : 4 1 : 2 jane 2 : 2 joe 3 : 2 spencer 4 : 2 phyllis 5 : 2 li comments : 4 1 : 3 who : 2 jane when : 17 i: 1, t: 1316397600 comment : 2 some comment. 2 : 3 who : 2 meghan when : 17 i: 1, t: 1316516400 comment : 2 some comment. mongo.insert(mongo, "test.posts", b) 

Basic query below

 # Get all comments by 'meghan' from 'test.posts' #-------------------- # Approach 1) #-------------------- res <- mongo.find(mongo, "test.posts", query=list(comments=list(who="meghan"))) out <- NULL while (mongo.cursor.next(res)) out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res)))) > out NULL # Does not work #-------------------- # Approach 2) #-------------------- buf <- mongo.bson.buffer.create() mongo.bson.buffer.start.object(buf, "comments") mongo.bson.buffer.append(buf, "who", "meghan") mongo.bson.buffer.finish.object(buf) query <- mongo.bson.from.buffer(buf) res <- mongo.find(mongo, "test.posts", query=query) out <- NULL while (mongo.cursor.next(res)) out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res)))) > out NULL # Does not work 

I still have to do something wrong when specifying a document; -)

+2
source

Regarding atomic queries and the $ in operator, I received a request from the first question to work as follows:

 buf <- mongo.bson.buffer.create() mongo.bson.buffer.start.object(buf, "test") mongo.bson.buffer.start.array(buf, "$in") mongo.bson.buffer.append(buf, "a", "a") mongo.bson.buffer.finish.object(buf) mongo.bson.buffer.finish.object(buf) criteria <- mongo.bson.from.buffer(buf) criteria 

I assume that explicit beginning and ending of the array does the trick if the array contains only one element.

One thing that might be useful is to control the mogod console or log (after running mongod with the -v option). Running the old query, you will see:

 Tue Nov 20 16:09:04 [conn23] User Assertion: 12580:invalid query Tue Nov 20 16:09:04 [conn23] assertion 12580 invalid query ns:test.users query:{ test: { $in: "a" } } Tue Nov 20 16:09:04 [conn23] problem detected during query over test.users : { $err: "invalid query", code: 12580 } Tue Nov 20 16:09:04 [conn23] query test.users query: { test: { $in: "a" } } ntoreturn:0 keyUpdates:0 exception: invalid query code:12580 locks(micros) r:440 reslen:59 0ms 

Running a modified request looks fine:

 Tue Nov 20 16:10:14 [conn23] query test.users query: { test: { $in: [ "a" ] } } ntoreturn:0 keyUpdates:0 locks(micros) r:168 nreturned:1 reslen:142 0ms 
+2
source

All Articles