Effective NoSQL Modeling in the Google App Engine Datastore

Question

I am writing an application on Google App Engine to help me understand it better. I store my data in the Datastore.

The application is similar to the StackOverflow model: you have a Story object that has a collection of Comment objects, each of which can be liked / hated by many users. This is how I am modeling it right now:

class Story {
    Comment[] comments;
    ...
}

class Comment {
    User[] likes;
    User[] hates;
    ...
}

So, when you load a given story, you can list all of its comments, along with the percentage of likes and hates for each comment. You can also tell whether a given user has voted on a comment or not.

I suppose I could lazy-load the actual users into the Comment entity, but even then I have a feeling that there is a better way to do this.

How will this handle a story with hundreds of comments, each with hundreds of thousands of votes?

What is the general way to model such a concept in NoSQL?

+7

google-app-engine nosql data-modeling google-cloud-datastore

rodrigo-silveira Jan 6 '13 at 1:53

1 answer

ryan1234 · Accepted Answer · 2013-01-06T03:53:50+0000

Possible answers:

(1) How will this handle hundreds of comments?

You seem to have already answered this yourself by suggesting lazy loading in the user interface. I know that document databases such as Mongo and CouchDB let you fetch data from the database one page at a time, using things like limit and skip.

Hundreds of comments should not be hard to store, and I would not expect them to be slow to query.
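
To make the limit / skip idea concrete, here is a minimal sketch in Python. It works on a plain in-memory list rather than a real database, and the names fetch_page and all_comments are made up for illustration; an actual driver would apply skip and limit on the server side.

```python
from typing import List, Sequence


def fetch_page(comments: Sequence[str], skip: int, limit: int) -> List[str]:
    """Return at most `limit` comments, starting after the first `skip` ones."""
    if skip < 0 or limit < 0:
        raise ValueError("skip and limit must be non-negative")
    return list(comments[skip:skip + limit])


# Hypothetical data: a story with 250 comments.
all_comments = [f"comment {i}" for i in range(250)]

# The UI asks for one page at a time instead of loading everything.
first_page = fetch_page(all_comments, skip=0, limit=20)
last_page = fetch_page(all_comments, skip=240, limit=20)
```

The user interface only ever holds one page, so the cost of showing a story does not grow with the total number of comments.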

(2) How to handle hundreds of thousands of votes?

I think the best way is to simply pre-compute this. When a user votes on something, you can think of it as two operations: 1) Increment a counter on the comment, for example its like count. 2) Write a record of the user's vote somewhere else.

The first operation will be quick and easy, and it lets you show users the total number of likes immediately.

The second operation (saving what the user did, i.e. which comment they liked / hated) may be a little slower, but it is still easy to do.
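
The two operations above could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the names Comment, VoteStore, vote and user_vote are hypothetical, the dictionaries stand in for two separate kinds of stored documents, and the handling of a repeated or changed vote is my own addition rather than something the question specifies.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Comment:
    comment_id: str
    text: str
    like_count: int = 0  # pre-computed counter, updated on every vote
    hate_count: int = 0  # pre-computed counter, updated on every vote

    def like_percentage(self) -> float:
        """Percentage of likes among all votes; 0.0 when nobody has voted."""
        total = self.like_count + self.hate_count
        return 100.0 * self.like_count / total if total else 0.0


class VoteStore:
    def __init__(self) -> None:
        self.comments: Dict[str, Comment] = {}
        # Vote records live apart from the comment:
        # (user_id, comment_id) -> True for a like, False for a hate.
        self.votes: Dict[Tuple[str, str], bool] = {}

    def add_comment(self, comment: Comment) -> None:
        self.comments[comment.comment_id] = comment

    def vote(self, user_id: str, comment_id: str, like: bool) -> None:
        comment = self.comments[comment_id]
        key = (user_id, comment_id)
        previous = self.votes.get(key)
        if previous is not None and previous == like:
            return  # assumption: repeating the same vote changes nothing
        if previous is not None:
            # Assumption: a changed vote first undoes the old one.
            if previous:
                comment.like_count -= 1
            else:
                comment.hate_count -= 1
        # Operation 1: bump the counter stored on the comment.
        if like:
            comment.like_count += 1
        else:
            comment.hate_count += 1
        # Operation 2: record what this user did, elsewhere.
        self.votes[key] = like

    def user_vote(self, user_id: str, comment_id: str) -> Optional[bool]:
        """True for like, False for hate, None if the user has not voted."""
        return self.votes.get((user_id, comment_id))


store = VoteStore()
store.add_comment(Comment("c1", "Nice story"))
store.vote("alice", "c1", True)
store.vote("bob", "c1", False)
store.vote("carol", "c1", True)
store.vote("alice", "c1", True)  # repeated vote, ignored
```

Showing a comment only reads its two counters, however many votes it has, and checking whether one user voted is a single lookup by key instead of a scan over a list of users.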

It is important to remember that with NoSQL we are not worried about data normalization, so redundant information is fine!

(3) What is the general way to model these concepts?

As I mentioned in (2), and in my experience, a good way to model is to build things up ahead of time so they are quick to read, and to store redundant information.

It is especially useful to store the same data several times in different documents, because joins are very hard to do in stores like Mongo and Couch. It is best to keep information close to the entity that needs it.
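
As a rough illustration of keeping information close to the entity that needs it, here is a small Python sketch using plain dictionaries as stand-ins for documents. The field names (author_id, author_name) and the helpers make_comment and render_comments are hypothetical, not taken from the question.

```python
from typing import Any, Dict, List

# Hypothetical user documents, keyed by user id.
users: Dict[str, Dict[str, str]] = {
    "u1": {"name": "Ana"},
    "u2": {"name": "Ben"},
}


def make_comment(user_id: str, text: str) -> Dict[str, str]:
    """Build a comment document that carries a copy of the author's name."""
    return {
        "author_id": user_id,
        "author_name": users[user_id]["name"],  # redundant copy, on purpose
        "text": text,
    }


# The story document embeds its comments, each with the copied name.
story: Dict[str, Any] = {
    "title": "My first story",
    "comments": [
        make_comment("u1", "Great read"),
        make_comment("u2", "Not convinced"),
    ],
}


def render_comments(story_doc: Dict[str, Any]) -> List[str]:
    """Render comment lines from the story document alone, with no user lookup."""
    return [f'{c["author_name"]}: {c["text"]}' for c in story_doc["comments"]]


lines = render_comments(story)
```

Rendering the story touches one document and needs nothing resembling a join. The price is that the copied name has to be updated separately if the user ever changes it.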

Another property of NoSQL databases is that they are allowed to be inconsistent. It is fine for a comment's like / hate count to show one number in the comments section and a different number when you look at what a user liked / hated.

(The one thing about your model that might be a concern is how the entities are split apart. Always remember: if you decompose things as you would in a traditional RDBMS, you will have to join them again later, and that can be very hard with NoSQL.)
