Choosing a many-to-many approach in the App Engine datastore (db)

My App Engine app will have "articles" and "tags."

There are two ways to implement this (thanks to Nick Johnson's article):

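    from google.appengine.ext import db

    # Option 1: one entity simply refers to the others.
    # ListProperty only accepts basic value types, so store the tags' keys.
    class Article(db.Model):
        tags = db.ListProperty(db.Key)

    # Option 2: via a separate "join" table
    class ArticlesAndTags(db.Model):
        article = db.ReferenceProperty(Article)
        tag = db.ReferenceProperty(Tag)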

Which one should I use, given the following tasks?

  • Create a tag cloud (often),
  • Select articles by tag (rarely)
+4
4 answers

Because App Engine lacks a map-reduce facility (or an SQL GROUP BY style query), tag clouds are tricky to implement efficiently: you have to count all your tags manually. Whichever implementation you go with, I would suggest a separate TagCounter model for the tag cloud that tracks how many times each tag is used. Otherwise the tag query can become expensive if you have a lot of them.

    class TagCounter(db.Model):
        tag = db.ReferenceProperty(Tag)
        counter = db.IntegerProperty(default=0)

Each time you update the tags on an article, make sure you increment and decrement the counters in this table accordingly.
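To make the bookkeeping concrete, here is a minimal sketch in plain Python. The helper names (tag_count_deltas, apply_deltas) are made up for illustration, and a dict stands in for the TagCounter entities; on the real datastore each counter update would presumably go through a transaction.

```python
def tag_count_deltas(old_tags, new_tags):
    """Return {tag: +1 or -1} for tags added to or removed from one article."""
    old, new = set(old_tags), set(new_tags)
    deltas = {}
    for tag in new - old:
        deltas[tag] = 1       # tag was added to the article
    for tag in old - new:
        deltas[tag] = -1      # tag was removed from the article
    return deltas


def apply_deltas(counters, deltas):
    """Apply deltas to a {tag: count} dict (a stand-in for TagCounter entities)."""
    for tag, delta in deltas.items():
        counters[tag] = counters.get(tag, 0) + delta
    return counters


# An article tagged python/django is retagged as python/appengine.
counters = {"python": 3, "django": 1}
deltas = tag_count_deltas(["python", "django"], ["python", "appengine"])
apply_deltas(counters, deltas)
print(counters)
```

Tags that stay on the article produce no delta, so their counters are left untouched.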

Regarding the selection of articles by tag, the first implementation is sufficient (the second is overly complex imo).

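    class Article(db.Model):
        # ListProperty only accepts basic value types, so store Tag keys.
        tags = db.ListProperty(db.Key)

        @staticmethod
        def select_by_tag(tag):
            # Equality filter on a list property: matches when any element equals the key.
            return Article.all().filter("tags =", tag.key()).run()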
+2

I built a huge tag cloud* on GAEcupboard using the first solution:

    class Post(db.Model):
        title = db.StringProperty(required=True)
        tags = db.ListProperty(str, required=True)

The Tag class has a counter property that is updated every time a post is created or updated.

    class Tag(db.Model):
        name = db.StringProperty(required=True)
        counter = db.IntegerProperty(required=True)
        last_modified = db.DateTimeProperty(required=True, auto_now=True)

Having the tags in a ListProperty makes it quite simple to offer a drill-down feature that lets the user combine several tags to find the articles they need:

Example: http://www.gaecupboard.com/tag/python/web-frameworks

The search is performed using:

    posts = Post.all()
    posts.filter('tags =', 'python').filter('tags =', 'web-frameworks')
    results = posts.fetch(20)  # fetch() requires a limit; 20 is an arbitrary page size

which doesn't need any custom index at all.
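As a rough illustration of the drill-down idea, here is a plain-Python sketch. The helper names and the sample posts are hypothetical; the in-memory check only mimics what chained equality filters on a list property do, namely match entities whose list contains every requested value.

```python
def tags_from_path(path, prefix="/tag/"):
    """Split a drill-down path such as '/tag/python/web-frameworks' into tags."""
    if not path.startswith(prefix):
        return []
    return [part for part in path[len(prefix):].split("/") if part]


def matches_all(post_tags, wanted):
    """True when every wanted tag is present in the post's tag list."""
    return all(tag in post_tags for tag in wanted)


# Hypothetical sample data: post title -> tags.
posts = {
    "Intro to webapp": ["python", "web-frameworks"],
    "Datastore tips": ["python", "datastore"],
}

wanted = tags_from_path("/tag/python/web-frameworks")
hits = [title for title, tags in posts.items() if matches_all(tags, wanted)]
print(hits)
```

Each extra path segment simply becomes one more equality filter on the same property.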

* OK, it is too big, I know :)

+2

Creating a tag cloud on App Engine is really difficult, because the datastore does not support the GROUP BY construct that is normally used to express it; nor does it provide a way to sort by the length of a list property.

One key consideration is that you need to show the tag cloud often, but you only need to build it when there are new articles or when articles get retagged, since otherwise you will get the same tag cloud anyway. In fact, the tag cloud does not change much with each new article: a tag in the cloud may become a little bigger or a little smaller, but not in a way that affects its usefulness.

This suggests that the tag cloud should be built periodically, cached, and served just like static content. You should consider doing this with the Task Queue API.

The other query, listing articles by tag, is completely unsupported by the first technique you showed. Inverting it, so that a Tag model holds a ListProperty of articles, does support the query, but will suffer from update contention when popular tags need articles added to them at a high rate. The other technique, using the association model, suffers from neither of these problems, but makes it less convenient to order the queries that list articles.
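A minimal sketch of the "build once, serve many times" idea, in plain Python. The names build_tag_cloud and CachedCloud, the five size levels, and the linear scaling are all assumptions for illustration; on App Engine the cached value would more likely live in memcache and refresh() would be called from a periodic task.

```python
def build_tag_cloud(counts, levels=5):
    """Map {tag: count} to {tag: size level in 1..levels} by linear scaling."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    cloud = {}
    for tag, count in counts.items():
        if span == 0:
            cloud[tag] = 1  # all tags equally frequent
        else:
            cloud[tag] = 1 + (count - lo) * (levels - 1) // span
    return cloud


class CachedCloud(object):
    """Holds the last built cloud; page requests only ever call get()."""

    def __init__(self):
        self._cloud = {}

    def refresh(self, counts):
        # The expensive step, meant to run periodically rather than per request.
        self._cloud = build_tag_cloud(counts)

    def get(self):
        return self._cloud


cache = CachedCloud()
cache.refresh({"python": 50, "java": 10, "go": 30})
print(cache.get())
```

Because the cloud changes slowly, serving a slightly stale copy between refreshes costs very little.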

The way I would deal with this is to start with the ArticlesAndTags model, but add some extra data to it so that there is a useful ordering: the article's date, the article's title, whatever makes sense for the particular kind of site you are building. You will also need a monotonic sequence (e.g. a timestamp) so that you know when the tag was applied.

The tag cloud query would be supported by a Tag entity that holds just a numeric article count, plus a reference to the same timestamp used in the ArticlesAndTags model.

A task queue job can then query for the 1000 oldest ArticlesAndTags entities that are newer than the oldest Tag, sum up the frequency of each tag, and add those to the counts on the Tags. Removing tags is probably rare enough that it can update the Tag model immediately without too much contention, but if that assumption turns out to be wrong, record the removals as events in ArticlesAndTags as well.
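The batch step could be sketched roughly as follows in plain Python. This is a simplification under stated assumptions: association records are modelled as (timestamp, tag) tuples, a single watermark replaces the per-Tag timestamps, timestamps are assumed strictly increasing, and aggregate_batch is a made-up name.

```python
from collections import Counter


def aggregate_batch(records, watermark, batch_size=1000):
    """Count tags in the oldest records newer than the watermark.

    records: iterable of (timestamp, tag) tuples.
    Returns (counts, new_watermark).
    """
    # Oldest first, limited to one batch, like a sorted query with a limit.
    pending = sorted(r for r in records if r[0] > watermark)[:batch_size]
    counts = Counter(tag for _, tag in pending)
    # Advance the watermark to the newest record that was processed.
    new_watermark = pending[-1][0] if pending else watermark
    return counts, new_watermark


records = [(1, "python"), (2, "django"), (3, "python"), (4, "go")]
totals = Counter({"python": 1})  # counts already folded in up to timestamp 1

counts, watermark = aggregate_batch(records, watermark=1, batch_size=2)
totals.update(counts)  # add this batch's frequencies to the running totals
print(totals, watermark)
```

Running the job again with the returned watermark picks up where the previous batch stopped, so each record is counted once.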

+1

You do not seem to have very specific or complex requirements, so in my opinion neither method shows a significant advantage; or rather, the pros and cons depend entirely on what you are used to, how you structure your code, and how you implement the caching and counting mechanisms.

What comes to mind for me:

- The ListProperty method keeps the data models looking more natural.

- The ArticlesAndTags method means you have to query for the relationship entities and then fetch the Articles (ugh...) instead of doing Article.all().filter('tags =', tag).

0