Python: DISTINCT on a GqlQuery result set (GQL, GAE)

Imagine you have an entity in the Google App Engine datastore that stores links for anonymous users. You want to run the following SQL query, which is not supported:

SELECT DISTINCT user_hash FROM links 

Instead, you can use:

 user = db.GqlQuery("SELECT user_hash FROM links") 

What is the most efficient way to filter the result in Python so that it returns a DISTINCT result set? And how can the DISTINCT result set be counted?

python sql google-app-engine distinct gql
4 answers

A set is a good way to handle this:

 >>> a = ['google.com', 'livejournal.com', 'livejournal.com', 'google.com', 'stackoverflow.com']
 >>> b = set(a)
 >>> b
 set(['livejournal.com', 'google.com', 'stackoverflow.com'])

One suggestion with respect to the first answer: sets and dicts are better at retrieving unique results quickly. Membership testing in a list is O(n), versus O(1) on average for sets and dicts, so if you want to store additional data, or build something like the unique_results list mentioned in another answer, it may be better to do something like:

 >>> unique_results = {}
 >>> for item in a:
 ...     unique_results[item] = ''
 ...
 >>> unique_results
 {'livejournal.com': '', 'google.com': '', 'stackoverflow.com': ''}
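
For the counting part of the question, the same idea can be sketched as below. The rows are plain dicts standing in for the entities a query would return, so `rows`, `distinct_hashes` and `distinct_count` are illustrative names and not part of the datastore API.

```python
# Stand-in data (an assumption): each dict mimics one entity with a user_hash property.
rows = [
    {"user_hash": "a1"},
    {"user_hash": "b2"},
    {"user_hash": "a1"},
]

# Collect each user_hash once; a set silently discards duplicates.
distinct_hashes = set(row["user_hash"] for row in rows)

# The number of distinct values is simply the size of the set.
distinct_count = len(distinct_hashes)
```

With real entities, attribute access such as `row.user_hash` would presumably take the place of the dict lookup.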

Reviving this question for completeness:

The DISTINCT keyword was introduced in release 1.7.4.

You can find the updated GQL reference (e.g. for Python) here.


One option is to put the results into a set object:

http://www.python.org/doc/2.6/library/sets.html#sets.Set

The resulting set will contain only the distinct values passed into it.

Failing that, build up a new list containing only the unique objects. Something like:

 unique_results = []
 for obj in user:
     if obj not in unique_results:
         unique_results.append(obj)

This for loop can also be condensed into a list comprehension.
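
If the order of first appearance matters and the list is long, the loop above can be paired with a set for the membership test, so that each lookup no longer scans the whole list. This is only a sketch: `unique_in_order` is a hypothetical helper name, and it assumes the items are hashable, which holds for strings such as a user_hash but not necessarily for arbitrary objects.

```python
def unique_in_order(items):
    """Return the distinct items, keeping the order in which they first appear."""
    seen = set()      # values already emitted; average O(1) membership test
    result = []       # distinct values in first-seen order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


hosts = ['google.com', 'livejournal.com', 'livejournal.com',
         'google.com', 'stackoverflow.com']
unique_hosts = unique_in_order(hosts)
```

The extra set costs some memory, but it avoids the quadratic behaviour of testing `obj not in unique_results` against a growing list.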


Sorry to dig up this question, but in GAE I cannot compare objects like that; I have to use .key() for the comparison:

Beware, this is very inefficient:

 def unique_result(array):
     urk = {}  # unique results, keyed by the string form of each entity's key
     for c in array:
         k = str(c.key())  # use the same string key for the lookup and the store
         if k not in urk:
             urk[k] = c
     return urk.values()

If anyone has a better solution, please share it.
