The pythonic equivalent of this function?

Question

The pythonic equivalent of this function?

I have a function to transfer from another language, could you help me make this "pythonic"?

Here the function is ported in a "non-Python" way (this is a slightly artificial example - each task is associated with a project or "No", we need a list of individual projects, which means there is no duplication of the .identifier property, starting from the task list):

@staticmethod def get_projects_of_tasks(task_list): projects = [] project_identifiers_seen = {} for task in task_list: project = task.project if project is None: continue project_identifier = project.identifier if project_identifiers_seen.has_key(project_identifier): continue project_identifiers_seen[project_identifier] = True projects.append(project) return projects

I didn’t even start making it “pythonic” so as not to start with the wrong foot (for example, understanding the list using “if project.identifier is not a nickname”, filter () based on a predicate that searches for a dictionary based on the identifier registry using set () for marking up duplicates, etc.)

EDIT:

Based on the feedback, I have the following:

 @staticmethod def get_projects_of_tasks(task_list): projects = [] project_identifiers_seen = set() for task in task_list: project = task.project if project is None: continue project_identifier = project.identifier if project_identifier in project_identifiers_seen: continue project_identifiers_seen.add(project_identifier) projects.append(project) return projects

+4

python

Robottinosino May 30 '12 at 16:08

source share

5 answers

 def get_projects_of_tasks(task_list): seen = set() return [seen.add(task.project.identifier) or task.project #add is always None for task in task_list if task.project is not None and task.project.identifier not in seen]

This works because (a) add returns None (and or returns the value of the last evaluated expression) and (b) the matching clause (first clause) is only executed if the if clause is True .

There is no reason to believe that this should be in the understanding of the list - you could just set it as a loop, and indeed, you may prefer. The advantage of this method is that it is clear that you are just building a list and what should be in it.

I did not use staticmethod because it is rarely required. Either it is as a function of the module level, or a classmethod .

An alternative is a generator (thanks to @delnan for pointing this out):

 def get_projects_of_tasks(task_list): seen = set() for task in task_list: if task.project is not None and task.project.identifier not in seen: identifier = task.project.identifier seen.add(identifier) yield task.project

This eliminates the need for a side effect of understanding (which is inconsistent), but continues clearly what is going on.

To avoid another if / continue construct, I left two calls to task.project.identifier . This could be easily fixed with a promise library.

This version uses promises to avoid re-accessing task.project.identifier without having to include if / continue:

 from peak.util.proxies import LazyProxy, get_cache # pip install ProxyTypes def get_projects_of_tasks(task_list): seen = set() for task in task_list: identifier = LazyProxy(lambda:task.project.identifier) # a transparent promise if task.project is not None and identifier not in seen: seen.add(identifier) yield task.project

This is safe from AttributeErrors attributes because task.project.identifier never opens before task.project is checked.

+3

Marcin May 30 '12 at 16:18

source share

What about:

 project_list = {task.project.identifier:task.project for task in task_list if task.project is not None} return project_list.values()

Instead of 2.6, use the dict constructor:

 return dict((x.project.id, x.project) for x in task_list if x.project).values()

+3

Charles Brunet May 30 '12 at 16:21

source share

Some say that EAFP is pythonic, therefore:

 @staticmethod def get_projects_of_tasks(task_list): projects = {} for task in task_list: try: if not task.project.identifier in projects: projects[task.project.identifier] = task.project except AttributeError: pass return projects.values()

An explicit course check would also not be wrong, and, of course, it would be better if many tasks were not performed.

And just one dictate is enough to track seen identifiers and projects, it would be enough if the order of the projects matters, then OrderedDict (python2.7 +) can come in handy.

+1

mata May 30 '12 at 16:30

source share

There are already many good answers, and indeed you have accepted them! But I thought I would add another option. Many people have seen that your code could be made more compact with generator expressions or lists. I am going to propose a hybrid style that uses generator expressions for initial filtering, while preserving the for loop in the last filter.

The advantage of this style over the style of your source code is that it simplifies the flow of control by eliminating continue . The advantage of this style in understanding a single list is that it avoids multiple access to task.project.identifier natural way. It also handles the volatile state (set seen ) transparently, which, in my opinion, is important.

 def get_projects_of_tasks(task_list): projects = (task.project for task in task_list) ids_projects = ((p.identifier, p) for p in projects if p is not None) seen = set() unique_projects = [] for id, p in ids_projects: if id not in seen: seen.add(id) unique_projects.append(p) return unique_projects

Since these are generator expressions (enclosed in brackets instead of brackets), they do not create temporary lists. The first expression of the generator creates the iterability of the projects; you might think of it as executing the line project = task.project from your source code in all projects at once. The second generator expression creates the iterability of the tuples (project_id, project) . The if clause at the end filters out None values; (p.identifier, p) is evaluated only if p passes through the filter. Together, these two generator expressions eliminate your first two if blocks. The remaining code is essentially the same as yours.

Also note the great suggestion from Marcin / delnan that you create a generator with yield . This reduces the additional detail of your code, reducing it to the point:

 def get_projects_of_tasks(task_list): projects = (task.project for task in task_list) ids_projects = ((p.identifier, p) for p in projects if p is not None) seen = set() for id, p in ids_projects: if id not in seen: seen.add(id) yield p

The only drawback - in case this is not obvious - is that if you want to permanently store projects, you must pass the result to list .

 projects_of_tasks = list(get_projects_of_tasks(task_list))

+1

senderle May 30 '12 at 21:31

source share

Daniel Roseman · Accepted Answer · 2012-05-30T16:16:52+0000

There is nothing massive unPythonic in this code. Several possible improvements:

project_identifiers_seen can be a collection, not a dictionary.
foo.has_key(bar) better written by bar in foo
I suspect this is a staticmethod class. There is usually no need for a Python class unless you actually do data encapsulation. If this is just a normal feature, make it modular.

The pythonic equivalent of this function?

More articles: