Pythonic way to aggregate object properties in memory efficiently?

For example, we have a large list of such objects:

    class KeyStatisticEntry:
        def __init__(self, value=""):
            self.usedBytes = len(value)
            self.encoding = get_string_encoding(value)

        @property
        def total(self):
            overhead = get_object_overhead(self.usedBytes)
            if self.encoding == 'some value':
                return overhead
            else:
                return self.usedBytes + overhead

        @property
        def aligned(self):
            return some_func_with(self.usedBytes)

        # Lots more properties calculated on the basis of the existing ones

And we need to aggregate many metrics over these objects' values: min, max, sum, mean, and stdev of their properties. I am currently doing this with code like the following:

    import statistics

    used_bytes = []
    total_bytes = []
    aligned_bytes = []
    encodings = []

    for obj in keys.items():
        used_bytes.append(obj.usedBytes)
        total_bytes.append(obj.total)
        aligned_bytes.append(obj.aligned)
        encodings.append(obj.encoding)

    total_elements = len(used_bytes)
    used_user = sum(used_bytes)
    used_real = sum(total_bytes)
    aligned = sum(aligned_bytes)
    mean = statistics.mean(used_bytes)

Question:

Is there a more "Pythonic" way to do this, with better memory usage and performance?

python list aggregate
2 answers

You can use operator.attrgetter to get multiple attributes from your objects, and then use itertools.zip_longest (itertools.izip_longest in Python 2.x) to zip the corresponding attributes together.

    from operator import attrgetter

    all_result = [attrgetter('usedBytes', 'total', 'aligned', 'encoding')(obj)
                  for obj in keys.items()]

Or use a generator expression to create a generator instead of a list:

    all_result = (attrgetter('usedBytes', 'total', 'aligned', 'encoding')(obj)
                  for obj in keys.items())

Then use zip_longest:

    from itertools import zip_longest

    used_bytes, total_bytes, aligned_bytes, encodings = zip_longest(*all_result)

Then use the map function to apply sum to each of the iterables you need a sum for:

    used_user, used_real, aligned = map(sum, (used_bytes, total_bytes, aligned_bytes))

And compute the length and mean separately:

    total_elements = len(used_bytes)
    mean = statistics.mean(used_bytes)
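Put together, the steps above form this pipeline (a runnable sketch; the Entry class and its sample values are hypothetical stand-ins, since the full KeyStatisticEntry is not shown):

```python
import statistics
from itertools import zip_longest
from operator import attrgetter

# Hypothetical stand-in for KeyStatisticEntry, with precomputed values.
class Entry:
    def __init__(self, usedBytes, total, aligned, encoding):
        self.usedBytes = usedBytes
        self.total = total
        self.aligned = aligned
        self.encoding = encoding

objs = [Entry(3, 10, 8, "ascii"), Entry(5, 12, 8, "utf-8")]

getter = attrgetter('usedBytes', 'total', 'aligned', 'encoding')
# One tuple of attributes per object, then transpose into per-attribute columns.
used_bytes, total_bytes, aligned_bytes, encodings = zip_longest(*(getter(o) for o in objs))

used_user, used_real, aligned = map(sum, (used_bytes, total_bytes, aligned_bytes))
total_elements = len(used_bytes)
mean = statistics.mean(used_bytes)
print(used_user, used_real, aligned, total_elements, mean)  # 8 22 16 2 4
```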

And if you want to treat all the sub-sequences as generators (which is better in terms of memory usage, at the cost of some runtime performance), you can use a class that computes each desired result separately with generators:

    import statistics
    from itertools import tee

    class Aggregator:
        def __init__(self, all_obj):
            self.all_obj = all_obj
            self.used_user, self.mean = self.getTotalBytesAndMean()
            self.total_elements = len(self.all_obj)
            self.aligned = self.getAligned()

        def getTotalBytesAndMean(self):
            iter_1, iter_2 = tee(obj.usedBytes for obj in self.all_obj)
            return sum(iter_1), statistics.mean(iter_2)

        def getTotal(self):
            return sum(obj.total for obj in self.all_obj)

        def getAligned(self):
            return sum(obj.aligned for obj in self.all_obj)

        def getEncoding(self):
            return (obj.encoding for obj in self.all_obj)

Then you can do:

    Agg = Aggregator(keys.items())

    # And simply access the attributes
    Agg.used_user
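One caveat worth knowing about the tee-based approach: tee returns independent iterators, but when one is exhausted before the other (as sum() runs before statistics.mean() in getTotalBytesAndMean), tee has to buffer every item internally, so peak memory still grows with the input size. A small illustration:

```python
from itertools import tee

# tee gives two independent iterators over a single-use generator; fully
# consuming it_1 first forces tee to buffer all items for it_2.
it_1, it_2 = tee(x * x for x in range(5))
print(sum(it_1))   # 30 -- consumes the source; items are buffered for it_2
print(list(it_2))  # [0, 1, 4, 9, 16] -- served from tee's internal buffer
```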

You can probably get better memory usage by using (implicit) generators instead of lists to collect your information. I'm not sure how much it helps when you do several calculations on the same attribute (e.g. usedBytes). Note, however, that you cannot call len on a generator (but here the length equals the length of your input list):

    import statistics

    total_elements = len(keys.items())
    used_user = sum(obj.usedBytes for obj in keys.items())
    used_real = sum(obj.total for obj in keys.items())
    aligned = sum(obj.aligned for obj in keys.items())
    mean = statistics.mean(obj.usedBytes for obj in keys.items())
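If minimizing memory is the main goal, the same numbers can also be accumulated in a single pass over the objects, which avoids both the intermediate lists and the repeated iterations over keys.items() (a sketch with a hypothetical stand-in Entry class; the attribute names come from the question):

```python
# Single-pass aggregation: one loop over the objects, O(1) extra memory.
# Entry is a hypothetical stand-in; attribute names come from the question.
class Entry:
    def __init__(self, usedBytes, total, aligned):
        self.usedBytes = usedBytes
        self.total = total
        self.aligned = aligned

objs = [Entry(3, 10, 8), Entry(5, 12, 8)]

total_elements = used_user = used_real = aligned = 0
for obj in objs:
    total_elements += 1
    used_user += obj.usedBytes
    used_real += obj.total
    aligned += obj.aligned

mean = used_user / total_elements
print(total_elements, used_user, used_real, aligned, mean)  # 2 8 22 16 4.0
```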
