Python List Comprehensions and Other Best Practices

This relates to a project to convert a two-way ANOVA program from SAS to Python.

I pretty much started learning the language on Thursday, so I know I have a lot of room for improvement. If I am missing something blatantly obvious, by all means let me know. I don't have Sage up and running yet, so right now this is all plain vanilla Python 2.6.1 (portable).

Primary query: I need a good set of list comprehensions that can extract the data, stored as lists of samples in nested lists, by factor A, by factor B, overall, and in groups for each combination of levels of factors A and B (AxB).

After some work, the data is in the following form (3 layers of nested lists):

response[a][b][n]

(meaning [a1 [b1 [n1, ..., nN] ... bB [n1, ..., nN]]], ..., [aA [b1 [n1, ..., nN] ... bB [n1, ..., nN]]]. I hope this is clear.)

Factor levels in my example: A = 3 (0-2), B = 8 (0-7), N = 8 (0-7)

byA= [[a[i] for i in range(b)] for a[b] in response] 

(Can someone explain why this syntax works? I stumbled upon it while trying to figure out what the parser would accept. I haven't seen this syntax tied to this behavior anywhere else, but it's really nice. Any good links to sites or books on this topic would be appreciated. Edit: Variables persisting between runs explained this oddity. It does not work.)

 byB=lstcrunch([[Bs[i] for i in range(len(Bs)) ]for Bs in response]) 

(It should be noted that zip(*response) almost does what I want. The version above does not actually work, as I recall. I have not tested it thoroughly yet.)
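To make the zip(*response) remark concrete, here is a minimal sketch on a hypothetical 2x3x2 response (A = 2, B = 3, N = 2; the values are made up for illustration). zip(*response) swaps the A and B axes, so each group corresponds to one level of B but is still a tuple of per-A sample lists; one extra flattening step turns each group into a single list of samples.

```python
# Hypothetical tiny data set: response[a][b] is the list of samples for cell (a, b).
response = [
    [[1, 2], [3, 4], [5, 6]],        # a = 0
    [[7, 8], [9, 10], [11, 12]],     # a = 1
]

# zip(*response) groups by level of B, but each group is still a tuple of lists.
grouped = [list(group) for group in zip(*response)]
print(grouped[0])   # [[1, 2], [7, 8]]

# Flatten each group so that every level of B becomes one flat list of samples.
by_b = [[sample for cell in group for sample in cell] for group in grouped]
print(by_b)         # [[1, 2, 7, 8], [3, 4, 9, 10], [5, 6, 11, 12]]
```

This is only one possible way to finish the job; it assumes every cell holds a plain list of samples.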

 byAxB= [item for sublist in response for item in sublist] 

(Stolen from Alex Martelli's answer on this site. Again, could someone explain why this works? List comprehension syntax is not explained well in the texts I have been reading.)

 byO= [item for sublist in byAxB for item in sublist]

(Obviously, I just reused the previous comprehension here, since it did what I needed.)

I would like these to end up as the same data types, at least when looped over by the factor in question, so that the same average / total / SS / etc. functions can be applied and reused.

This can be easily replaced with something cleaner:

    def lstcrunch(Dlist):
        """Returns a list containing the entire contents of whatever is imported,
        reduced by one level.

        If a rectangular array, it reduces a dimension by one.
        lstcrunch(DataSet[a][b]) -> DataOutput[a]
        [[1, 2], [[2, 3], [2, 4]]] -> [1, 2, [2, 3], [2, 4]]
        """
        flat = []
        if islist(Dlist):  # 1D top level list
            for i in Dlist:
                if islist(i):
                    flat += i
                else:
                    flat.append(i)
            return flat
        else:
            return [Dlist]

Oh, while I'm on the topic, what is the preferred way to identify a variable as a list? I have been using:

    def islist(a):
        "Returns 'True' if input is a list and 'False' otherwise"
        return type(a) == type([])

Parting query: Is there a way to explicitly force a shallow copy to become a deep copy? Or, similarly, when copying into a variable, is there a way to declare that the assignment should replace the pointer too, and not merely the value (so that the assignment does not propagate to other shallow copies)? Being able to share deliberately can also be useful from time to time, so control over when it does or doesn't happen sounds very good. (I really tripped myself up when I prepared my table for input by calling: response = [[[0] * N] * B] * A)
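The pitfall in that last call can be reproduced in a few lines. The sketch below uses small hypothetical sizes; it shows that repeating a nested list with * reuses the same inner list objects, while building each level in a comprehension yields independent lists.

```python
A, B, N = 2, 3, 4   # small hypothetical sizes

# Repeating a list with * copies references, not the inner lists.
aliased = [[[0] * N] * B] * A
aliased[0][0][0] = 99
print(aliased[1][2][0])       # 99: every "row" is the same list object

# Building each level in a comprehension creates independent lists.
independent = [[[0] * N for _ in range(B)] for _ in range(A)]
independent[0][0][0] = 99
print(independent[1][2][0])   # 0
print(independent[0][1][0])   # 0
```

The innermost [0] * N is safe here only because integers are immutable; the same trick with mutable elements would share them again.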

Edit: Further investigation got most of this working fine. I have since turned it into a class and tested it. It works great. I have left the list comprehension forms intact for reference.

    def byB(array_a_b_c):
        y = range(len(array_a_b_c))
        x = range(len(array_a_b_c[0]))
        return [[array_a_b_c[i][j][k]
                 for k in range(len(array_a_b_c[0][0]))
                 for i in y]
                for j in x]

    def byA(array_a_b_c):
        return [[repn for rowB in rowA for repn in rowB] for rowA in array_a_b_c]

    def byAxB(array_a_b_c):
        return [rowB for rowA in array_a_b_c for rowB in rowA]

    def byO(array_a_b_c):
        return [rep for rowA in array_a_b_c for rowB in rowA for rep in rowB]

    def gen3d(row, col, inner):
        """Produces a 3d nested array without any naughty shallow copies.
        [row[col[inner]] named s.t. the outer can be split on, per lprn for easy display"""
        return [[[k for k in range(inner)] for i in range(col)] for j in range(row)]

    def lprn(X):
        """This prints a list by lines. Not fancy, but works"""
        if isiterable(X):
            for line in X:
                print(line)
        else:
            print(X)  # was "print x", an undefined name

    def isiterable(a):
        return hasattr(a, "__iter__")
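As a quick sanity check of what these groupings return, here is a small sketch that applies the same comprehensions inline to a hypothetical 2x2x2 array (the values are made up). It also shows one averaging expression being reused across groupings, which was the stated goal.

```python
# Hypothetical 2x2x2 array: data[a][b] holds the samples for cell (a, b).
data = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]

# Same comprehensions as in the functions above, applied inline.
by_a = [[rep for row_b in row_a for rep in row_b] for row_a in data]
by_b = [[data[i][j][k]
         for k in range(len(data[0][0]))
         for i in range(len(data))]
        for j in range(len(data[0]))]
by_axb = [row_b for row_a in data for row_b in row_a]
by_o = [rep for row_a in data for row_b in row_a for rep in row_b]

print(by_a)     # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(by_b)     # [[1, 5, 2, 6], [3, 7, 4, 8]]
print(by_axb)   # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(by_o)     # [1, 2, 3, 4, 5, 6, 7, 8]

# Every grouping except by_o is a list of flat lists, so one expression serves all.
means_by_a = [sum(group) / float(len(group)) for group in by_a]
means_by_b = [sum(group) / float(len(group)) for group in by_b]
print(means_by_a)   # [2.5, 6.5]
print(means_by_b)   # [3.5, 5.5]
```

Note that by_b interleaves samples from the different levels of A; that does not matter for sums and means, but it would matter for anything order-dependent.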

Thanks to everyone who has responded. I already see a noticeable improvement in code quality thanks to my improved understanding. Further thoughts are, of course, appreciated.

+4
3 answers

 byAxB= [item for sublist in response for item in sublist]

(Again, can anyone explain why?)

I am sure A.M. will be able to give you a good explanation. Here is my stab at it while we wait for him to turn up.

I would approach this from left to right. Take these four words:

 for sublist in response 

I hope you see the resemblance to a regular for loop. These four words lay the groundwork for performing some action on each sublist in response. It looks like response is a list of lists, in which case sublist will be a list on each iteration through response.

 for item in sublist 

This is another for loop in the making. Given that we first met sublist in the previous "loop", this means we are now walking through sublist one item at a time. If I were to write these loops without a comprehension, they would look like this:

    for sublist in response:
        for item in sublist:

Next, we look at the remaining pieces: [, item and ]. Together they mean: collect the items into a list and return the resulting list.

Whenever you have trouble creating or understanding list comprehensions, write out the corresponding for loops and then compress them:

    result = []
    for sublist in response:
        for item in sublist:
            result.append(item)

It shrinks to:

 [ item for sublist in response for item in sublist ] 

List comprehension syntax is not well explained in the texts I read

Dive Into Python has a section on list comprehensions. There is also this nice tutorial to read.

Update

I forgot to say something. List comprehensions are another way to achieve what has traditionally been done with map and filter. It is worth understanding how map and filter work if you want to improve your comprehension-fu.
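To illustrate the correspondence, here is a minimal sketch with made-up numbers: the same "square the even values" task written once with map and filter and once as a list comprehension. The list() wrapper is only there so the snippet also prints a list on Python 3, where map returns an iterator.

```python
numbers = [1, 2, 3, 4, 5, 6]   # made-up sample values

# map + filter: keep the even numbers, then square them.
with_map_filter = list(map(lambda n: n * n,
                           filter(lambda n: n % 2 == 0, numbers)))

# The same result as a list comprehension.
with_comprehension = [n * n for n in numbers if n % 2 == 0]

print(with_map_filter)      # [4, 16, 36]
print(with_comprehension)   # [4, 16, 36]
```

The expression before the first for plays the role of map, and the trailing if plays the role of filter.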

+6

For the copy part, look at the copy module. Python only hands out references once the first object has been created, so any change made through the other "copies" propagates back to the original. The copy module makes real copies of objects, and it offers more than one copy mode (shallow and deep).
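A minimal sketch of the two modes, using a made-up nested list: copy.copy duplicates only the outer list, while copy.deepcopy also duplicates everything nested inside it.

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original[0][0] = 99
print(shallow[0][0])   # 99: the inner list is shared with the original
print(deep[0][0])      # 1: the deep copy is fully independent
```

So an existing shallow copy can be made independent after the fact by passing it through copy.deepcopy.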

+1

It can sometimes be tricky to get the right level of recursion for your data structure, but I think it should be relatively simple in your case. To test things as we go, we need some sample data, for example:

    data = [[a, [b, list(range(1, 9))]] for b in range(8) for a in range(3)]
    print('Original')
    print(data)
    print('Flat')
    ## from this we see how to produce the c data flat
    print([(a, b, c) for a, [b, c] in data])
    # the loop over data must come first so that c is bound before it is used
    print("Sum of data in third level = %f" % sum(point for a, [b, c] in data for point in c))
    print("Sum of all data = %f" % sum(a + b + sum(c) for a, [b, c] in data))

As for type checking, you should generally avoid it, but if you need it, for example when you do not want to recurse into a string, you can do it like this:

 if not isinstance(data, basestring) : .... 
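For the earlier question about identifying a list, isinstance is the usual tool as well. The sketch below is an illustration only: is_list and SampleList are hypothetical names, not part of the original code. Note also that basestring exists only in Python 2; on Python 3 the corresponding check is against str.

```python
def is_list(value):
    """Hypothetical replacement for islist(): also accepts subclasses of list."""
    return isinstance(value, list)


class SampleList(list):
    """Made-up list subclass, used only to show the difference."""


print(is_list([1, 2]))                        # True
print(is_list(SampleList([1, 2])))            # True
print(type(SampleList([1, 2])) == type([]))   # False: comparing types rejects subclasses
print(is_list("abc"))                         # False
```

Checking against list rather than "anything iterable" also keeps strings from being treated as containers.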

If you need to flatten the structure, you can find useful code in the Python documentation (another way to express it is chain(*listOfLists)), or you can write it as a list comprehension, [d for sublist in listOfLists for d in sublist]:

    from itertools import chain

    def flatten(listOfLists):
        "Flatten one level of nesting"
        return chain.from_iterable(listOfLists)

This does not work if you have data at different depths. For a heavyweight flattener, see http://www.python.org/workshops/1994-11/flatten.py
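For data at mixed depths, a short recursive function is often enough. The following is a minimal sketch, not the code from the linked flatten.py; flatten_deep is a hypothetical name, and it assumes that only list instances should be expanded, so strings and other values are kept as leaves.

```python
def flatten_deep(data):
    """Recursively flatten nested lists of any depth (hypothetical helper)."""
    flat = []
    for element in data:
        if isinstance(element, list):
            # Recurse into nested lists and splice their leaves in place.
            flat.extend(flatten_deep(element))
        else:
            flat.append(element)
    return flat


print(flatten_deep([1, [2, [3, [4, 5]], 6], "abc"]))   # [1, 2, 3, 4, 5, 6, 'abc']
```

Very deeply nested input could hit Python's recursion limit, in which case an explicit stack would be the safer design.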

+1