List comprehension works, but not the for loop - why?

Question


I am a little annoyed with myself because I can't understand why one solution to the problem worked and another did not. As is usually the case, this points to a gap in my understanding of (basic) pandas, and it's driving me crazy!

In any case, my problem was simple: I had a list of "bad" values (bad_index) that corresponded to row indexes in the dataframe (data_clean1), and I wanted to delete those rows. However, since the values will change with each new data set, I did not want to hard-code the bad values. Here is what I tried first:

    bad_index = [2, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 29]
    for i in bad_index:
        dataclean2 = dataclean1.drop([i]).reset_index(level=0, drop=True)

But that did not work; data_clean2 remained the same as data_clean1. My second idea was to use a list comprehension (shown below), and it worked perfectly.

    bad_index = [2, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 29]
    data_clean2 = data_clean1.drop([x for x in bad_index]).reset_index(level=0, drop=True)

Now, why does the list comprehension work but not the for loop? I have been coding for several months and feel that I should not be making mistakes like this.

Thanks!


python pandas for-loop indexing list-comprehension

Lodore66 Aug 19 '16 at 17:05


2 answers

EDIT: it turns out this is not your problem... but if you didn't have the problem described in DeepSpace's answer, you would run into this one:

    for i in bad_index:
        dataclean2 = dataclean1.drop([i]).reset_index(level=0, drop=True)

Imagine your bad_index is [1, 2, 3] and your dataframe holds [4, 5, 6, 7, 8].

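Now let's walk through what actually happens, assuming each iteration assigns the result back to dataclean (so reset_index renumbers the remaining rows 0, 1, 2, ... after every drop):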

initial: dataclean == [4, 5, 6, 7, 8]

loop 0: i == 1 => drop index 1 => dataclean = [4, 6, 7, 8]

loop 1: i == 2 => drop index 2 => dataclean = [4, 6, 8]

loop 2: i == 3 => drop index 3 !!!! uh oh, there is no index 3
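The trace above can be reproduced with a small sketch. It assumes the loop assigns the result back to the same dataframe on each pass; the name dataclean and the toy values come from the example, not from real data.

```python
import pandas as pd

# Toy data from the walk-through: default integer index 0..4
dataclean = pd.DataFrame({"value": [4, 5, 6, 7, 8]})
bad_index = [1, 2, 3]

dropped_all = True
try:
    for i in bad_index:
        # reset_index(drop=True) renumbers the remaining rows 0..n-1,
        # so each later label now points at a different row
        dataclean = dataclean.drop([i]).reset_index(drop=True)
except KeyError:
    # Raised on the third pass: after two drops only labels 0, 1, 2 remain
    dropped_all = False

print(dataclean["value"].tolist())  # [4, 6, 8]
print(dropped_all)  # False
```

Note that the second drop removed the value 7, not 6, because the labels had already shifted.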

You could instead loop in reverse:

 for i in reversed(bad_index): ...

That way index 3 is removed first, and removing it does not affect indexes 1 and 2.
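A sketch of the reversed loop on the same toy data. It assumes bad_index is sorted in ascending order, so that reversed() visits the largest label first.

```python
import pandas as pd

dataclean = pd.DataFrame({"value": [4, 5, 6, 7, 8]})
bad_index = [1, 2, 3]  # assumed sorted ascending so reversed() goes high to low

for i in reversed(bad_index):
    # Dropping the highest label first only renumbers rows *after* it,
    # so the smaller labels still point at their original rows
    dataclean = dataclean.drop([i]).reset_index(drop=True)

print(dataclean["value"].tolist())  # [4, 8]
```

If the order of bad_index is not guaranteed, iterating over sorted(bad_index, reverse=True) removes that assumption.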

But in general, you should not mutate a list/dict while you iterate over it.
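As a plain-Python illustration of that general rule (this example is not from the original answer), removing items from a list while iterating over it makes the loop skip elements:

```python
skipped = [1, 2, 2, 3]

# Anti-pattern: the iterator advances by position while remove() shifts items left
for n in skipped:
    if n == 2:
        skipped.remove(n)

print(skipped)  # [1, 2, 3]: the second 2 was never visited

# Safer: build a new list instead of mutating the one being iterated
filtered = [n for n in [1, 2, 2, 3] if n != 2]
print(filtered)  # [1, 3]
```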


Joran Beasley Aug 19 '16 at 17:12


DeepSpace · Accepted Answer · 2016-08-19T17:13:29+0000

data_clean1.drop([x for x in bad_index]).reset_index(level=0, drop=True) is equivalent to simply passing the bad_index list to drop:

    data_clean1.drop(bad_index).reset_index(level=0, drop=True)

drop takes a list of labels and discards every row whose index label appears in that list.
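A small sketch checking that equivalence. The toy dataframe and the shortened bad_index are hypothetical stand-ins for data_clean1 and the real bad indexes.

```python
import pandas as pd

data_clean1 = pd.DataFrame({"value": range(10)})  # hypothetical data, labels 0..9
bad_index = [2, 7, 8]

# The comprehension just rebuilds the same list, so both calls are identical
via_comprehension = data_clean1.drop([x for x in bad_index]).reset_index(level=0, drop=True)
via_list = data_clean1.drop(bad_index).reset_index(level=0, drop=True)

print(via_comprehension.equals(via_list))  # True
print(via_list["value"].tolist())  # [0, 1, 3, 4, 5, 6, 9]
```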

The explicit for loop did not work because on each iteration you dropped a single index from the original dataclean1 dataframe without saving the intermediate dataframes, so after the last iteration dataclean2 was just the result of running

    dataclean2 = dataclean1.drop(29).reset_index(level=0, drop=True)
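A minimal sketch of both versions on hypothetical toy data: the original loop, which keeps only the last drop, and a corrected loop that carries the intermediate dataframe forward and resets the index once at the end (resetting inside the loop would shift the labels).

```python
import pandas as pd

data_clean1 = pd.DataFrame({"value": range(10)})  # hypothetical stand-in, labels 0..9
bad_index = [2, 7, 8]

# Original pattern: every pass starts again from data_clean1,
# so only the last drop survives in data_clean2
for i in bad_index:
    data_clean2 = data_clean1.drop([i]).reset_index(level=0, drop=True)
print(len(data_clean2))  # 9: only label 8 was removed

# Corrected loop: drop from the running result, reset the index afterwards
result = data_clean1
for i in bad_index:
    result = result.drop([i])
result = result.reset_index(drop=True)
print(result["value"].tolist())  # [0, 1, 3, 4, 5, 6, 9]
```

In practice the single call data_clean1.drop(bad_index) is simpler; the loop form only makes the difference visible.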
