I am a little annoyed with myself because I can't understand why one solution to a problem worked and another did not. As is often the case, this points to a gap in my understanding of (basic) pandas, and it is driving me crazy!
In any case, my problem was simple: I had a list of "bad" values ('bad_index') that corresponded to row indexes of a data frame ('data_clean1'), and I wanted to delete those rows. However, since the values will change with each new data set, I did not want to hard-code the bad values. Here is what I tried first:
bad_index = [2, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 29]
for i in bad_index:
    data_clean2 = data_clean1.drop([i]).reset_index(level=0, drop=True)
But that did not work; data_clean2 remained the same as data_clean1. My second idea was to use a list comprehension (as shown below), and it worked perfectly.
bad_index = [2, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 29]
data_clean2 = data_clean1.drop([x for x in bad_index]).reset_index(level=0, drop=True)
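In case it helps, here is a minimal sketch that runs both attempts side by side. The toy frame, its 'value' column, and the shorter bad_index are made up for illustration and are not my real data; the result names loop_result and list_result are also just placeholders.

```python
import pandas as pd

# Hypothetical stand-in for my real data: 10 rows with a default 0..9 index.
data_clean1 = pd.DataFrame({"value": range(10)})
bad_index = [2, 7, 8]

# Attempt 1: drop inside a loop.
# Every pass calls drop() on the original data_clean1 and
# reassigns the result, so earlier passes are not carried forward.
for i in bad_index:
    loop_result = data_clean1.drop([i]).reset_index(drop=True)

# Attempt 2: hand the whole list to drop() in a single call.
list_result = data_clean1.drop(bad_index).reset_index(drop=True)

# Compare row counts of the original and the two results.
print(len(data_clean1), len(loop_result), len(list_result))
```

Comparing the three row counts on a small frame like this makes it easier to see exactly which rows each attempt removes.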
Now, why does the list comprehension approach work when the 'for' loop does not? I have been coding for several months and feel that I should not be making mistakes like this.
Thanks!