I have a DataFrame that comes from the df.groupby().size() operation and looks like this:
Localization                            RNA level
cytoplasm                               1 Non-expressed     7
                                        2 Very low         13
                                        3 Low               8
                                        4 Medium            6
                                        5 Moderate          8
                                        6 High              2
                                        7 Very high         6
cytoplasm & nucleus                     1 Non-expressed     5
                                        2 Very low          8
                                        3 Low               2
                                        4 Medium           10
                                        5 Moderate         16
                                        6 High              6
                                        7 Very high         5
cytoplasm & nucleus & plasma membrane   1 Non-expressed     6
                                        2 Very low          3
                                        3 Low               3
                                        4 Medium            7
                                        5 Moderate          8
                                        6 High              4
                                        7 Very high         1
What I want to do is compute each individual count (i.e. the last column, which comes from .size()) as a percentage of the total number of occurrences within its Localization.
For example: in the cytoplasm Localization there are 50 cases in total (7 + 13 + 8 + 6 + 8 + 2 + 6), which gives 14% and 26% for the Non-expressed and Very low RNA levels, respectively.
Is there a good way to do this? I have worked around it in a way that I think is very clumsy, i.e. creating a new DataFrame for each Localization and working from there, but that takes a lot of lines and leaves the problem of merging all the resulting DataFrames at the end. I am hoping there is a smarter way to do this!
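To make the desired result concrete, here is a minimal sketch of the calculation I am after, done without splitting the data into one DataFrame per Localization. It assumes the .size() result is a Series with a two-level index named Localization and RNA level; the variable names counts, totals and percent are only illustrative, and the Series is rebuilt by hand from the numbers shown above rather than taken from my real data.

```python
import pandas as pd

# Hand-built stand-in for the df.groupby().size() result shown above
# (assumed index level names: "Localization" and "RNA level").
levels = ["1 Non-expressed", "2 Very low", "3 Low", "4 Medium",
          "5 Moderate", "6 High", "7 Very high"]
localizations = ["cytoplasm",
                 "cytoplasm & nucleus",
                 "cytoplasm & nucleus & plasma membrane"]
counts = pd.Series(
    [7, 13, 8, 6, 8, 2, 6,
     5, 8, 2, 10, 16, 6, 5,
     6, 3, 3, 7, 8, 4, 1],
    index=pd.MultiIndex.from_product(
        [localizations, levels], names=["Localization", "RNA level"]
    ),
    name="count",
)

# Broadcast each Localization's total back onto every row of that group,
# so the result has the same index as `counts`.
totals = counts.groupby(level="Localization").transform("sum")

# Each count as a percentage of its own Localization's total.
percent = 100 * counts / totals

print(percent.round(1))
```

Because transform returns a Series aligned to the original index, the division happens row by row and nothing has to be merged back together afterwards.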
python pandas bioinformatics
Sajber