How can I normalize data in a range of columns in my pandas data frame?

Suppose I have a pandas data frame called surveyData.

I want to normalize the data in each column by doing:

surveyData_norm = (surveyData - surveyData.mean()) / (surveyData.max() - surveyData.min()) 

This would work fine if my data frame contained only the columns I want to normalize. However, I also have several columns of string data that precede them, for example:

    Name  State  Gender  Age  Income  Height
    Sam   CA     M       13   10000   70
    Bob   AZ     M       21   25000   55
    Tom   FL     M       30   100000  45

I only want to normalize the Age, Income, and Height columns, but the method above does not work because of the string data in the Name, State, and Gender columns.
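For reference, here is a minimal sketch that rebuilds the sample rows above as a data frame and applies the same formula only to the numeric columns. The columns are picked with pandas' `select_dtypes` here (a swapped-in choice; listing them by hand works just as well), and the three rows are assumed to be the whole frame:

```python
import pandas as pd

# Sample frame rebuilt from the three rows shown in the question
surveyData = pd.DataFrame({
    "Name": ["Sam", "Bob", "Tom"],
    "State": ["CA", "AZ", "FL"],
    "Gender": ["M", "M", "M"],
    "Age": [13, 21, 30],
    "Income": [10000, 25000, 100000],
    "Height": [70, 55, 45],
})

# Select only the numeric columns (here Age, Income and Height)
num_cols = surveyData.select_dtypes(include="number").columns
num = surveyData[num_cols]

# Apply the mean normalization from the question to those columns only;
# the string columns are copied through unchanged
surveyData_norm = surveyData.copy()
surveyData_norm[num_cols] = (num - num.mean()) / (num.max() - num.min())
print(surveyData_norm)
```

Because this formula subtracts the mean, each normalized column ends up with mean 0 and a max-min spread of 1; it does not rescale the values to [0, 1].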


You can perform operations on a subset of rows or columns in pandas in several ways. One useful way is indexing:

    # Assuming the same rows as in your example
    cols_to_norm = ['Age', 'Height']
    # Min-max scale each selected column to the range [0, 1]
    survey_data[cols_to_norm] = survey_data[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

This applies the normalization only to the columns you select and assigns the result back to those same columns. Alternatively, you can write the results to new, normalized columns and keep the originals.
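A minimal sketch of that alternative, using the sample rows from the question: the normalized values go into new columns and the originals are kept. The `_norm` suffix is a hypothetical naming choice; `add_suffix` and `join` are standard pandas `DataFrame` methods.

```python
import pandas as pd

# Subset of the question's sample data
survey_data = pd.DataFrame({
    "Name": ["Sam", "Bob", "Tom"],
    "Age": [13, 21, 30],
    "Height": [70, 55, 45],
})

cols_to_norm = ["Age", "Height"]

# Min-max scale each selected column to the range [0, 1]
normalized = survey_data[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Attach the results under new (hypothetical) "_norm" names so the originals survive
survey_data = survey_data.join(normalized.add_suffix("_norm"))
print(survey_data)
```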

.....


A simpler approach is to precompute the statistics first. The dropna() calls exclude missing values (pandas' mean(), max() and min() already skip NaN by default):

    mean_age = survey_data.Age.dropna().mean()
    max_age = survey_data.Age.dropna().max()
    min_age = survey_data.Age.dropna().min()
    survey_data['Age'] = survey_data['Age'].apply(lambda x: (x - mean_age) / (max_age - min_age))

This method works; repeat it for each column you want to normalize.
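The precomputed-statistics idea can also be sketched for all three numeric columns at once, without a per-element `apply`. This is a swapped-in vectorized variant, not the answer's exact code, and the missing Income value is hypothetical sample data added only to show how NaN is handled:

```python
import numpy as np
import pandas as pd

survey_data = pd.DataFrame({
    "Name": ["Sam", "Bob", "Tom"],
    "Age": [13, 21, 30],
    "Income": [10000, np.nan, 100000],  # hypothetical missing value
    "Height": [70, 55, 45],
})

cols = ["Age", "Income", "Height"]

# Precompute per-column statistics once; NaN entries are skipped by default
means = survey_data[cols].mean()
ranges = survey_data[cols].max() - survey_data[cols].min()

# Vectorized mean normalization, aligned on column names; NaN stays NaN
survey_data[cols] = (survey_data[cols] - means) / ranges
print(survey_data)
```

Subtracting a Series from a DataFrame aligns on the column labels, so each column is normalized with its own mean and range.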
