Get column index from column name in Python pandas

In R, when you need to get a column index based on the column name, you could do:

idx <- which(names(my_data)==my_colum_name) 

Is there a way to do the same with pandas dataframes?

+114
python pandas indexing dataframe
Oct 22
5 answers

Sure, you can use .get_loc():

    In [44]: from pandas import DataFrame

    In [45]: df = DataFrame({"apple": [2,3,4], "orange": [3,4,5], "pear": [1,2,3]})

    In [46]: df.columns
    Out[46]: Index(['apple', 'orange', 'pear'], dtype='object')

    In [47]: df.columns.get_loc("pear")
    Out[47]: 2

although to be honest I don't often need this myself. Usually access by name does what I want (df["pear"], df[["apple", "orange"]], or maybe df.columns.isin(["orange", "pear"])), although I can definitely see cases where you'd want the index number.
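To make the options above concrete, here is a minimal sketch. The frame and the name "kiwi" are made up for illustration, and Index.get_indexer is an addition that the answer itself does not mention: it looks up several names at once and marks absent ones with -1 instead of raising.

```python
import pandas as pd

# Illustrative frame; the explicit column order avoids relying on dict ordering.
df = pd.DataFrame(
    {"apple": [2, 3, 4], "orange": [3, 4, 5], "pear": [1, 2, 3]},
    columns=["apple", "orange", "pear"],
)

# Single name: integer position (raises KeyError if the name is absent).
pear_idx = df.columns.get_loc("pear")

# Several names: positions in query order; -1 marks a name that is not a column.
positions = df.columns.get_indexer(["pear", "apple", "kiwi"])

# Boolean mask over the columns, one entry per column.
mask = df.columns.isin(["orange", "pear"])

print(pear_idx)            # 2
print(positions.tolist())  # [2, 0, -1]
print(mask.tolist())       # [False, True, True]
```

Note that get_indexer needs unique column labels, so this sketch assumes there are no duplicate names.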

+182
Oct 23

DSM's solution works, but if you want a direct equivalent of which, you can do (df.columns == name).nonzero(). This returns a tuple of arrays, so take element [0] to get the positions themselves.
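A small sketch of that approach, assuming a hypothetical frame with a duplicated label, since that is where it behaves most like R's which and returns every matching position:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the label "a" appearing twice.
df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])
name = "a"

# Element-wise comparison of each column label with the name.
matches = df.columns == name

# Positions of the True entries; equivalent to matches.nonzero()[0].
idx = np.flatnonzero(matches)

print(idx.tolist())  # [0, 2]
```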

+13
Oct 23 '12 at 18:27

Here is a solution using a list comprehension, where cols is the list of columns whose indices you want:

 [df.columns.get_loc(c) for c in cols if c in df] 
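As a runnable sketch of the comprehension (the frame and the cols list are illustrative): because of the `if c in df` filter, names that are not columns are skipped silently, so it can help to collect them separately.

```python
import pandas as pd

df = pd.DataFrame(
    {"apple": [2, 3, 4], "orange": [3, 4, 5], "pear": [1, 2, 3]},
    columns=["apple", "orange", "pear"],
)
cols = ["pear", "kiwi", "apple"]  # "kiwi" is deliberately not a column

# `c in df` tests membership in the column labels.
found = [df.columns.get_loc(c) for c in cols if c in df]

# Names the comprehension above dropped without any error.
missing = [c for c in cols if c not in df]

print(found)    # [2, 0]
print(missing)  # ['kiwi']
```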
+13
Sep 09 '17 at 8:20

To look up several columns at once, you can use a vectorized solution based on NumPy's searchsorted. With df as the data frame and query_cols as the column names to look for, an implementation would be as follows (it assumes every name in query_cols is actually a column of df):

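    import numpy as np

    def column_index(df, query_cols):
        cols = df.columns.values
        # Indices that would sort the column names
        sidx = np.argsort(cols)
        # Binary-search each query name in the sorted order,
        # then map the hits back to the original positions
        return sidx[np.searchsorted(cols, query_cols, sorter=sidx)]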

Sample run:

    In [162]: df
    Out[162]:
       apple  banana  pear  orange  peach
    0      8       3     4       4      2
    1      4       4     3       0      1
    2      1       2     6       8      1

    In [163]: column_index(df, ['peach', 'banana', 'apple'])
    Out[163]: array([4, 1, 0])
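One caveat with searchsorted is that it returns an insertion point, so a name that is not a column still yields some index. Below is a hedged variant that verifies the hits; column_index_checked is a hypothetical name, not part of the answer above, and the sketch assumes the frame has at least one column.

```python
import numpy as np
import pandas as pd

def column_index_checked(df, query_cols):
    """Vectorized lookup that raises KeyError for names that are not columns."""
    cols = np.asarray(df.columns, dtype=object)
    query = np.asarray(query_cols, dtype=object)
    sidx = np.argsort(cols)
    pos = np.searchsorted(cols, query, sorter=sidx)
    # A name sorting after every column gives pos == len(cols); clip to stay in range.
    pos = np.clip(pos, 0, len(cols) - 1)
    result = sidx[pos]
    # The insertion point is only a real match if the label there equals the query.
    bad = query[cols[result] != query]
    if bad.size:
        raise KeyError(list(bad))
    return result

# Same data as the sample run above.
df = pd.DataFrame(
    [[8, 3, 4, 4, 2], [4, 4, 3, 0, 1], [1, 2, 6, 8, 1]],
    columns=["apple", "banana", "pear", "orange", "peach"],
)

print(column_index_checked(df, ["peach", "banana", "apple"]).tolist())  # [4, 1, 0]
```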
+5
Jul 20 '16 at 19:37

If you need the column name from a column location (the reverse of the OP's question), you can use:

    >>> df.columns[location]

Using @DSM's example:

    >>> from pandas import DataFrame
    >>> df = DataFrame({"apple": [2,3,4], "orange": [3,4,5], "pear": [1,2,3]})
    >>> df.columns
    Index(['apple', 'orange', 'pear'], dtype='object')
    >>> df.columns[1]
    'orange'

Alternatively:

 df.iloc[:,1].name 
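Both routes can be checked side by side. A minimal sketch on an illustrative frame, which also shows that indexing the columns with a list of positions returns several names at once:

```python
import pandas as pd

df = pd.DataFrame(
    {"apple": [2, 3, 4], "orange": [3, 4, 5], "pear": [1, 2, 3]},
    columns=["apple", "orange", "pear"],
)

# Position -> name by indexing the column Index directly.
name_by_position = df.columns[1]

# Position -> name via the Series that iloc selects.
name_via_iloc = df.iloc[:, 1].name

# Several positions at once.
names = df.columns[[0, 2]].tolist()

print(name_by_position)  # orange
print(name_via_iloc)     # orange
print(names)             # ['apple', 'pear']
```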
+5
Mar 02 '18 at 11:35