Pandas DataFrame column to list

Question


I am taking a subset of the data from a column based on the conditions in another column.

I can return the correct values, but the result is a pandas.core.frame.DataFrame. How do I convert it to a list?

    import pandas as pd

    tst = pd.read_csv('C:\\SomeCSV.csv')

    lookupValue = tst['SomeCol'] == "SomeValue"
    ID = tst[lookupValue][['SomeCol']]
    # How to convert ID to a list?


python pandas

user3646105 May 20, '14 at 0:00


4 answers

Akavall · Answer 1 · 2014-05-20 00:09

Use .values to get numpy.array and then .tolist() to get a list.

For example:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                       'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

Result:

    >>> df['a'].values.tolist()
    [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

or you can just use

    >>> df['a'].tolist()
    [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

To remove duplicates, you can do one of the following:

    >>> df['a'].drop_duplicates().values.tolist()
    [1, 3, 5, 7, 4, 6, 8, 9]
    >>> list(set(df['a']))  # as pointed out by EdChum
    [1, 3, 4, 5, 6, 7, 8, 9]
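
The two de-duplication approaches above differ in ordering: drop_duplicates() keeps the first occurrence of each value in its original position, while a set makes no ordering promise. A minimal sketch, reusing the df from this answer; wrapping the set in sorted() is an addition for a deterministic result and is not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

# Keeps the first occurrence of each value, in original row order.
ordered_unique = df['a'].drop_duplicates().tolist()

# A set has no guaranteed order, so sort it if the order matters.
sorted_unique = sorted(set(df['a']))

print(ordered_unique)  # [1, 3, 5, 7, 4, 6, 8, 9]
print(sorted_unique)   # [1, 3, 4, 5, 6, 7, 8, 9]
```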

MarredCheese · Answer 2 · 2017-02-16 18:08

I would like to clarify a few things:

1. As the other answers pointed out, the easiest way is to use pandas.Series.tolist(). I am not sure why the top-voted answer leads off with pandas.Series.values.tolist(), since as far as I can tell it adds syntax and confusion without any additional benefit.
2. tst[lookupValue][['SomeCol']] is a data frame (as stated in the question), not a series (as stated in a comment on the question). This is because tst[lookupValue] is a data frame, and slicing it with [['SomeCol']] asks for a list of columns (a list that happens to have a length of 1), which returns a data frame. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], then you are asking for just that one column rather than a list of columns, and you get a series back.
3. You need a series to use pandas.Series.tolist(), so you should definitely skip the second set of brackets in this case. FYI, if you ever end up with a one-column data frame that is not as easy to avoid as it is here, you can use pandas.DataFrame.squeeze() to convert it to a series.
4. tst[lookupValue]['SomeCol'] gets a subset of a particular column via chained slicing. It slices once to get a data frame with only certain rows left, and then slices again to get a particular column. You can get away with it here, since you are just reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a series).
5. Using the syntax from #4, you can do everything in one line: ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()

Demo code:

    import pandas as pd

    df = pd.DataFrame({'colA': [1, 2, 1], 'colB': [4, 5, 6]})
    filter_value = 1

    print("df")
    print(df)
    print(type(df))

    # Boolean mask: True for the rows we want to keep
    rows_to_keep = df['colA'] == filter_value
    print("\ndf['colA'] == filter_value")
    print(rows_to_keep)
    print(type(rows_to_keep))

    # Chained slicing with a single column label returns a Series
    result = df[rows_to_keep]['colB']
    print("\ndf[rows_to_keep]['colB']")
    print(result)
    print(type(result))

    # Chained slicing with a list of labels returns a DataFrame
    result = df[rows_to_keep][['colB']]
    print("\ndf[rows_to_keep][['colB']]")
    print(result)
    print(type(result))

    # squeeze() turns the one-column DataFrame into a Series
    result = df[rows_to_keep][['colB']].squeeze()
    print("\ndf[rows_to_keep][['colB']].squeeze()")
    print(result)
    print(type(result))

    # Preferred: select rows and column in one .loc call
    result = df.loc[rows_to_keep, 'colB']
    print("\ndf.loc[rows_to_keep, 'colB']")
    print(result)
    print(type(result))

    result = df.loc[df['colA'] == filter_value, 'colB']
    print("\ndf.loc[df['colA'] == filter_value, 'colB']")
    print(result)
    print(type(result))

    ID = df.loc[rows_to_keep, 'colB'].tolist()
    print("\ndf.loc[rows_to_keep, 'colB'].tolist()")
    print(ID)
    print(type(ID))

    ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
    print("\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()")
    print(ID)
    print(type(ID))

Result:

    df
       colA  colB
    0     1     4
    1     2     5
    2     1     6
    <class 'pandas.core.frame.DataFrame'>

    df['colA'] == filter_value
    0     True
    1    False
    2     True
    Name: colA, dtype: bool
    <class 'pandas.core.series.Series'>

    df[rows_to_keep]['colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df[rows_to_keep][['colB']]
       colB
    0     4
    2     6
    <class 'pandas.core.frame.DataFrame'>

    df[rows_to_keep][['colB']].squeeze()
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[rows_to_keep, 'colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[df['colA'] == filter_value, 'colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[rows_to_keep, 'colB'].tolist()
    [4, 6]
    <class 'list'>

    df.loc[df['colA'] == filter_value, 'colB'].tolist()
    [4, 6]
    <class 'list'>
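
The remark in point 4 about reading versus writing can be illustrated with a short sketch. The replacement value 0 is purely illustrative, and df and rows_to_keep mirror the demo above. A single .loc call updates the original data frame, whereas a chained df[rows_to_keep]['colB'] = 0 may emit a warning or leave df unchanged, depending on the pandas version:

```python
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1], 'colB': [4, 5, 6]})
rows_to_keep = df['colA'] == 1

# One .loc call selects rows and column together, so the assignment
# is applied to df itself rather than to a temporary intermediate.
df.loc[rows_to_keep, 'colB'] = 0

print(df['colB'].tolist())  # [0, 5, 0]
```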

zhql0907 · Answer 3 · 2016-08-20 11:57

You can use pandas.Series.tolist

For example:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

Run:

    >>> df['a'].tolist()

You'll get:

    [1, 2, 3]

ShikharDua · Answer 4 · 2016-04-21 22:10

The above solution is good if all the data is of the same dtype. NumPy arrays are homogeneous containers. When you call df.values, the output is a NumPy array. Therefore, if the data contains both int and float columns, every value in the output is coerced to a single common type, and the columns lose their original dtypes. Consider this df:

       a  b
    0  1  4
    1  2  5
    2  3  6

    a    float64
    b      int64

So, if you want to keep the original dtype, you can do something like

 row_list = df.to_csv(None, header=False, index=False).split('\n')

This returns each row as a string:

 ['1.0,4', '2.0,5', '3.0,6', '']

Then split each row to get a list of lists. Each item is still a string after splitting, so we need to convert it to the required data type.

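    def f(row_str):
        # Split one CSV row and cast each field back to its column's type
        row_list = row_str.split(',')
        return [float(row_list[0]), int(row_list[1])]

    # row_list[:-1] drops the trailing empty string;
    # list() is needed on Python 3, where map returns a lazy iterator
    df_list_of_list = list(map(f, row_list[:-1]))

    [[1.0, 4], [2.0, 5], [3.0, 6]]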
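
To see the dtype loss described in this answer, and one possible way around it without the CSV round trip, here is a small sketch. The use of itertuples is an alternative that the answer does not mention, offered only as an illustration; it builds each row from the individual columns, so column b is not upcast to float:

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

# .values builds one NumPy array, so the int column is upcast to float.
via_values = df.values.tolist()

# itertuples reads each column separately, so ints stay ints.
via_tuples = [list(row) for row in df.itertuples(index=False, name=None)]

print(via_values)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
print(via_tuples)  # [[1.0, 4], [2.0, 5], [3.0, 6]]
```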
