Select specific CSV columns (Filtering) - Python / pandas

I have a very large 100-column CSV file. To illustrate my problem, I will use a very simple example.

Suppose we have a CSV file.

    in  value  d    f
    0   975    f01  5
    1   976    F    4
    2   977    d4   1
    3   978    B6   0
    4   979    2C   0

I want to select specific columns.

    import pandas
    data = pandas.read_csv("ThisFile.csv")

To select the first 2 columns, I used

 data.ix[:,:2] 

What should I do to select other columns, such as the 2nd and 4th?

Another way to solve this would be to rewrite the CSV file with only the columns I need, but the file is huge, so I would rather avoid that.

python pandas csv
3 answers

This selects the second and fourth columns (since Python uses 0-based indexing):

    In [272]: df.iloc[:, [1, 3]]
    Out[272]:
       value  f
    0    975  5
    1    976  4
    2    977  1
    3    978  0
    4    979  0

    [5 rows x 2 columns]

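df.ix can select by position or by label, whereas df.iloc always selects by position. When indexing by position, use df.iloc to make your intention clearer. It is also slightly faster, since pandas does not need to check whether your index uses labels. Note that df.ix was deprecated in pandas 0.20 and removed in pandas 1.0, so on current versions use df.iloc for positions and df.loc for labels.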
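To make the position-versus-label distinction concrete, here is a minimal sketch. It assumes the question's sample data has the four columns in, value, d and f; the frame is built in memory here rather than read from ThisFile.csv.

```python
import pandas as pd

# Sample frame mirroring the question's data (column layout is an assumption
# reconstructed from the example table).
df = pd.DataFrame(
    {
        "in": [0, 1, 2, 3, 4],
        "value": [975, 976, 977, 978, 979],
        "d": ["f01", "F", "d4", "B6", "2C"],
        "f": [5, 4, 1, 0, 0],
    }
)

# Position-based: second and fourth columns (0-based positions 1 and 3).
by_position = df.iloc[:, [1, 3]]

# Label-based: the same two columns, selected by name.
by_label = df.loc[:, ["value", "f"]]

# Both selections should contain identical data.
print(by_position.equals(by_label))
```

Passing the positions as a list keeps the call unambiguous, since a tuple inside the indexer can be read as extra indexing dimensions.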


Another possibility is to use the usecols parameter:

 data = pandas.read_csv("ThisFile.csv", usecols=[1,3]) 

This will load only the second and fourth columns in the data DataFrame.
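As a rough illustration, the sketch below runs the same call against an in-memory stand-in for ThisFile.csv. The comma separator, the header row, and the column names in, value, d and f are assumptions based on the question's sample data.

```python
import io

import pandas as pd

# In-memory stand-in for ThisFile.csv (assumed layout: comma-separated, with a header row).
csv_text = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# usecols takes 0-based column positions; only those columns end up in the DataFrame.
data = pd.read_csv(io.StringIO(csv_text), usecols=[1, 3])

print(list(data.columns))
print(data.shape)
```

Because the unwanted columns are never materialized, this tends to suit wide files such as the 100-column one described in the question.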


If you prefer to select columns by name, you can use:

    data[['value','f']]

       value  f
    0    975  5
    1    976  4
    2    977  1
    3    978  0
    4    979  0

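Following on from Wai Yip Tung's answer, you can also pick the columns by name in the same expression that reads the file. Note that this still parses the whole file and only then keeps the named columns. For example: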

    import pandas as pd
    data = pd.read_csv("ThisFile.csv")[['value','d']]

This solved my problem.
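For a huge file, it may be worth filtering during the read itself, since usecols also accepts column names. The sketch below compares the two approaches on an in-memory stand-in for ThisFile.csv; the comma separator and the column names in, value, d and f are assumptions based on the question's sample data.

```python
import io

import pandas as pd

# In-memory stand-in for ThisFile.csv (assumed layout: comma-separated, with a header row).
csv_text = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# Select after reading: every column is parsed, then two are kept.
selected_after = pd.read_csv(io.StringIO(csv_text))[["value", "d"]]

# Select while reading: usecols accepts column names as well as positions.
selected_while = pd.read_csv(io.StringIO(csv_text), usecols=["value", "d"])

# Both approaches should yield the same two columns.
print(selected_after.equals(selected_while))
```

The result is the same either way; the difference is that the usecols form avoids building the columns you do not need.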
