Is it possible to use read_csv to read only certain lines?

Question

I have a csv file that looks like this:

    TEST
    2012-05-01 00:00:00.203 ON 1
    2012-05-01 00:00:11.203 OFF 0
    2012-05-01 00:00:22.203 ON 1
    2012-05-01 00:00:33.203 OFF 0
    2012-05-01 00:00:44.203 OFF 0
    TEST
    2012-05-02 00:00:00.203 OFF 0
    2012-05-02 00:00:11.203 OFF 0
    2012-05-02 00:00:22.203 OFF 0
    2012-05-02 00:00:33.203 OFF 0
    2012-05-02 00:00:44.203 ON 1
    2012-05-02 00:00:55.203 OFF 0

and I cannot get rid of the lines containing the string "TEST".

Is it possible to check whether a line starts with a date and read only those that do?

+7

python pandas csv

user1412286 May 23 '12 at 9:53

4 answers

When you read rows with csv.reader and can be sure that the first element of each row is a string, you can use

    if not row[0].startswith('TEST'):
        process(row)
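For context, here is a minimal, self-contained sketch of that check around csv.reader. The function name rows_without_marker, the space delimiter, and the sample data are assumptions made for illustration; they are not part of the answer above.

```python
import csv
import io


def rows_without_marker(lines, marker="TEST", delimiter=" "):
    """Yield parsed rows whose first field does not start with `marker`.

    `lines` can be an open file or any iterable of strings.
    """
    for row in csv.reader(lines, delimiter=delimiter):
        # Skip empty rows so that row[0] is always safe to access.
        if row and not row[0].startswith(marker):
            yield row


# Hypothetical sample mirroring the layout shown in the question.
sample = io.StringIO(
    "TEST\n"
    "2012-05-01 00:00:00.203 ON 1\n"
    "2012-05-01 00:00:11.203 OFF 0\n"
    "TEST\n"
    "2012-05-02 00:00:00.203 OFF 0\n"
)

rows = list(rows_without_marker(sample))
```

Because the function is a generator, the file is processed one row at a time and never held in memory as a whole.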

+3

pepr May 23 '12 at 10:10

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html?highlight=read_csv#pandas.io.parsers.read_csv

skiprows: list-like or integer. Line numbers to skip (0-indexed) or number of lines to skip (int).

Pass skiprows=[0, 6] to skip the two lines containing "TEST".
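As a rough sketch of how this might look on the data from the question: the single-space separator and the column names below are assumptions (the question does not say how the fields are delimited or what they are called), and the data is read from an in-memory string rather than a real file.

```python
import io

import pandas as pd

# The sample from the question; a single space between fields is assumed.
data = """TEST
2012-05-01 00:00:00.203 ON 1
2012-05-01 00:00:11.203 OFF 0
2012-05-01 00:00:22.203 ON 1
2012-05-01 00:00:33.203 OFF 0
2012-05-01 00:00:44.203 OFF 0
TEST
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:11.203 OFF 0
2012-05-02 00:00:22.203 OFF 0
2012-05-02 00:00:33.203 OFF 0
2012-05-02 00:00:44.203 ON 1
2012-05-02 00:00:55.203 OFF 0
"""

df = pd.read_csv(
    io.StringIO(data),
    sep=" ",
    names=["date", "time", "state", "value"],  # hypothetical column names
    skiprows=[0, 6],  # 0-indexed positions of the two "TEST" lines
)
```

Passing names also keeps the first data row from being used as the header. This approach only works when the positions of the unwanted lines are known in advance.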

+2

Maxim Egorushkin May 23 '12 at 10:17

Another option, since I just ran into this problem:

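    import subprocess

    import pandas as pd

    # filename is the path to the CSV file.
    # grep -n prefixes each matching line with its 1-based line number;
    # use '^TEST' as the pattern for the file shown in the question.
    # universal_newlines=True returns str instead of bytes on Python 3.
    grep = subprocess.check_output(
        ['grep', '-n', '^TITLE', filename], universal_newlines=True
    ).splitlines()
    # Convert to the 0-based line numbers that skiprows expects.
    bad_lines = [int(s[:s.index(':')]) - 1 for s in grep]
    df = pd.read_csv(filename, skiprows=bad_lines)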

It is less portable than @eumiro's answer (read: it may not work on Windows) and requires reading the file twice, but it has the advantage that you do not need to hold the entire contents of the file in memory.

Of course, you could do the same thing as grep in Python, but it will probably be slower.
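For illustration, a pure-Python stand-in for the grep step might look like the sketch below. The helper name matching_line_numbers and the temporary demo file are assumptions; whether it is actually slower than grep is not measured here.

```python
import os
import tempfile


def matching_line_numbers(path, prefix):
    """Return the 0-based numbers of the lines in `path` that start with `prefix`."""
    with open(path) as f:
        # enumerate() counts from 0, which is what skiprows expects.
        return [i for i, line in enumerate(f) if line.startswith(prefix)]


# Write a small hypothetical file so the sketch can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(
        "TEST\n"
        "2012-05-01 00:00:00.203 ON 1\n"
        "2012-05-01 00:00:11.203 OFF 0\n"
        "TEST\n"
        "2012-05-02 00:00:00.203 OFF 0\n"
    )
    demo_path = tmp.name

bad_lines = matching_line_numbers(demo_path, "TEST")
os.remove(demo_path)
```

The resulting list can be passed to read_csv as skiprows=bad_lines, just like the grep-based list. The file is still read twice, but only one line is held in memory at a time during the first pass.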

0

Dougal Apr 9 '13 at 19:49

eumiro · Accepted Answer · 2012-05-23T10:23:48+0000

    from io import StringIO  # on Python 2: from cStringIO import StringIO

    import pandas

    s = StringIO()
    with open('file.csv') as f:
        for line in f:
            # copy every line except the "TEST" markers into the buffer
            if not line.startswith('TEST'):
                s.write(line)
    s.seek(0)  # "rewind" to the beginning of the StringIO object

    pandas.read_csv(s)  # with further parameters…
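The question also asks about keeping only the lines that start with a date. The same buffering idea can be turned around to do that, as in this sketch. The regular expression assumes dates in YYYY-MM-DD form followed by a space, as in the sample, and the names DATE_PREFIX and keep_dated_lines are made up for illustration.

```python
import io
import re

# Assumed date layout: four digits, two digits, two digits, then a space.
DATE_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2} ")


def keep_dated_lines(lines):
    """Copy only the lines that begin with a date into a rewound StringIO."""
    buf = io.StringIO()
    for line in lines:
        # re.match anchors at the start of the string.
        if DATE_PREFIX.match(line):
            buf.write(line)
    buf.seek(0)  # rewind so that a reader starts at the beginning
    return buf


sample = [
    "TEST\n",
    "2012-05-01 00:00:00.203 ON 1\n",
    "TEST\n",
    "2012-05-02 00:00:44.203 ON 1\n",
]
filtered = keep_dated_lines(sample).read()
```

With a real file, the open file object can be passed as lines, and the returned buffer can be handed to pandas.read_csv in place of a filename. Unlike the prefix check, this drops any non-data line, not just the ones that say "TEST".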
