How to read weird csv files in Pandas?

Question

I would like to read the example CSV file shown below:

    --------------
    |A|B|C|
    --------------
    |1|2|3|
    --------------
    |4|5|6|
    --------------
    |7|8|9|
    --------------

I tried

 pd.read_csv("sample.csv",sep="|")

But that did not work.

How can I read this csv?

+5

python pandas csv

Heisenberg Sep 13 '16 at 5:49

3 answers

Try the csv module first instead of using pandas directly.

    import csv

    easy_csv = []
    # newline='' is what the csv module expects for text files in Python 3
    with open('sample.csv', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter='|')
        for row in reader:
            # ignore blank lines and the rows that consist of ----
            if not row or row[0].startswith('-'):
                continue
            # the leading and trailing | give empty first and last fields; drop them
            easy_csv.append(row[1:-1])

After this preprocessing, you can save the rows to a regular comma-separated file, which pandas reads without any special options.
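A minimal sketch of that last step, assuming the preprocessing left a list of cleaned rows with the header row first. The `rows` values below are made up to match the sample file, and an in-memory buffer stands in for a real output file:

```python
import csv
import io

import pandas as pd

# cleaned rows as they might look after preprocessing (assumed values)
rows = [["A", "B", "C"], ["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"]]

# write the rows as ordinary comma-separated text;
# swap the buffer for open("clean.csv", "w", newline="") to write a real file
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
buffer.seek(0)

# the first row becomes the header, the rest become data
df = pd.read_csv(buffer)
print(df)
```

If the intermediate file is not needed, building the frame directly with `pd.DataFrame(rows[1:], columns=rows[0])` would probably do as well, though the values then stay strings until converted.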

+1

Jonghokim Sep 13 '16 at 5:55

I tried this code and it works:

    import pandas as pd
    import numpy as np

    a = pd.read_csv("a.csv",sep="|")
    print(a)
    for i in a:
        print(i)

0

Mikail land Sep 13 '16 at 5:59

jezrael · Accepted Answer · 2016-09-13T05:52:07+0000

You can pass the comment parameter to read_csv so the dashed lines are skipped, and then remove the columns that contain only NaN with dropna:

    import pandas as pd
    import io

    temp=u"""--------------
    |A|B|C|
    --------------
    |1|2|3|
    --------------
    |4|5|6|
    --------------
    |7|8|9|
    --------------"""
    #after testing replace io.StringIO(temp) to filename
    df = pd.read_csv(io.StringIO(temp), sep="|", comment='-').dropna(axis=1, how='all')
    print (df)
       A  B  C
    0  1  2  3
    1  4  5  6
    2  7  8  9
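To see why the dropna step is needed, it may help to look at the frame before the empty columns are removed. This is only an illustrative sketch on a shortened copy of the sample; the names `sample`, `raw` and `clean` are made up here:

```python
import io

import pandas as pd

sample = u"""--------------
|A|B|C|
--------------
|1|2|3|
--------------
|4|5|6|
--------------"""

# parse without dropna: the leading and trailing | on every line
# each produce an extra, empty column at the edges
raw = pd.read_csv(io.StringIO(sample), sep="|", comment="-")
print(raw.columns.tolist())

# the two edge columns hold only NaN, so how='all' removes exactly those
clean = raw.dropna(axis=1, how="all")
print(clean.columns.tolist())
```

Because `how='all'` only drops columns in which every value is missing, a real column that has some gaps is kept.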

More general solution:

    import pandas as pd
    import io

    temp=u"""--------------
    |A|B|C|
    --------------
    |1|2|3|
    --------------
    |4|5|6|
    --------------
    |7|8|9|
    --------------"""
    #after testing replace io.StringIO(temp) to filename
    #separator is char which is NOT in csv
    df = pd.read_csv(io.StringIO(temp), sep="^", comment='-')
    #remove first and last | in data and in column names
    df.iloc[:,0] = df.iloc[:,0].str.strip('|')
    df.columns = df.columns.str.strip('|')
    #split column names
    cols = df.columns.str.split('|')[0]
    #split data
    df = df.iloc[:,0].str.split('|', expand=True)
    df.columns = cols
    print (df)
       A  B  C
    0  1  2  3
    1  4  5  6
    2  7  8  9
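One caveat with comment='-': everything after a - on a line is treated as a comment, so a data row containing a negative number would be cut short. If that can happen in your file, a possible workaround is to filter the separator lines in plain Python before handing the text to read_csv. The helper name `read_boxed_table` is hypothetical, and the sketch assumes every data line starts and ends with a single |:

```python
import io

import pandas as pd


def read_boxed_table(lines):
    """Parse lines laid out like the sample file (hypothetical helper)."""
    # keep only lines that contain a pipe; dashed rules and blank lines are dropped
    kept = [ln.strip() for ln in lines if "|" in ln]
    # cut off the outer pipes (assumed present on every kept line)
    body = "\n".join(ln[1:-1] for ln in kept)
    return pd.read_csv(io.StringIO(body), sep="|")


sample = """--------------
|A|B|C|
--------------
|1|-2|3|
--------------
|4|5|6|
--------------"""

df = read_boxed_table(sample.splitlines())
print(df)
```

For a file on disk, passing the open file object (for example `read_boxed_table(open("sample.csv"))`) should work the same way, since the helper only iterates over lines.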
