Split a specific column and add the parts as new columns in a CSV (Python 3, CSV)

I have a CSV file with several columns, which I first split on semicolons (;). However, ONE column is delimited by pipes (|), and I would like to split this column and create new columns from it.

Input:

    Column 1   Column 2   Column 3
    1          2          3|4|5
    6          7          6|7|8
    10         11         12|13|14

Output Required:

    Column 1   Column 2   ID   Age   Height
    1          2          3    4     5
    6          7          6    7     8
    10         11         12   13    14

My code so far only splits on the first delimiter (;) and then converts the result to a DataFrame, which is my desired end format:

    import csv
    import pandas as pd

    delimit = list(csv.reader(open('test.csv', 'rt'), delimiter=';'))
    df = pd.DataFrame(delimit)
python pandas csv
4 answers

You did not specify exactly what the data looks like (you say it is delimited by semicolons, but your examples do not contain any), but if it looks like

    Column 1;Column 2;Column 3
    1;2;3|4|5
    6;7;6|7|8
    10;11;12|13|14

You can do something like

    >>> df = pd.read_csv("test.csv", sep="[;|]", engine='python', skiprows=1, names=["Column 1", "Column 2", "ID", "Age", "Height"])
    >>> df
       Column 1  Column 2  ID  Age  Height
    0         1         2   3    4       5
    1         6         7   6    7       8
    2        10        11  12   13      14

This works by using a regex separator that means "either ; or |" and by supplying the column names manually; skiprows=1 skips the original header line.
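
To try the regex separator without a file on disk, a minimal sketch can feed the same sample data through io.StringIO. The in-memory buffer and the variable name raw are assumptions for illustration, not part of the original answer:

```python
import io

import pandas as pd

# Sample data as described in the question: ';' between columns,
# '|' inside the third column.
raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"

# The character class [;|] matches either delimiter. A regex separator
# needs the slower Python parsing engine, hence engine="python".
df = pd.read_csv(
    io.StringIO(raw),
    sep="[;|]",
    engine="python",
    skiprows=1,  # skip the original three-name header line
    names=["Column 1", "Column 2", "ID", "Age", "Height"],
)
print(df)
```

Because the parser sees five plain fields per row, the resulting columns should be inferred as integers without any extra conversion.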

Alternatively, you can do this in a few steps:

    >>> df = pd.read_csv("test.csv", sep=";")
    >>> df
       Column 1  Column 2  Column 3
    0         1         2     3|4|5
    1         6         7     6|7|8
    2        10        11  12|13|14
    >>> c3 = df.pop("Column 3").str.split("|", expand=True)
    >>> c3.columns = ["ID", "Age", "Height"]
    >>> df.join(c3)
       Column 1  Column 2  ID  Age  Height
    0         1         2   3    4       5
    1         6         7   6    7       8
    2        10        11  12   13      14
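
One detail of the step-by-step version is that str.split returns strings, so the new columns are not numeric until they are converted. A minimal sketch of that conversion, assuming the same sample data (loaded here from an in-memory string rather than test.csv) and that every row holds exactly three pipe-separated integers:

```python
import io

import pandas as pd

raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"
df = pd.read_csv(io.StringIO(raw), sep=";")

# Remove the piped column from df and split it into three string columns.
parts = df.pop("Column 3").str.split("|", expand=True)
parts.columns = ["ID", "Age", "Height"]

# Convert the string pieces to integers before joining them back on.
result = df.join(parts.astype(int))
print(result.dtypes)
```

If some rows could contain missing or non-numeric pieces, astype(int) would raise, so that assumption matters.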
    import csv
    import pandas as pd

    delimit = list(csv.reader(open('test.csv', 'rt'), delimiter=';'))
    for row in delimit:
        # Take the last field off the row and split it on '|'
        piped = row.pop()
        row.extend(piped.split('|'))
    df = pd.DataFrame(delimit)

The data rows of delimit then look like this:

    [
        ['1', '2', '3', '4', '5'],
        ['6', '7', '6', '7', '8'],
        ['10', '11', '12', '13', '14'],
    ]
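
If the file has a header line, csv.reader returns it as the first row, and every value comes back as a string. A possible variation that sets the header aside and names the columns explicitly is sketched below; the in-memory sample data and the names header, data, and columns are assumptions for illustration:

```python
import csv
import io

import pandas as pd

raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"

rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
header, data = rows[0], rows[1:]  # keep the header out of the data rows

for row in data:
    piped = row.pop()              # last field, e.g. '3|4|5'
    row.extend(piped.split("|"))   # append its parts as separate fields

# Reuse the first two header names and replace the third with three new ones.
columns = header[:-1] + ["ID", "Age", "Height"]
df = pd.DataFrame(data, columns=columns).astype(int)
print(df)
```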

It is actually much faster to use the csv library and str.replace:

    import csv
    import pandas as pd

    with open("test.txt") as f:
        next(f)  # skip the header line
        # On Python 2, use itertools.imap instead of map
        df = pd.DataFrame.from_records(
            csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f), delimiter=";"),
            columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)

Some timings:

    In [35]: %%timeit
    pd.read_csv("test.txt", sep="[;|]", engine='python', skiprows=1, names=["Column 1", "Column 2", "ID", "Age", "Height"])
       ....:
    100 loops, best of 3: 14.7 ms per loop

    In [36]: %%timeit
    with open("test.txt") as f:
        next(f)
        df = pd.DataFrame.from_records(csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f), delimiter=";"),
                                       columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)
       ....:
    100 loops, best of 3: 6.05 ms per loop
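
Timings like these depend on file size, pandas version, and hardware, so it may be worth re-measuring on your own data. A rough sketch using the standard timeit module follows; the helper names with_regex_sep and with_csv_replace and the generated 1000-row file are assumptions for illustration, not the setup used for the numbers above:

```python
import csv
import os
import tempfile
import timeit

import pandas as pd

NAMES = ["Column 1", "Column 2", "ID", "Age", "Height"]


def with_regex_sep(path):
    # Let pandas split on either delimiter via a regex separator.
    return pd.read_csv(path, sep="[;|]", engine="python", skiprows=1, names=NAMES)


def with_csv_replace(path):
    # Turn '|' into ';' line by line, then parse with the csv module.
    with open(path, newline="") as f:
        next(f)  # skip the header line
        lines = (line.rstrip().replace("|", ";") for line in f)
        records = list(csv.reader(lines, delimiter=";"))
    return pd.DataFrame.from_records(records, columns=NAMES).astype(int)


# Hypothetical test file: a header plus 1000 copies of one sample row.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Column 1;Column 2;Column 3\n")
    tmp.write("1;2;3|4|5\n" * 1000)
    path = tmp.name

try:
    # Check that both approaches produce the same values before timing them.
    same = with_regex_sep(path).values.tolist() == with_csv_replace(path).values.tolist()
    t_regex = timeit.timeit(lambda: with_regex_sep(path), number=20)
    t_csv = timeit.timeit(lambda: with_csv_replace(path), number=20)
finally:
    os.remove(path)

print(f"same values: {same}")
print(f"regex sep:   {t_regex:.4f} s for 20 runs")
print(f"csv+replace: {t_csv:.4f} s for 20 runs")
```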

You can also just use str.split:

    with open("test.txt") as f:
        next(f)
        df = pd.DataFrame.from_records(map(lambda x: x.rstrip().replace("|", ";").split(";"), f),
                                       columns=["Column 1", "Column 2", "ID", "Age", "Height"])

I worked out a solution myself:

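    # Use the header row as column names so 'Column 3' can be selected by name
    df = pd.DataFrame(delimit[1:], columns=delimit[0])
    s = df['Column 3'].apply(lambda x: pd.Series(x.split('|')))
    frame = pd.DataFrame(s)
    # Name the parts in the order of the required output: ID, Age, Height
    frame.rename(columns={0: 'ID', 1: 'Age', 2: 'Height'}, inplace=True)
    result = pd.concat([df.drop(columns='Column 3'), frame], axis=1)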
