It is actually much faster to use the csv module together with str.replace:
    import csv
    import pandas as pd

    with open("test.txt") as f:
        next(f)  # skip the first line
        # On Python 2, use itertools.imap instead of map
        df = pd.DataFrame.from_records(
            csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f), delimiter=";"),
            columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)
Some timings:
    In [35]: %%timeit
    pd.read_csv("test.txt", sep="[;|]", engine='python', skiprows=1, names=["Column 1", "Column 2", "ID", "Age", "Height"])
       ....:
    100 loops, best of 3: 14.7 ms per loop

    In [36]: %%timeit
    with open("test.txt") as f:
        next(f)
        df = pd.DataFrame.from_records(csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f), delimiter=";"),
                                       columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)
       ....:
    100 loops, best of 3: 6.05 ms per loop
You can also drop csv.reader and just use str.split:
    with open("test.txt") as f:
        next(f)  # skip the first line
        # No .astype(int) here, so the values stay as strings
        df = pd.DataFrame.from_records(
            map(lambda x: x.rstrip().replace("|", ";").split(";"), f),
            columns=["Column 1", "Column 2", "ID", "Age", "Height"])
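To see what the two variants actually hand to DataFrame.from_records, here is a minimal sketch using only the standard library. The sample text is made up (the real test.txt is not shown), and the helper names rows_via_csv and rows_via_split are only for illustration:

```python
import csv
import io

# Hypothetical sample in a mixed ";" / "|" layout; the real test.txt
# is not shown in the answer, so this data is an assumption.
SAMPLE = "Column 1;Column 2|ID;Age|Height\n1;2|3;4|5\n6;7|8;9|10\n"


def rows_via_csv(f):
    """Normalise "|" to ";" and let csv.reader split each line."""
    next(f)  # skip the first line
    normalised = (line.rstrip().replace("|", ";") for line in f)
    return list(csv.reader(normalised, delimiter=";"))


def rows_via_split(f):
    """Normalise "|" to ";" and split each line with str.split."""
    next(f)  # skip the first line
    return [line.rstrip().replace("|", ";").split(";") for line in f]


csv_rows = rows_via_csv(io.StringIO(SAMPLE))
split_rows = rows_via_split(io.StringIO(SAMPLE))

print(csv_rows)
print(split_rows == csv_rows)
```

On plain numeric data like this both approaches yield the same lists of strings. They can differ once fields are quoted: csv.reader understands quoting, while str.split cuts at every ";" regardless.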
Padraic Cunningham