I have a problem reading a CSV (or TXT) file with pandas. Since numpy's loadtxt function takes too much time, I decided to use pandas read_csv instead.
I want to build an array from a txt file that has four space-separated columns and a very large number of lines (for example, 256^3; in the example below it is 4^3 = 64).
The problem is that, for a reason I don't understand, pandas read_csv always seems to skip the first line of the csv (txt) file, so the result has one row fewer than the file.
Here is the code:
from __future__ import division
import numpy as np
import pandas as pd

ngridx = 4
ngridy = 4
ngridz = 4
size = ngridx*ngridy*ngridz

f = np.zeros((size, 4))
a = np.arange(size)
f[:, 0] = np.floor_divide(a, ngridy*ngridz)
f[:, 1] = np.fmod(np.floor_divide(a, ngridz), ngridy)
f[:, 2] = np.fmod(a, ngridz)
f[:, 3] = np.random.rand(size)
print(f[0])

np.savetxt('Testarray.txt', f, fmt='%6.16f')

g = pd.read_csv('Testarray.txt', delimiter=' ').values
print(g[0])
print(len(g[:, 3]))
The printed f[0] and g[0] should match, but they do not, which suggests that pandas skips the first line of Testarray.txt. Also, the loaded array g is one row shorter than the array f.
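The same symptom can be reproduced in a smaller form. The sketch below is written for Python 3 and assumes the cause is read_csv's default header handling, where the first line of the file is taken as column names when no `header` or `names` argument is given. The three-line sample text and the names `g_default` and `g_all` are made up for illustration; the text only stands in for a file written by np.savetxt.

```python
import io
import pandas as pd

# Three rows of space-separated numbers with no header line,
# standing in for the file written by np.savetxt (illustrative data).
text = "0.0 0.0 0.0 0.5\n0.0 0.0 1.0 0.25\n0.0 0.0 2.0 0.75\n"

# Default call: the first line is consumed as column names,
# so it does not appear in .values.
g_default = pd.read_csv(io.StringIO(text), delimiter=' ').values

# header=None tells pandas that the file has no header row,
# so every line is kept as data.
g_all = pd.read_csv(io.StringIO(text), delimiter=' ', header=None).values

print(g_default.shape)  # (2, 4)
print(g_all.shape)      # (3, 4)
print(g_all[0])
```

If that assumption holds for Testarray.txt as well, adding `header=None` to the read_csv call in the code above should make g the same length as f.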
I need help.
Thanks in advance.
python numpy pandas load
Tom