Python Pandas not reading first line of csv file

I have a problem reading a CSV (or TXT) file with pandas. Since the numpy loadtxt function takes too much time, I decided to use pandas read_csv instead.

I want to make an array from a txt file that has four space-separated columns and a very large number of lines (for example, 256^3; the example below uses 4^3 = 64).

The problem is that, for reasons I don't understand, pandas read_csv always seems to skip the first line of the csv (txt) file, so the result has one row fewer than the file.

Here is the code:

    from __future__ import division
    import numpy as np
    import pandas as pd

    ngridx = 4
    ngridy = 4
    ngridz = 4
    size = ngridx*ngridy*ngridz

    # Columns 0-2 hold the (x, y, z) grid indices, column 3 a random value.
    f = np.zeros((size,4))
    a = np.arange(size)
    f[:, 0] = np.floor_divide(a, ngridy*ngridz)
    f[:, 1] = np.fmod(np.floor_divide(a, ngridz), ngridy)
    f[:, 2] = np.fmod(a, ngridz)
    f[:, 3] = np.random.rand(size)
    print(f[0])

    np.savetxt('Testarray.txt',f,fmt='%6.16f')

    # No header argument is passed here, which is what the question is about.
    g = pd.read_csv('Testarray.txt',delimiter=' ').values
    print(g[0])
    print(len(g[:,3]))

f[0] and g[0], printed in the output, should match, but they do not, which indicates that pandas skips the first line of Testarray.txt. Also, the loaded array g is shorter than the array f.

I need help.

Thanks in advance.

+14
python numpy pandas load
3 answers

By default, pd.read_csv uses header=0 (when the names parameter is not specified), which means that the first row (i.e., the one with index 0) is interpreted as column names.

If your data does not have a header, use

 pd.read_csv(..., header=None) 

For example,

    import io
    import sys
    import pandas as pd

    if sys.version_info.major == 3:
        # Python3
        StringIO = io.StringIO
    else:
        # Python2
        StringIO = io.BytesIO

    text = '''\
    1 2 3
    4 5 6
    '''

    print(pd.read_csv(StringIO(text), sep=' '))

Without header=None, the first row 1 2 3 sets the column names:

       1  2  3
    0  4  5  6

With header=None, the first line is treated as data:

 print(pd.read_csv(StringIO(text), sep=' ', header=None)) 

which prints

       0  1  2
    0  1  2  3
    1  4  5  6
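
The remark about the names parameter can be illustrated as well. This is only a minimal sketch: the labels x, y and z are made up for the example and do not come from the question. When names is passed and header is left at its default, read_csv treats every line as data and uses the given labels as column names.

```python
import io
import pandas as pd

text = "1 2 3\n4 5 6\n"

# Hypothetical column labels, chosen only for this illustration.
labels = ["x", "y", "z"]

# With names= given (and header left at its default), no line is
# consumed as a header, so both lines end up as data rows.
df = pd.read_csv(io.StringIO(text), sep=" ", names=labels)

print(df)
print(df.shape)
```

This may be preferable to header=None when meaningful column names are wanted instead of the integer labels 0, 1, 2.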
+34

If your file does not have a header line, you need to tell pandas so: pass header=None in your pd.read_csv() call.
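
Applied to the situation in the question, this could look roughly like the sketch below. It is a simplified stand-in, not the asker's exact script: it writes a random 64 x 4 array to a temporary file (the temporary directory is an assumption made to keep the example self-contained) and reads it back with header=None.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for the question's array: 64 rows, 4 columns of floats.
f = np.random.rand(64, 4)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "Testarray.txt")

    # Same format string as in the question; savetxt separates
    # columns with a single space by default.
    np.savetxt(path, f, fmt="%6.16f")

    # header=None tells pandas that the first line is data.
    g = pd.read_csv(path, sep=" ", header=None).values

print(f.shape, g.shape)      # the shapes should now agree
print(np.allclose(f, g))     # values agree up to the written precision
```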

+1

What makes this even more confusing is that numpy.loadtxt() does not expect a header when reading such files: it treats every line as data unless told to skip some (for example with skiprows).
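
A small sketch of that difference, using in-memory text instead of a real file (the sample data and the "x y z" header line are made up for the illustration):

```python
import io
import numpy as np

text = "1 2 3\n4 5 6\n"

# loadtxt reads every line as data; nothing is taken as a header.
a = np.loadtxt(io.StringIO(text))
print(a.shape)

# If the file does have a header line, loadtxt has to be told to skip it.
with_header = "x y z\n" + text
b = np.loadtxt(io.StringIO(with_header), skiprows=1)
print(b.shape)
```

So the two readers have opposite defaults: pandas assumes a header unless header=None (or names) is given, while numpy.loadtxt assumes none unless lines are skipped explicitly.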

0
