How to specify dtype when using pandas.read_csv to load data from csv files?

I have text files with the following format:

000423|东阿阿胶| 300|1|0.15000| |
000425|徐工机械| 600|1|0.15000| |
000503|海虹控股| 400|1|0.15000| |
000522|白云山A| |2| | 1982.080|
000527|美的电器| 900|1|0.15000| |
000528|柳 工| 300|1|0.15000| |

When I use read_csv to load them into a DataFrame, it does not infer the correct dtype for some columns. For example, the first column is parsed as int rather than a unicode str, and the third column is parsed as unicode str rather than int because some rows are missing data there. Is there a way to pre-set the dtypes of the DataFrame, the way numpy.genfromtxt allows?
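To make the inference problem concrete, here is a minimal sketch (with made-up placeholder rows in the same pipe-delimited layout) showing that read_csv infers the ticker column as int64, which silently drops the leading zeros:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in the same '|'-separated format as the ETF file
raw = "000423|foo|300|1|0.15\n000425|bar|600|1|0.15\n"

df = pd.read_csv(StringIO(raw), sep="|", header=None,
                 names=["ticker", "name", "vol", "sign", "ratio"])

# The ticker column is inferred as int64, so "000423" becomes 423
print(df["ticker"].dtype)    # int64
print(df["ticker"].iloc[0])  # 423, not "000423"
```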

Update: I used read_csv like this, which caused the problem:

data = pandas.read_csv(StringIO(etf_info), sep='|', skiprows=14, index_col=0,
                       skip_footer=1, encoding='gbk',
                       names=['ticker', 'name', 'vol', 'sign', 'ratio', 'cash', 'price'])

To work around the dtype and encoding problems, I had to decode with unicode() and parse with numpy.genfromtxt first:

etf_info = unicode(urllib2.urlopen(etf_url).read(), 'gbk')
nd_data = np.genfromtxt(StringIO(etf_info), delimiter='|', skiprows=14,
                        skip_footer=1, dtype=ETF_DTYPE)
data = pandas.DataFrame(nd_data, index=nd_data['ticker'],
                        columns=['name', 'vol', 'sign', 'ratio', 'cash', 'price'])
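The post does not show ETF_DTYPE; it is presumably a numpy structured dtype with one field per pipe-separated column. A hypothetical sketch of what it might look like, with text fields kept as strings and numeric fields as floats so blank cells become NaN:

```python
import numpy as np

# Hypothetical reconstruction of ETF_DTYPE (not from the original post):
# one field per '|'-separated column, strings for codes/names, floats
# for numeric columns so empty cells parse as NaN.
ETF_DTYPE = np.dtype([
    ("ticker", "U6"),   # e.g. "000423", kept as text to preserve zeros
    ("name",   "U16"),
    ("vol",    "f8"),
    ("sign",   "f8"),
    ("ratio",  "f8"),
    ("cash",   "f8"),
    ("price",  "f8"),
])

# One made-up line in the same layout; the two trailing fields are empty
row = np.genfromtxt(["000423|foo|300|1|0.15||"], delimiter="|", dtype=ETF_DTYPE)
print(row["ticker"])  # "000423" — leading zeros preserved
```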

It would be nice if read_csv supported dtype and usecols. Sorry for my greed. ^ _ ^

2 answers

Simply put: no, not yet. More work is needed in this area (read: more active developers). If you can post how you are calling read_csv, that might help. I suspect the whitespace between columns may be part of the problem.

EDIT: This answer is out of date; dtype support has since been added to read_csv.


Now you can use dtype in read_csv .
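A minimal sketch of both requested parameters in modern pandas (the column names are placeholders): dtype takes a per-column mapping, and usecols selects a subset of columns. "Int64" (capital I) is pandas' nullable integer dtype, useful here because some volume cells are empty:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample rows in the question's pipe-delimited layout
raw = "000423|foo|300|1|0.15\n000425|bar|600|1|0.15\n"

df = pd.read_csv(StringIO(raw), sep="|", header=None,
                 names=["ticker", "name", "vol", "sign", "ratio"],
                 dtype={"ticker": str, "vol": "Int64"},  # per-column dtypes
                 usecols=["ticker", "name", "vol"])      # keep only these

print(df["ticker"].iloc[0])  # "000423" — leading zeros preserved
print(df.dtypes)
```

Passing dtype=str for the ticker column is the standard way to keep zero-padded codes intact.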

PS: Kudos to Wes McKinney for the original answer; it feels a bit awkward to contradict past Wes.

