I would like to specify the dtypes of the columns returned by pandas.read_sql. In particular, I'm interested in saving memory by having float values returned as np.float32 rather than np.float64. I know I can convert afterwards with astype(np.float32), but that does not solve the problem of the large memory requirement during the original query. In my actual code I will pull 84 million rows, not the 5 shown here. pandas.read_csv lets you specify dtypes as a dict, but I see no way to do this with read_sql.
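One workaround I've considered is reading in chunks and downcasting each chunk as it arrives, so that peak memory stays close to one float64 chunk plus the growing float32 result, rather than the full float64 frame. Below is a minimal sketch of that pattern; it uses an in-memory SQLite database as a stand-in for my MySQL `train` table, but the `read_sql` call itself is the same with a MySQLdb connection:

```python
import sqlite3

import numpy as np
import pandas as pd

# Illustrative in-memory table standing in for the real MySQL 'train' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (seq INTEGER, ARP REAL, ACP REAL)")
conn.executemany(
    "INSERT INTO train VALUES (?, ?, ?)",
    [(i, 1.0 + i * 0.1, 2.0 + i * 0.1) for i in range(10)],
)

# Read in chunks; downcast each chunk to float32 before accumulating,
# so the full result is never held in memory as float64.
chunks = []
for chunk in pd.read_sql(
    "select ARP, ACP from train where seq < 5", conn, chunksize=2
):
    chunks.append(chunk.astype(np.float32))

df = pd.concat(chunks, ignore_index=True)
print(df.dtypes)
```

This still pays the float64 cost per chunk, so it only mitigates the problem rather than specifying the dtype at read time the way `read_csv` does.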
I am using MySQLdb and Python 2.7.
As an aside, read_sql seems to use a lot more memory at runtime (about 2x) than is needed for the final DataFrame storage.
In [70]: df = pd.read_sql('select ARP, ACP from train where seq < 5', connection)

In [71]: df
Out[71]:
       ARP      ACP
0  1.17915  1.42595
1  1.10578  1.21369
2  1.35629  1.12693
3  1.56740  1.61847
4  1.28060  1.05935

In [72]: df.dtypes
Out[72]:
ARP    float64
ACP    float64
dtype: object
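For scale, here is a quick check of the per-row savings I'm after, rebuilding the small frame above by hand (the values are copied from the output shown; `memory_usage` is standard pandas):

```python
import numpy as np
import pandas as pd

# The same five rows shown above, as float64 (read_sql's default).
df = pd.DataFrame({
    "ARP": [1.17915, 1.10578, 1.35629, 1.56740, 1.28060],
    "ACP": [1.42595, 1.21369, 1.12693, 1.61847, 1.05935],
})

# Data bytes only (index excluded): 2 columns x 5 rows x 8 bytes = 80.
before = df.memory_usage(index=False).sum()

# After downcasting: 2 columns x 5 rows x 4 bytes = 40.
after = df.astype(np.float32).memory_usage(index=False).sum()

print(before, after)  # 80 40
```

At 84 million rows the same halving is roughly 1.3 GB versus 2.7 GB for these two columns, which is why converting after the fact isn't enough.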