Specifying dtypes for read_sql in pandas

I would like to specify the dtypes of the columns returned by pandas.read_sql. In particular, to save memory I want float values returned as np.float32 rather than np.float64. I know I can convert afterwards with astype(np.float32), but that does not address the large memory footprint of the original query: in my real code I will be pulling 84 million rows, not the 5 shown here. pandas.read_csv lets you specify dtypes as a dict, but I see no way to do this with read_sql.

I am using MySQLdb and Python 2.7.

As an aside, read_sql seems to use a lot more memory at runtime (about 2x) than is needed for the final DataFrame storage.

In [70]: df = pd.read_sql('select ARP, ACP from train where seq < 5', connection)

In [71]: df
Out[71]:
       ARP      ACP
0  1.17915  1.42595
1  1.10578  1.21369
2  1.35629  1.12693
3  1.56740  1.61847
4  1.28060  1.05935

In [72]: df.dtypes
Out[72]:
ARP    float64
ACP    float64
dtype: object
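Until dtypes can be specified directly, one workaround that keeps peak memory closer to the final footprint is to read in chunks and downcast each chunk as it arrives. A minimal sketch, using an in-memory SQLite table as a stand-in for the MySQLdb connection (the `train` table and columns mirror the question; the data values are made up):

```python
import sqlite3

import numpy as np
import pandas as pd

# Stand-in for the MySQLdb connection: a small in-memory SQLite table
# shaped like the question's `train` table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE train (seq INTEGER, ARP REAL, ACP REAL)")
connection.executemany(
    "INSERT INTO train VALUES (?, ?, ?)",
    [(i, 1.0 + 0.1 * i, 1.4 - 0.05 * i) for i in range(20)],
)

# chunksize= makes read_sql yield DataFrames lazily; each chunk is
# downcast to float32 before the next one is fetched, so only one
# small float64 chunk exists in memory at a time.
chunks = pd.read_sql(
    "SELECT ARP, ACP FROM train WHERE seq < 5", connection, chunksize=2
)
df = pd.concat((c.astype(np.float32) for c in chunks), ignore_index=True)

print(df.dtypes)
```

This does not remove read_sql's per-chunk overhead, but it bounds the float64 working set by the chunk size instead of the full 84-million-row result.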
2 answers

What about cast() and convert()?

 'SELECT CAST(ARP AS FLOAT) AS ARP, CAST(ACP AS FLOAT) AS ACP FROM train WHERE seq < 5' 

or something similar.

http://www.smallsql.de/doc/sql-functions/system/convert.html
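One caveat worth checking: even when the server casts the value, most Python DBAPI drivers hand back ordinary Python floats, so pandas may still build float64 columns; the cast changes server-side precision, not necessarily the DataFrame dtype. A quick check of this assumption, using an in-memory SQLite table as a stand-in:

```python
import sqlite3

import pandas as pd

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE train (seq INTEGER, ARP REAL)")
connection.execute("INSERT INTO train VALUES (0, 1.17915)")

# The CAST executes fine, but the driver still returns Python floats,
# so pandas builds a float64 column regardless of the SQL-side type.
df = pd.read_sql("SELECT CAST(ARP AS REAL) AS ARP FROM train", connection)
print(df.dtypes)
```

So an SQL-side cast alone may not shrink the DataFrame; a client-side downcast is still needed.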


Take a look at this GitHub issue; it looks like they intend to add an option for this.
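For later readers: recent pandas versions (1.3+, if memory serves) did add a `dtype` mapping to read_sql_query, which is what that issue was tracking. A sketch on a modern stack, again with an in-memory SQLite table standing in for MySQLdb:

```python
import sqlite3

import numpy as np
import pandas as pd

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE train (seq INTEGER, ARP REAL, ACP REAL)")
connection.executemany(
    "INSERT INTO train VALUES (?, ?, ?)",
    [(i, 1.0 + 0.1 * i, 1.4 - 0.05 * i) for i in range(5)],
)

# dtype= maps column names to the desired dtypes at read time.
df = pd.read_sql_query(
    "SELECT ARP, ACP FROM train WHERE seq < 5",
    connection,
    dtype={"ARP": np.float32, "ACP": np.float32},
)
print(df.dtypes)
```

Whether this lowers *peak* memory (as opposed to just the final dtypes) depends on when the conversion happens internally; combining `dtype=` with `chunksize=` is the safer route for very large pulls.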

