I would like to specify the dtypes of the columns returned by pandas.read_sql. In particular, I'm interested in saving memory by having float values returned as np.float32 rather than np.float64. I know I can convert afterwards with astype(np.float32), but that does not solve the problem of the large memory requirement during the original query. In my actual code I will pull 84 million rows, not the 5 shown here. pandas.read_csv lets you specify dtypes as a dict, but I see no way to do this with read_sql.
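One workaround I've considered is reading in chunks and downcasting each chunk as it arrives, so that peak memory stays close to one float64 chunk plus the growing float32 result, rather than the full float64 frame. Below is a minimal sketch of that pattern; it uses an in-memory SQLite database as a stand-in for my MySQL `train` table, but the `read_sql` call itself is the same with a MySQLdb connection:

```python
import sqlite3

import numpy as np
import pandas as pd

# Illustrative in-memory table standing in for the real MySQL 'train' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (seq INTEGER, ARP REAL, ACP REAL)")
conn.executemany(
    "INSERT INTO train VALUES (?, ?, ?)",
    [(i, 1.0 + i * 0.1, 2.0 + i * 0.1) for i in range(10)],
)

# Read in chunks; downcast each chunk to float32 before accumulating,
# so the full result is never held in memory as float64.
chunks = []
for chunk in pd.read_sql(
    "select ARP, ACP from train where seq < 5", conn, chunksize=2
):
    chunks.append(chunk.astype(np.float32))

df = pd.concat(chunks, ignore_index=True)
print(df.dtypes)
```

This still pays the float64 cost per chunk, so it only mitigates the problem rather than specifying the dtype at read time the way `read_csv` does.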
I am using MySQLdb and Python 2.7.
As an aside, read_sql seems to use a lot more memory at runtime (about 2x) than is needed for the final DataFrame storage.
In [70]: df = pd.read_sql('select ARP, ACP from train where seq < 5', connection)

In [71]: df
Out[71]:
       ARP      ACP
0  1.17915  1.42595
1  1.10578  1.21369
2  1.35629  1.12693
3  1.56740  1.61847
4  1.28060  1.05935

In [72]: df.dtypes
Out[72]:
ARP    float64
ACP    float64
dtype: object
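For scale, here is a quick check of the per-row savings I'm after, rebuilding the small frame above by hand (the values are copied from the output shown; `memory_usage` is standard pandas):

```python
import numpy as np
import pandas as pd

# The same five rows shown above, as float64 (read_sql's default).
df = pd.DataFrame({
    "ARP": [1.17915, 1.10578, 1.35629, 1.56740, 1.28060],
    "ACP": [1.42595, 1.21369, 1.12693, 1.61847, 1.05935],
})

# Data bytes only (index excluded): 2 columns x 5 rows x 8 bytes = 80.
before = df.memory_usage(index=False).sum()

# After downcasting: 2 columns x 5 rows x 4 bytes = 40.
after = df.astype(np.float32).memory_usage(index=False).sum()

print(before, after)  # 80 40
```

At 84 million rows the same halving is roughly 1.3 GB versus 2.7 GB for these two columns, which is why converting after the fact isn't enough.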