Author of jug here: jug works great for this. I just tried the following and it works:
    from jug import TaskGenerator
    import pandas as pd
    import numpy as np

    @TaskGenerator
    def gendata():
        return pd.DataFrame(np.arange(343440).reshape((10, -1)))

    @TaskGenerator
    def compute(x):
        return x.mean()

    y = compute(gendata())
It is not as efficient as it could be, because jug simply pickles the DataFrame internally. It does compress the data on the fly, so memory usage is not terrible, but it is slower than it could be.
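To make the trade-off concrete, here is a minimal sketch of pickling through a compressing stream, using only the standard library. This is not jug's actual code: `dump_compressed` and `load_compressed` are hypothetical names, and a nested list stands in for the DataFrame.

```python
import gzip
import io
import pickle


def dump_compressed(obj):
    """Pickle obj straight into a gzip stream and return the compressed bytes."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as stream:
        # The pickler writes into the gzip stream, so the data is compressed
        # as it is produced rather than in a separate pass afterwards.
        pickle.dump(obj, stream, protocol=pickle.HIGHEST_PROTOCOL)
    return buf.getvalue()


def load_compressed(blob):
    """Inverse of dump_compressed: decompress and unpickle."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as stream:
        return pickle.load(stream)


# Stand-in for a DataFrame (assumption): 10 rows of 1000 small, repetitive integers.
table = [[(r + c) % 7 for c in range(1000)] for r in range(10)]

raw = pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)
blob = dump_compressed(table)
restored = load_compressed(blob)
```

The round trip is generic, which is why it works for any picklable object, but every value still passes through the pickler and the compressor. A format that knows the array layout can skip much of that work.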
I would be open to a change that saves DataFrames as a special case, as jug currently does for numpy arrays: https://github.com/luispedro/jug/blob/master/jug/backends/file_store.py#L102
luispedro