How to specify metadata for dask.dataframe

The docs provide good examples of how metadata can be provided. However, I am still unsure how to choose the right dtypes for my dataframe.

  • Can I do something like meta={'x': int, 'y': float, 'z': float} instead of meta={'x': 'i8', 'y': 'f8', 'z': 'f8'}?
  • Can someone point me to a list of possible values, such as 'i8'? What dtypes exist?
  • How can I specify a column containing arbitrary objects? How can I specify a column containing only instances of the same class?
2 answers

The basic data types available are those offered through numpy. See the documentation for a list.

That list does not cover datetime formats (e.g. datetime64); more information on those can be found in the pandas and NumPy documentation.
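As a small sketch of how a datetime dtype can be spelled, the snippet below compares the long and short NumPy names and builds an empty pandas Series with that dtype. The variable names are only illustrative, and nanosecond resolution is an assumption chosen because it is the unit pandas has traditionally used.

```python
import numpy as np
import pandas as pd

# Long and short spellings of the same NumPy datetime dtype
dt = np.dtype("datetime64[ns]")
short = np.dtype("M8[ns]")

# An empty Series that carries only the datetime dtype
stamps = pd.Series(dtype="datetime64[ns]")

print(dt == short, stamps.dtype)
```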

The meta argument for dask dataframes usually expects an empty pandas DataFrame that holds the definitions for columns, index, and dtypes.

One way to build such a DataFrame:

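    import pandas as pd
    import numpy as np

    # Empty frame: only the column names and dtypes matter for meta
    meta = pd.DataFrame(columns=['a', 'b', 'c'])
    meta.a = meta.a.astype(np.int64)
    # Recent pandas versions reject a unit-less datetime64, so give a unit
    meta.b = meta.b.astype('datetime64[ns]')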

There is also a way to pass a dtype to the pandas DataFrame constructor, but I'm not sure how to specify one for each column individually. As you can see, you can provide not only the "name" of a data type, but also the actual NumPy dtype.
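One possible way to set a dtype per column, sketched here under the assumption that a dict of empty Series is acceptable, is to give each Series its own dtype. The column names x, y and z are borrowed from the question; the three spellings are mixed on purpose to show that they are interchangeable.

```python
import numpy as np
import pandas as pd

# Each empty Series contributes a column name and a dtype, but no rows
meta = pd.DataFrame({
    "x": pd.Series(dtype="i8"),        # NumPy dtype string
    "y": pd.Series(dtype=np.float64),  # NumPy scalar type
    "z": pd.Series(dtype=float),       # Python type, maps to float64
})

print(meta.dtypes)
```

A frame like this should be usable wherever dask accepts an empty pandas DataFrame as meta.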

As for your last question, the data type you are looking for is "object". For instance:

    import pandas as pd

    class Foo:
        def __init__(self, foo):
            self.bar = foo

    df = pd.DataFrame(data=[Foo(1), Foo(2)], columns=['a'], dtype='object')
    df.a
    # 0    <__main__.Foo object at 0x00000000058AC550>
    # 1    <__main__.Foo object at 0x00000000058AC358>
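For the meta itself, an empty object column can be sketched as below. As far as I know, the object dtype does not record which class the values belong to, so the "only instances of the same class" part has to be checked by hand; the isinstance check here is one assumed way to do that, not something pandas or dask enforces.

```python
import pandas as pd

class Foo:
    def __init__(self, foo):
        self.bar = foo

# Empty frame whose single column has dtype object
meta = pd.DataFrame({"a": pd.Series(dtype="object")})

# The dtype only says "object", so verify the class manually if it matters
df = pd.DataFrame({"a": [Foo(1), Foo(2)]})
all_foo = df["a"].map(lambda value: isinstance(value, Foo)).all()

print(meta.dtypes["a"], all_foo)
```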

Both dask.dataframe and pandas use NumPy dtypes. In particular, anything you can pass to np.dtype will work. This includes the following:

  • NumPy dtype objects and scalar types such as np.float64
  • Python type objects such as float
  • NumPy dtype strings, e.g. 'f8'
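A quick way to convince yourself that these spellings are equivalent is to pass each one through np.dtype and compare the results. This is only a sketch; the names specs, dtypes and same are made up for the example.

```python
import numpy as np

# Different spellings that should all resolve to 64-bit floats
specs = [np.float64, float, "f8", "float64"]
dtypes = [np.dtype(spec) for spec in specs]

# True if every spelling produced the same dtype as the first one
same = all(d == dtypes[0] for d in dtypes)

print(dtypes[0], same)
```

One caveat: the Python type int maps to the platform's default integer, which has been 32-bit on Windows with older NumPy versions, so 'i8' is the more explicit choice when a 64-bit integer is required.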

Here's a more extensive list, taken from NumPy docs: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#specifying-and-constructing-data-types
