Is it a good idea to store metadata in a pandas DataFrame column label?

I have discussed the question of whether there should be a dedicated place for storing metadata in a pandas DataFrame, and I personally would find this feature very useful.

Until this is implemented in a future version, I am considering two workarounds, but neither of them looks really satisfactory.

Since I cannot inherit a class from a DataFrame, I tried to create a MyDataFrame class that contains a DataFrame and implements all the methods __add__, __mul__, ... However, this seems like a tedious approach, given the number of methods available for manipulating a DataFrame.

In addition, the data that I manage comes from physical equipment (spectrum analyzers, oscilloscopes, ...), and basically I want to have a set of metadata (measurement bandwidth, number of averages, ...) associated with each DataFrame column. The structure needed to maintain a one-to-one correspondence with the DataFrame structure looks complex (what if the DataFrame gets transposed?).

I found an elegant solution: using custom objects instead of the usual string names of the Series objects. These "MetaIndex" objects are basically a string plus metadata, and they replace the regular column labels in my_dataframe.columns. The class definition looks something like this:

    class MetaIndex:
        def __hash__(self):
            return self.str.__hash__()

        def __eq__(self, other):
            return self.str == str(other)

        def __init__(self, st):
            self.str = format_name(st)
            self._meta = MetaData()

        def __repr__(self):
            return self.str

        def __str__(self):
            return self.str
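To make the idea concrete, here is a minimal, self-contained sketch of why such a label can be looked up by its plain string name. The `format_name` and `MetaData` definitions are hypothetical stand-ins (the real ones are not shown above), and a plain dict stands in for the column mapping, so this only illustrates the hashing and equality contract, not pandas itself.

```python
class MetaData(dict):
    """Hypothetical stand-in: a plain dict holding per-column metadata."""


def format_name(st):
    """Hypothetical stand-in: normalise a label to a stripped string."""
    return str(st).strip()


class MetaIndex:
    def __init__(self, st):
        self.str = format_name(st)
        self._meta = MetaData()

    def __hash__(self):
        # Same hash as the underlying string, so lookups by name find the same bucket.
        return hash(self.str)

    def __eq__(self, other):
        # Compare by string form, so a plain string matches the label.
        return self.str == str(other)

    def __repr__(self):
        return self.str

    __str__ = __repr__


label = MetaIndex("ch1")
label._meta["bandwidth_hz"] = 1e3

# A dict stands in for the label -> column mapping.
columns = {label: [0.1, 0.2, 0.3]}

# Lookup by plain string succeeds because hash and equality both agree.
print(columns["ch1"])

# The stored key is still the MetaIndex object, metadata included.
stored_label = next(iter(columns))
print(stored_label._meta)
```

The metadata survives only as long as the container keeps the original key object; anything that rebuilds the labels from their string form would drop it.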

Then, when I save the DataFrame (in an HDF5 file), I convert each MetaIndex back to a regular string and save the metadata of each column separately. When I load the data, I restore the metadata of each column with something like:

    s.replace_names_by_meta_index()
    for c in s.columns:
        col = s[c]
        if col.meta is not None:
            col.meta.set(**f["meta"][str(c)])
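The save/load round trip can be sketched without pandas or HDF5. In the sketch below, JSON text stands in for the HDF5 file and a plain dict stands in for the DataFrame; the `dump` and `load` helpers and the simplified `MetaIndex` (with a plain dict as `_meta`) are assumptions made for illustration, not my actual code.

```python
import json


class MetaIndex:
    """Simplified label: a string plus a metadata dict (illustrative only)."""

    def __init__(self, st):
        self.str = str(st)
        self._meta = {}

    def __hash__(self):
        return hash(self.str)

    def __eq__(self, other):
        return self.str == str(other)

    def __str__(self):
        return self.str


def dump(columns):
    """Flatten labels to plain strings and store their metadata separately."""
    data = {str(label): values for label, values in columns.items()}
    meta = {str(label): dict(label._meta) for label in columns}
    return json.dumps({"data": data, "meta": meta})


def load(text):
    """Rebuild MetaIndex labels and re-attach the metadata saved under each name."""
    f = json.loads(text)
    columns = {}
    for name, values in f["data"].items():
        label = MetaIndex(name)
        label._meta.update(f["meta"][name])
        columns[label] = values
    return columns


original_label = MetaIndex("ch1")
original_label._meta["averages"] = 16
restored = load(dump({original_label: [1.0, 2.0]}))
restored_label = next(iter(restored))
print(restored_label._meta)
```

Keying the saved metadata by `str(label)` only works if the string names are unique, which is the same assumption the snippet above makes with `f["meta"][str(c)]`.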

1) Would you say that this is a reasonable way to do it, or is it a fringe and dangerous approach?

2) In addition, a very tempting extension is to make MetaIndex inherit from str, so that autocompletion still works in IPython (my_dataframe.col1). However, in this case some simple operations, such as my_dataframe.T.T, cause the MetaIndex to lose all the metadata contained in index._meta (as if an index._meta = dict() had been executed behind the scenes). This looks very mysterious to me, which is why I am worried about the whole approach.
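I do not know whether this is what happens inside pandas, but a similar symptom can be reproduced in plain Python: most operations on a str subclass return a plain str, so any extra attribute disappears. The `MetaStr` class below is a hypothetical stand-in for a MetaIndex that inherits from str.

```python
class MetaStr(str):
    """Hypothetical str subclass carrying metadata in an extra attribute."""

    def __new__(cls, value, meta=None):
        obj = super().__new__(cls, value)
        obj._meta = dict(meta or {})
        return obj


label = MetaStr("col1", meta={"averages": 16})

# The subclass instance itself still carries its metadata.
print(type(label).__name__, label._meta)

# Converting or transforming it yields a plain str without the attribute.
rebuilt = str(label)
upper = label.upper()
print(type(rebuilt).__name__, hasattr(rebuilt, "_meta"))
print(type(upper).__name__, hasattr(upper, "_meta"))
```

If any code path rebuilds the labels through such an operation, the result compares equal to the original label but no longer holds the metadata, which would be consistent with what I observe.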

Are there any clues about what is going on there? Thanks in advance, Best regards, Samuel
