Error plotting a DataFrame containing NaN with Pandas 0.12.0 and Matplotlib 1.3.1 in Python 3.3.2

First of all, this question is not the same as this one.

The problem I am facing is that when I try to plot a DataFrame containing a NumPy NaN in one cell, I get an error message:

    C:\>\Python33x86\python.exe
    Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas as pd
    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>>
    >>> dates = pd.date_range('20131201', periods=5, freq='H')
    >>> data = [[1, 2], [4, 5], [9, np.nan], [16, 17], [25, 26]]
    >>> df = pd.DataFrame(data, index=dates,
    ...                   columns=list('AB'))
    >>>
    >>> print(df.to_string())
                          A   B
    2013-12-01 00:00:00   1   2
    2013-12-01 01:00:00   4   5
    2013-12-01 02:00:00   9 NaN
    2013-12-01 03:00:00  16  17
    2013-12-01 04:00:00  25  26
    >>> df.plot()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 1636, in plot_frame
        plot_obj.generate()
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 856, in generate
        self._make_plot()
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 1240, in _make_plot
        self._make_ts_plot(data, **self.kwds)
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 1321, in _make_ts_plot
        _plot(data[col], i, ax, label, style, **kwds)
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 1295, in _plot
        style=style, **kwds)
      File "C:\Python33x86\lib\site-packages\pandas\tseries\plotting.py", line 77, in tsplot
        lines = plotf(ax, *args, **kwargs)
      File "C:\Python33x86\lib\site-packages\matplotlib\axes.py", line 4139, in plot
        for line in self._get_lines(*args, **kwargs):
      File "C:\Python33x86\lib\site-packages\matplotlib\axes.py", line 319, in _grab_next_args
        for seg in self._plot_args(remaining, kwargs):
      File "C:\Python33x86\lib\site-packages\matplotlib\axes.py", line 297, in _plot_args
        x, y = self._xy_from_xy(x, y)
      File "C:\Python33x86\lib\site-packages\matplotlib\axes.py", line 216, in _xy_from_xy
        by = self.axes.yaxis.update_units(y)
      File "C:\Python33x86\lib\site-packages\matplotlib\axis.py", line 1337, in update_units
        converter = munits.registry.get_converter(data)
      File "C:\Python33x86\lib\site-packages\matplotlib\units.py", line 137, in get_converter
        xravel = x.ravel()
      File "C:\Python33x86\lib\site-packages\numpy\ma\core.py", line 3969, in ravel
        r._mask = ndarray.ravel(self._mask).reshape(r.shape)
      File "C:\Python33x86\lib\site-packages\pandas\core\series.py", line 981, in reshape
        return ndarray.reshape(self, newshape, order)
    TypeError: an integer is required

The above code works if I replace np.nan with a number, such as 2.3.

Plotting the columns as two separate Series does not work either (it fails when I add the Series containing NaN to the graph):

    C:\>\Python33x86\python.exe
    Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas as pd
    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>>
    >>> dates = pd.date_range('20131201', periods=5, freq='H')
    >>> data = [[1, 2], [4, 5], [9, np.nan], [16, 17], [25, 26]]
    >>> df = pd.DataFrame(data, index=dates,
    ...                   columns=list('AB'))
    >>>
    >>> print(df.to_string())
                          A   B
    2013-12-01 00:00:00   1   2
    2013-12-01 01:00:00   4   5
    2013-12-01 02:00:00   9 NaN
    2013-12-01 03:00:00  16  17
    2013-12-01 04:00:00  25  26
    >>> df['A'].plot(label='This is A', style='k')
    <matplotlib.axes.AxesSubplot object at 0x02ACFF90>
    >>> df['B'].plot(label='This is B', style='g')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 1730, in plot_series
        plot_obj.generate()
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 856, in generate
        self._make_plot()
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 1240, in _make_plot
        self._make_ts_plot(data, **self.kwds)
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 1311, in _make_ts_plot
        _plot(data, 0, ax, label, self.style, **kwds)
      File "C:\Python33x86\lib\site-packages\pandas\tools\plotting.py", line 1295, in _plot
        style=style, **kwds)
      File "C:\Python33x86\lib\site-packages\pandas\tseries\plotting.py", line 77, in tsplot
        lines = plotf(ax, *args, **kwargs)
      File "C:\Python33x86\lib\site-packages\matplotlib\axes.py", line 4139, in plot
        for line in self._get_lines(*args, **kwargs):
      File "C:\Python33x86\lib\site-packages\matplotlib\axes.py", line 319, in _grab_next_args
        for seg in self._plot_args(remaining, kwargs):
      File "C:\Python33x86\lib\site-packages\matplotlib\axes.py", line 297, in _plot_args
        x, y = self._xy_from_xy(x, y)
      File "C:\Python33x86\lib\site-packages\matplotlib\axes.py", line 216, in _xy_from_xy
        by = self.axes.yaxis.update_units(y)
      File "C:\Python33x86\lib\site-packages\matplotlib\axis.py", line 1337, in update_units
        converter = munits.registry.get_converter(data)
      File "C:\Python33x86\lib\site-packages\matplotlib\units.py", line 137, in get_converter
        xravel = x.ravel()
      File "C:\Python33x86\lib\site-packages\numpy\ma\core.py", line 3969, in ravel
        r._mask = ndarray.ravel(self._mask).reshape(r.shape)
      File "C:\Python33x86\lib\site-packages\pandas\core\series.py", line 981, in reshape
        return ndarray.reshape(self, newshape, order)
    TypeError: an integer is required

However, if I plot the same data directly with Matplotlib's pyplot.plot() function instead of pandas' plot() method, it works:

    C:\>\Python33x86\python.exe
    Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas as pd
    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> dates = pd.date_range('20131201', periods=5, freq='H')
    >>> plt.plot(dates, [1, 4, 9, 16, 25], 'k', dates, [2, 5, np.NAN, 17, 26], 'g')
    [<matplotlib.lines.Line2D object at 0x03E98650>, <matplotlib.lines.Line2D object at 0x040929B0>]
    >>> plt.show()
    >>>

So it seems that I have a workaround, but since I am plotting large DataFrames, I would prefer to use pandas' plot() method, which is more convenient. I tried to follow the stack trace, but after a while it gets complicated (I am not familiar with the source code of pandas, NumPy, or Matplotlib). Am I doing something wrong, or is this possibly a bug in pandas' plot()?

Thank you for your help!

I tried on both Windows x86 and Linux AMD64 with the same results with these versions:

  • Python 3.3.2
  • Pandas 0.12.0
  • Matplotlib 1.3.1
  • Numpy 1.7.1
2 answers

This seems to be an integration bug between matplotlib 1.3.1 and pandas 0.12.

A workaround is to downgrade to matplotlib 1.3.0. (Note, however, that this version of matplotlib has a bug on systems with fonts that have non-ASCII names, so you may have to pick which problem you would rather live with!) The downgrade will also pull numpy back to 1.7.1, so you should upgrade to numpy 1.8.0 again afterwards. The bug should be fixed in the upcoming pandas 0.13. However, pandas 0.13 may break some existing code (because pandas.Series is no longer a subclass of numpy.ndarray), so again some tough choices may be required, at least in the short term.
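To decide whether the downgrade is worth considering, it may help to check which versions are actually installed. The following is only a rough sketch: `parse_version` and `is_affected` are hypothetical helper names, and the check encodes nothing more than the one combination reported in this thread (pandas 0.12.x with matplotlib 1.3.1). Other combinations might be affected as well.

```python
def parse_version(text):
    """Turn '1.3.1' into (1, 3, 1); stop at the first non-numeric piece (e.g. '0rc1')."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_affected(pandas_version, matplotlib_version):
    """Hypothetical check: True only for the pairing reported in this thread."""
    return (
        parse_version(pandas_version)[:2] == (0, 12)
        and parse_version(matplotlib_version)[:3] == (1, 3, 1)
    )


print(is_affected("0.12.0", "1.3.1"))  # True
print(is_affected("0.12.0", "1.3.0"))  # False
print(is_affected("0.13.0", "1.3.1"))  # False
```

In a live session the arguments would come from `pd.__version__` and `matplotlib.__version__`.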

Just tested, the code works fine with matplotlib 1.3.0 :

    >>> import matplotlib
    >>> matplotlib.__version__
    '1.3.0'
    >>> df.plot()
    <matplotlib.axes.AxesSubplot object at 0x04E8B4F0>
    >>> plt.show(_)


I work around the problem with the following:

    fig, ax = plt.subplots()
    ax.plot(df)
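A slightly more explicit variant of the same idea, as a sketch rather than a tested fix for the affected versions: pass the index and each column to matplotlib as plain arrays, so that matplotlib never receives a pandas object, and label each line with its column name. The helper name `plot_columns` is hypothetical, and the `Agg` backend line is only there so the example runs without a display. The sketch uses `to_numpy()`, which is available in recent pandas releases; on older versions `.values` would be the equivalent.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same data as in the question; an hourly Timedelta stands in for the freq alias.
dates = pd.date_range("2013-12-01", periods=5, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    [[1, 2], [4, 5], [9, np.nan], [16, 17], [25, 26]],
    index=dates,
    columns=list("AB"),
)


def plot_columns(frame, ax):
    """Draw each column as its own line, handing matplotlib plain arrays only."""
    x = frame.index.to_pydatetime()  # array of datetime.datetime objects
    lines = []
    for name in frame.columns:
        y = frame[name].to_numpy(dtype=float)  # NaN stays NaN
        (line,) = ax.plot(x, y, label=str(name))
        lines.append(line)
    ax.legend(loc="best")
    return lines


fig, ax = plt.subplots()
lines = plot_columns(df, ax)
fig.canvas.draw()  # render once; use plt.show() in an interactive session
```

Matplotlib treats NaN as a missing point, so the line for column B simply has a gap at the third timestamp.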
