Making a heatmap from a pandas DataFrame

I have a dataframe generated with Python's pandas package. How can I create a heatmap from this DataFrame?

    import numpy as np
    from pandas import *

    Index = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
    Cols = ['A', 'B', 'C', 'D']
    df = DataFrame(abs(np.random.randn(5, 4)), index=Index, columns=Cols)

    >>> df
                A         B         C         D
    aaa  2.431645  1.248688  0.267648  0.613826
    bbb  0.809296  1.671020  1.564420  0.347662
    ccc  1.501939  1.126518  0.702019  1.596048
    ddd  0.137160  0.147368  1.504663  0.202822
    eee  0.134540  3.708104  0.309097  1.641090
    >>>
+88
python pandas dataframe heatmap
Sep 05 '12 at 17:18
7 answers

You want matplotlib.pyplot.pcolor:

    import numpy as np
    from pandas import DataFrame
    import matplotlib.pyplot as plt

    Index = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
    Cols = ['A', 'B', 'C', 'D']
    df = DataFrame(abs(np.random.randn(5, 4)), index=Index, columns=Cols)

    plt.pcolor(df)
    plt.yticks(np.arange(0.5, len(df.index), 1), df.index)
    plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns)
    plt.show()
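One thing to be aware of with this approach: pcolor draws the first row of the DataFrame at the bottom of the plot, so the picture is vertically flipped compared with the printed table. Below is a minimal sketch of the same plot through the Axes interface, with the y-axis inverted and a colorbar added. The fixed seed, the Agg backend and the output filename are my assumptions so that it runs as a plain script:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
df = pd.DataFrame(np.abs(rng.standard_normal((5, 4))),
                  index=['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
                  columns=['A', 'B', 'C', 'D'])

fig, ax = plt.subplots()
mesh = ax.pcolor(df.values)

# Put each tick at the center of its cell and label it.
ax.set_yticks(np.arange(0.5, len(df.index), 1))
ax.set_yticklabels(df.index)
ax.set_xticks(np.arange(0.5, len(df.columns), 1))
ax.set_xticklabels(df.columns)

ax.invert_yaxis()          # first row on top, like the printed table
fig.colorbar(mesh, ax=ax)  # legend mapping colors to values
fig.savefig("heatmap_pcolor.png")  # hypothetical output filename
```

In an interactive session you would call plt.show() instead of savefig.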
+67
Sep 05

For people looking at this today, I would recommend the Seaborn heatmap() as documented here.

The example from above would be done as follows:

    import numpy as np
    from pandas import DataFrame
    import seaborn as sns
    %matplotlib inline

    Index = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
    Cols = ['A', 'B', 'C', 'D']
    df = DataFrame(abs(np.random.randn(5, 4)), index=Index, columns=Cols)

    sns.heatmap(df, annot=True)

Here %matplotlib is an IPython magic function, for those unfamiliar with it.
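Outside IPython the %matplotlib inline line is a syntax error, so in a plain script you would drop it and show or save the figure yourself. A sketch under that assumption (the seed, the fmt value and the output filename are illustrative choices, not part of the answer above):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
df = pd.DataFrame(np.abs(rng.standard_normal((5, 4))),
                  index=['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
                  columns=['A', 'B', 'C', 'D'])

# sns.heatmap returns the matplotlib Axes it drew on.
ax = sns.heatmap(df, annot=True, fmt=".2f")
ax.figure.savefig("heatmap_seaborn.png")  # hypothetical output filename
```

With a display available, plt.show() can replace the savefig call.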

+134
Apr 09 '15 at 2:00

If you don't need a plot per se and you're simply interested in adding color to represent the values in a table format, you can use the style.background_gradient() method of the pandas data frame. This method colors the HTML table that is displayed when viewing pandas data frames in, for example, a JupyterLab notebook, and the result is similar to using "conditional formatting" in spreadsheet software:

    import numpy as np
    import pandas as pd

    index = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
    cols = ['A', 'B', 'C', 'D']
    df = pd.DataFrame(abs(np.random.randn(5, 4)), index=index, columns=cols)

    df.style.background_gradient(cmap='Blues')

For detailed usage, see the more elaborate answer I provided earlier on the same topic and the styling section of the pandas documentation.

+36
May 30 '18 at 12:43

The sns.heatmap API reference is useful and can be found here. Check out the parameters; there are a good number of them. Example:

    import numpy as np
    from pandas import DataFrame
    import seaborn as sns
    %matplotlib inline

    idx = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
    cols = list('ABCD')
    df = DataFrame(abs(np.random.randn(5, 4)), index=idx, columns=cols)

    # _r reverses the normal order of the color map 'RdYlGn'
    sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True)

+13
May 17 '17 at 19:46

If you want an interactive heatmap from a pandas DataFrame and you are working in a Jupyter notebook, you can try the interactive widget Clustergrammer-Widget; see the interactive notebook on NBViewer here and the documentation here.

For larger datasets you can try the Clustergrammer2 WebGL widget, which is still in development (example notebook here).

+1
Mar 27 '19 at 15:44

@joelostblom This is not an answer, this is a comment, but the problem is that I do not have enough reputation to comment.

I'm a little confused because the values shown on the heatmap and the values in the original array are completely different. I would like to print the real values on the heatmap, not some other ones. Can someone explain why this is happening? For example:

  • original indexed data: aaa / A = 2.431645

  • value printed on the heatmap: aaa / A = 1.06192
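A likely explanation, assuming the heatmap was produced by re-running the example code: np.random.randn draws new random numbers on every run, so the DataFrame being plotted is not the one printed in the question. Fixing the seed makes the values repeatable, as this sketch shows (the helper name make_df and the seed values are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

def make_df(seed):
    """Build the example frame from a seeded random generator."""
    rng = np.random.RandomState(seed)
    return pd.DataFrame(np.abs(rng.randn(5, 4)),
                        index=['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
                        columns=['A', 'B', 'C', 'D'])

first = make_df(0)
second = make_df(0)  # same seed, same values
other = make_df(1)   # different seed, different values

print(first.equals(second))  # True
print(first.equals(other))   # False
```

If you plot the same df object that you printed, the annotations will match the table, up to the rounding applied by the annotation format.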

0
Mar 27 '19 at 17:02

Please note that the authors of seaborn only want seaborn.heatmap to work with categorical dataframes. It is not a general-purpose tool.

If your index and columns are numeric and/or datetime values, this code will serve you well.

Matplotlib's heat-mapping function pcolormesh requires bins instead of indices, so there is some fancy code to build bins from your dataframe indices (even if your index isn't evenly spaced!).

The rest is simply np.meshgrid and plt.pcolormesh.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    def conv_index_to_bins(index):
        """Calculate bins to contain the index values.
        The start and end bin boundaries are linearly extrapolated from
        the two first and last values. The middle bin boundaries are
        midpoints.

        Example 1: [0, 1] -> [-0.5, 0.5, 1.5]
        Example 2: [0, 1, 4] -> [-0.5, 0.5, 2.5, 5.5]
        Example 3: [4, 1, 0] -> [5.5, 2.5, 0.5, -0.5]"""
        assert index.is_monotonic_increasing or index.is_monotonic_decreasing

        # the beginning and end values are guessed from first and last two
        start = index[0] - (index[1] - index[0]) / 2
        end = index[-1] + (index[-1] - index[-2]) / 2

        # the middle values are the midpoints
        middle = pd.DataFrame({'m1': index[:-1], 'p1': index[1:]})
        middle = middle['m1'] + (middle['p1'] - middle['m1']) / 2

        if isinstance(index, pd.DatetimeIndex):
            idx = pd.DatetimeIndex(middle).union([start, end])
        else:
            if not pd.api.types.is_numeric_dtype(index):
                print('Warning: guessing what to do with index type %s' %
                      type(index))
            # pd.Float64Index and pd.Int64Index were removed in pandas 2.0;
            # pd.Index(..., dtype=float) works on old and new versions alike.
            idx = pd.Index(middle, dtype=float).union([start, end])

        return idx.sort_values(ascending=index.is_monotonic_increasing)

    def calc_df_mesh(df):
        """Calculate the two-dimensional bins to hold the index and
        column values."""
        return np.meshgrid(conv_index_to_bins(df.index),
                           conv_index_to_bins(df.columns))

    def heatmap(df):
        """Plot a heatmap of the dataframe values using the index and
        columns"""
        X, Y = calc_df_mesh(df)
        c = plt.pcolormesh(X, Y, df.values.T)
        plt.colorbar(c)

Call it using heatmap(df), and display it using plt.show().
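To make the bin construction concrete, here is a NumPy-only sketch of the same idea for plain numeric values: interior edges are the midpoints between neighbors, and the two outer edges are extrapolated by half the adjacent spacing. The function name bin_edges is made up for this illustration, and it does not handle datetime indexes:

```python
import numpy as np

def bin_edges(values):
    """Return len(values) + 1 edges enclosing monotonic numeric values."""
    v = np.asarray(values, dtype=float)
    if v.size < 2:
        raise ValueError("need at least two values to infer a spacing")
    middle = (v[:-1] + v[1:]) / 2      # midpoints between neighbors
    start = v[0] - (v[1] - v[0]) / 2   # extrapolate before the first value
    end = v[-1] + (v[-1] - v[-2]) / 2  # extrapolate after the last value
    return np.concatenate(([start], middle, [end]))

# The three examples from the docstring of conv_index_to_bins:
print(bin_edges([0, 1]))      # edges -0.5, 0.5, 1.5
print(bin_edges([0, 1, 4]))   # edges -0.5, 0.5, 2.5, 5.5
print(bin_edges([4, 1, 0]))   # edges 5.5, 2.5, 0.5, -0.5
```

Each data value ends up inside its own cell, which is why an unevenly spaced index produces cells of different widths in the pcolormesh plot.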

0
Jul 01 '19 at 18:58