Matplotlib logarithmic scale with zero value

I have a very large and sparse dataset of Twitter spam accounts, and I need to scale the x axis in order to visualize the distribution (histogram, KDE, etc.) and the CDF of various variables (tweets_count, number of followers/following, etc.).

    > describe(spammers_class1$tweets_count)
      var       n   mean      sd median trimmed mad min    max  range  skew kurtosis   se
    1   1 1076817 443.47 3729.05     35   57.29  43   0 669873 669873 53.23  5974.73 3.59

In this dataset, the value 0 is very important (in fact, 0 should have the highest density). However, on a logarithmic scale these values are ignored. I thought of changing the value to 0.1, for example, but it makes no sense to have spam accounts with 10^-1 followers.

So, what would be a workaround in Python and matplotlib?

2 answers

Add 1 to each x value, then take the log:

    import matplotlib.pyplot as plt
    import numpy as np
    import matplotlib.ticker as ticker

    fig, ax = plt.subplots()

    x = [0, 10, 100, 1000]
    y = [100, 20, 10, 50]

    x = np.asarray(x) + 1  # shift so that x=0 becomes 1, and log(1) = 0
    y = np.asarray(y)
    ax.plot(x, y)
    ax.set_xscale('log')
    ax.set_xlim(x.min(), x.max())
    # label each tick with the original (unshifted) value
    ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
    ax.xaxis.set_major_locator(ticker.FixedLocator(x))

    plt.show()



Use

    ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
    ax.xaxis.set_major_locator(ticker.FixedLocator(x))

to change the tick labels so that they show the original x values rather than the shifted values x+1.

(My initial suggestion was to use plt.xticks(x, x-1), but that would affect all axes. To allow the change to be applied to an individual axes, I changed all the plt calls to calls on ax.)


matplotlib removes points that contain NaN, inf or -inf. Since log(0) is -inf, the point corresponding to x=0 is removed from a log plot.
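A small NumPy check can make this concrete. It is only a sketch of the effect, not of matplotlib's internal code; the array `x` reuses the sample values from the example above.

```python
import numpy as np

x = np.array([0, 10, 100, 1000], dtype=float)

# log10(0) evaluates to -inf; silence NumPy's divide-by-zero warning for the demo.
with np.errstate(divide="ignore"):
    log_x = np.log10(x)

# Only finite positions can be drawn on a log axis.
finite = np.isfinite(log_x)  # False only for the x = 0 entry

print(log_x)      # first entry is -inf
print(x[finite])  # the x = 0 point is the one that gets lost
```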

If you increase all the x-values by 1, then, since log(1) = 0, the point corresponding to x=0 will now be plotted at x=log(1)=0 on the log plot.

The remaining x values will also be shifted by one, but this will not matter to the eye, since log(x+1) is very close to log(x) for large values of x.
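The same shift could be applied to a histogram such as the one the question asks about. The following is a minimal sketch under stated assumptions: the data is synthetic (500 zeros plus a log-normal tail, standing in for tweets_count), and the helper names `shifted_log_hist` and `plot_shifted_log_hist`, the bin count, and the tick values are hypothetical choices, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker


def shifted_log_hist(counts, num_bins=30):
    """Histogram counts + 1 on log-spaced bins, so zeros land in the first bin."""
    shifted = np.asarray(counts, dtype=float) + 1  # 0 -> 1, and log(1) = 0
    edges = np.geomspace(1, shifted.max(), num_bins + 1)
    hist, edges = np.histogram(shifted, bins=edges)
    return hist, edges


def plot_shifted_log_hist(counts, tick_values=(0, 10, 100, 1000, 10000)):
    """Draw the histogram on a log x axis whose labels show the unshifted values."""
    hist, edges = shifted_log_hist(counts)
    fig, ax = plt.subplots()
    ax.bar(edges[:-1], hist, width=np.diff(edges), align="edge")
    ax.set_xscale("log")
    # Put ticks at value + 1 and label them with the original value.
    ax.xaxis.set_major_locator(ticker.FixedLocator(np.asarray(tick_values) + 1))
    ax.xaxis.set_major_formatter(
        ticker.FuncFormatter(lambda x, pos: "{0:g}".format(x - 1)))
    ax.set_xlabel("tweets_count")
    ax.set_ylabel("accounts")
    return fig, ax


# Synthetic stand-in data (an assumption, not the question's dataset).
rng = np.random.default_rng(0)
counts = np.concatenate([np.zeros(500), np.round(rng.lognormal(3, 2, size=1500))])

if __name__ == "__main__":
    plot_shifted_log_hist(counts)
    plt.show()
```

Fixing the tick positions at value + 1 keeps the labels at round numbers (0, 10, 100, ...). With the default log locator the ticks would sit at 1, 10, 100 and the formatter would label them 0, 9, 99.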

 ax1.set_xlim(0, 1e3) 

Here is an example from the matplotlib documentation.

There, the axis limits are set as follows:

    ax1.set_xlim(1e1, 1e3)
    ax1.set_ylim(1e2, 1e3)