Assuming you know how your data is distributed (i.e., you know the pdf of your data), SciPy can evaluate the cdf directly at your discrete data points:
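import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

x = np.random.randn(10000)  # samples from a standard normal distribution
norm_cdf = scipy.stats.norm.cdf(x)  # cdf evaluated at each sample

# plot the cdf values against the samples
sns.lineplot(x=x, y=norm_cdf)
plt.show()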

We can even print the first few cdf values to show that they are discrete:
print(norm_cdf[:10])
>>> array([0.39216484, 0.09554546, 0.71268696, 0.5007396 , 0.76484329,
           0.37920836, 0.86010018, 0.9191937 , 0.46374527, 0.4576634 ])
The same call also works on multidimensional arrays. The cdf is evaluated elementwise, so each coordinate gets its own (marginal) cdf value. We use the 2d data below to illustrate:
mu = np.zeros(2)  # mean vector
cov = np.array([[1, 0.6], [0.6, 1]])  # covariance matrix

# generate 2d normally distributed samples using 0 mean and the covariance matrix above
x = np.random.multivariate_normal(mean=mu, cov=cov, size=1000)  # 1000 samples
norm_cdf = scipy.stats.norm.cdf(x)
print(norm_cdf.shape)
>>> (1000, 2)
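One thing worth keeping in mind: scipy.stats.norm.cdf applied to a 2d array returns one value per coordinate, which is the marginal cdf, not the joint one. If you want the joint probability P(X1 <= a, X2 <= b) instead, a minimal sketch using scipy.stats.multivariate_normal could look like the following. The seed and the sample size of 200 are my own choices for illustration, not part of the example above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
mu = np.zeros(2)                           # mean vector
cov = np.array([[1.0, 0.6], [0.6, 1.0]])   # covariance matrix
x = rng.multivariate_normal(mean=mu, cov=cov, size=200)

# Elementwise (marginal) cdf: one value per coordinate, shape (200, 2)
marginal_cdf = stats.norm.cdf(x)

# Joint cdf P(X1 <= a, X2 <= b): one value per sample, shape (200,)
joint_cdf = stats.multivariate_normal(mean=mu, cov=cov).cdf(x)

print(marginal_cdf.shape, joint_cdf.shape)
```

The joint value for a sample can never exceed the smaller of its two marginal values, which is a quick sanity check on the result.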
In the examples above, I knew that my data was normally distributed, so I used scipy.stats.norm. SciPy supports many other distributions as well. But again, you need to know in advance how your data is distributed in order to use these functions. If you do not know the distribution and simply pick one to calculate the cdf, you will most likely get incorrect results.
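If the distribution is unknown, one common alternative (not covered above, so treat it as a suggestion) is the empirical cdf, which is built from the data alone: sort the samples and assign the i-th smallest the value i/n. A rough sketch, with the seed and variable names being my own assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.standard_normal(10000)  # stand-in for data of unknown distribution

# Empirical cdf: fraction of samples less than or equal to each sorted value
x_sorted = np.sort(x)
ecdf = np.arange(1, x_sorted.size + 1) / x_sorted.size

# Because this stand-in data really is normal, the empirical cdf should
# track the theoretical one closely; max_gap measures the largest deviation
max_gap = np.max(np.abs(ecdf - stats.norm.cdf(x_sorted)))
print(max_gap)
```

Plotting ecdf against x_sorted gives a step-like curve that needs no assumption about the underlying distribution.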
Pyrsquared