Difference between StandardScaler and Normalizer in sklearn.preprocessing

What is the difference between StandardScaler and Normalizer in the sklearn.preprocessing module? Don't both do the same thing, i.e. remove the mean and scale by the standard deviation?

+18
scikit-learn statistics machine-learning
6 answers

From the Normalizer docs:

Each sample (i.e., each row of the data matrix) with at least one nonzero component is rescaled independently of other samples, so that its norm (l1 or l2) is equal to one.

And from the StandardScaler docs:

Standardize features by removing the mean and scaling to unit variance

In other words, Normalizer acts row-wise and StandardScaler acts column-wise. Normalizer does not remove the mean and scale by the standard deviation; instead, it rescales each whole row to unit norm.
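A minimal sketch of the row-wise vs. column-wise difference, assuming scikit-learn and NumPy are installed; the 3x2 matrix is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Toy data: 3 samples (rows) x 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# StandardScaler works per column: every feature ends up with
# mean approximately 0 and standard deviation approximately 1.
X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0), X_std.std(axis=0))

# Normalizer works per row: every sample ends up with L2 norm 1.
X_norm = Normalizer(norm="l2").fit_transform(X)
print(np.linalg.norm(X_norm, axis=1))
```

Note that the rows of X_std do not have unit norm and the columns of X_norm do not have zero mean: each transformer only constrains its own axis.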

+27

This visualization and Ben's article are very helpful in illustrating the idea.

StandardScaler assumes that your data is normally distributed within each feature. After “removing the mean and scaling to unit variance”, you can see in the figure that the features now have the same “scale”, regardless of their original one.
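Without the figure, the same effect can be checked numerically. This is a sketch with hypothetical, randomly generated data: two normally distributed features whose means and spreads differ by orders of magnitude end up on the same scale after StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two normally distributed features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(loc=5.0, scale=2.0, size=500),      # small-scale feature
    rng.normal(loc=-300.0, scale=50.0, size=500),  # large-scale feature
])
print("before:", X.mean(axis=0), X.std(axis=0))

X_scaled = StandardScaler().fit_transform(X)
# After scaling, both columns have mean ~0 and standard deviation ~1.
print("after: ", X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

The transform is a shift and a rescale of each column, so it changes location and spread but not the shape of each feature's distribution.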

+6

StandardScaler standardizes features by removing the mean and scaling to unit variance; Normalizer rescales each sample.
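Both operations can be written out by hand, which makes the difference explicit. A sketch on a made-up matrix (chosen so no column has zero variance), comparing the manual formulas against the scikit-learn transformers with default settings:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

X = np.array([[1.0, -2.0, 2.0],
              [4.0, 0.0, 3.0]])

# StandardScaler by hand: subtract each column's mean and divide by
# that column's population standard deviation (ddof=0).
manual_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Normalizer by hand (default norm="l2"): divide each row by its
# Euclidean length. Here the row norms are 3 and 5.
manual_norm = X / np.linalg.norm(X, axis=1, keepdims=True)

print(np.allclose(manual_std, StandardScaler().fit_transform(X)))  # True
print(np.allclose(manual_norm, Normalizer().fit_transform(X)))     # True
```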

+1

StandardScaler() standardizes features (for example, the features of data about people, such as height and weight) by removing the mean and scaling to unit variance.

(Unit variance: unit variance means that the standard deviation of a sample, as well as its variance, will tend towards 1 as the sample size tends towards infinity.)
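One way to make the unit-variance remark concrete, as a sketch with hypothetical height data: StandardScaler divides by the population standard deviation (ddof=0), so on the data it was fitted on, the ddof=0 variance is 1 for any sample size. If the variance is instead measured with the unbiased sample estimator (ddof=1), it comes out as n / (n - 1), which tends towards 1 as n grows:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
results = {}
for n in (5, 50, 5000):
    # Hypothetical single feature, e.g. heights in cm.
    x = rng.normal(loc=170.0, scale=10.0, size=(n, 1))
    z = StandardScaler().fit_transform(x)
    # StandardScaler uses the population std (ddof=0), so this is 1.
    pop_var = z.var(ddof=0)
    # The unbiased sample variance (ddof=1) equals n / (n - 1) here,
    # which approaches 1 as n grows.
    sample_var = z.var(ddof=1)
    results[n] = (pop_var, sample_var)
    print(n, round(pop_var, 6), round(sample_var, 6))
```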

Normalizer() rescales each sample: for example, it rescales the stock prices of each company independently of the other companies.

Some stocks are more expensive than others. To account for this, we normalize them: Normalizer converts each company's stock prices to a relative scale, separately for each company.
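A small sketch of the stock-price example, assuming each row holds one company's prices and each column one trading day (the prices are hypothetical). Two companies whose price series have the same pattern but a tenfold difference in price level become identical after Normalizer:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical prices: each row is one company, each column one trading day.
prices = np.array([[10.0, 11.0, 12.0],      # cheaper stock
                   [100.0, 110.0, 120.0]])  # 10x more expensive, same pattern

relative = Normalizer(norm="l2").fit_transform(prices)
print(relative)
# Both rows are now the same: the price level is divided out and only
# the relative shape of each company's series remains.
```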

+1

In addition to @vincentlcy’s excellent suggestion to look at this article, there is now an example in the scikit-learn documentation here. An important difference is that Normalizer() is applied to each sample (i.e., each row) rather than to each column. This may only work for certain datasets that fit the assumption of similar types of data in each column.
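To see why similar data types per column matter, here is a sketch with hypothetical person data whose columns use very different units. Because Normalizer divides each row by one shared norm, the column with the largest raw numbers dominates that norm and the other column is squeezed towards 0:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Hypothetical person data: column 0 = height in mm, column 1 = weight in kg.
X = np.array([[1800.0, 80.0],
              [1600.0, 50.0],
              [1700.0, 95.0]])

# Row-wise L2 normalization: height's large numbers dominate each row's norm,
# so the weight column ends up close to 0 for every person.
X_norm = Normalizer().fit_transform(X)
print(X_norm)

# Column-wise standardization treats each feature separately instead.
X_std = StandardScaler().fit_transform(X)
print(X_std)
```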

0

Standard scaling means that StandardScaler is used to rescale the data so that each feature has mean 0 and variance 1, like a standard normal distribution (it shifts and scales the values but does not change the shape of their distribution). It is widely used in machine learning: suppose you use height as a feature; its raw values change arbitrarily depending on whether you measure it in cm or in feet, and standardizing the data removes that dependence. Normalization (Normalizer) works differently: it rescales each sample rather than each feature.
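The height example can be checked directly. In this sketch the heights are made up; converting them from cm to feet (1 ft = 30.48 cm) multiplies the feature by a positive constant, and StandardScaler removes exactly that kind of change, so both versions standardize to the same values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

heights_cm = np.array([[150.0], [165.0], [180.0], [172.0]])
heights_ft = heights_cm / 30.48  # same heights, converted to feet

z_cm = StandardScaler().fit_transform(heights_cm)
z_ft = StandardScaler().fit_transform(heights_ft)

# The standardized values do not depend on the unit used.
print(np.allclose(z_cm, z_ft))  # True
```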

-4
