What is the difference between a partial fit and a warm start?

Context:

I use the PassiveAggressive classifier from scikit-learn and don't understand whether to use warm_start or partial_fit.

Efforts so far:

  1. Went through this discussion thread:

https://github.com/scikit-learn/scikit-learn/issues/1585

  2. Went through the scikit-learn code for _fit and _partial_fit.

My observations:

  1. _fit in turn calls _partial_fit.

  2. When the warm_start parameter is set, _fit calls _partial_fit with self.coef_ passed in.

  3. When _partial_fit is called without the coef_init parameter and self.coef_ is set, it continues to use self.coef_ (see the sketch after this list).
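To make the comparison concrete, here is a minimal sketch of the two usage patterns I am comparing (PassiveAggressiveClassifier on made-up data; purely illustrative):

    from sklearn.linear_model import PassiveAggressiveClassifier
    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(100, 4)
    y = (X[:, 0] > 0).astype(int)

    # Pattern A: warm_start=True, so a repeated fit() resumes from self.coef_
    pa_warm = PassiveAggressiveClassifier(warm_start=True, random_state=0)
    pa_warm.fit(X, y)
    pa_warm.fit(X, y)  # starts from the coefficients of the previous fit()

    # Pattern B: partial_fit(), an explicit incremental update
    pa_inc = PassiveAggressiveClassifier(random_state=0)
    pa_inc.partial_fit(X, y, classes=np.array([0, 1]))  # classes required on first call
    pa_inc.partial_fit(X, y)  # also continues from self.coef_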

Question:

I feel that both end up providing the same functionality. What, then, is their main difference? In what contexts is each of them used?

Am I missing something? Any help is appreciated!

Tags: python, scikit-learn, machine-learning
4 answers

I don't know about the PassiveAggressive estimator specifically, but at least when using SGDRegressor, partial_fit performs only a single epoch, whereas fit runs for multiple epochs (until the loss converges or max_iter is reached). Therefore, when fitting new data to your model, partial_fit adjusts the model only one step towards the new data, while with fit and warm_start it acts as if you had combined your old data and new data and fitted the model once, until convergence.

Example:

    from sklearn.linear_model import SGDRegressor
    import numpy as np

    np.random.seed(0)
    X = np.linspace(-1, 1, num=50).reshape(-1, 1)
    Y = (X * 1.5 + 2).reshape(50,)

    modelFit = SGDRegressor(learning_rate="adaptive", eta0=0.01, random_state=0, verbose=1,
                            shuffle=True, max_iter=2000, tol=1e-3, warm_start=True)
    modelPartialFit = SGDRegressor(learning_rate="adaptive", eta0=0.01, random_state=0, verbose=1,
                                   shuffle=True, max_iter=2000, tol=1e-3, warm_start=False)

    # first fit some data
    modelFit.fit(X, Y)
    modelPartialFit.fit(X, Y)
    # for both: Convergence after 50 epochs, Norm: 1.46, NNZs: 1, Bias: 2.000027, T: 2500, Avg. loss: 0.000237
    print(modelFit.coef_, modelPartialFit.coef_)  # for both: [1.46303288]

    # now fit new data (zeros)
    newX = X
    newY = 0 * Y

    # fits for only 1 epoch, Norm: 1.23, NNZs: 1, Bias: 1.208630, T: 50, Avg. loss: 1.595492:
    modelPartialFit.partial_fit(newX, newY)

    # Convergence after 49 epochs, Norm: 0.04, NNZs: 1, Bias: 0.000077, T: 2450, Avg. loss: 0.000313:
    modelFit.fit(newX, newY)

    print(modelFit.coef_, modelPartialFit.coef_)  # [0.04245779] vs. [1.22919864]

    newX = np.reshape([2], (-1, 1))
    print(modelFit.predict(newX), modelPartialFit.predict(newX))  # [0.08499296] vs. [3.66702685]

First, let's look at the difference between .fit() and .partial_fit().

.fit() trains the model from scratch, so you can think of it as an option that is used only once per model. If you call .fit() again with a new dataset, the model is rebuilt on the new data and whatever was learned from the previous dataset is discarded.

.partial_fit() lets you update the model with additional data, so it can be used more than once on the same model. This is useful when the entire dataset cannot be loaded into memory (out-of-core learning), as sketched below.
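For instance, a minimal out-of-core sketch (using SGDClassifier as a stand-in estimator; the streaming is simulated and all names are illustrative):

    from sklearn.linear_model import SGDClassifier
    import numpy as np

    model = SGDClassifier(random_state=0)
    classes = np.array([0, 1])  # all classes must be declared on the first call
    rng = np.random.RandomState(0)

    # pretend each chunk is streamed from disk; here it is generated in memory
    for _ in range(10):
        X_chunk = rng.randn(1000, 5)
        y_chunk = (X_chunk[:, 0] > 0).astype(int)
        model.partial_fit(X_chunk, y_chunk, classes=classes)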

If either .fit() or .partial_fit() is going to be used just once on a fresh model, it makes no difference which one you choose.

warm_start only applies to .fit(): it lets training start from the coefficients of the previous fit(). This may sound similar to the purpose of partial_fit(), but the recommended way to learn incrementally is partial_fit(), which can also be called several times on the same batch of incremental data to keep improving the model.
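A minimal sketch of that contrast (using SGDRegressor as an example estimator; data and names are made up):

    from sklearn.linear_model import SGDRegressor
    import numpy as np

    rng = np.random.RandomState(0)
    X, y = rng.randn(200, 3), rng.randn(200)

    # warm_start: a second fit() resumes from the previous coefficients
    warm = SGDRegressor(warm_start=True, random_state=0)
    warm.fit(X, y)
    warm.fit(X, y)  # not trained from scratch

    # partial_fit: each call is one incremental pass; repeat to keep refining
    inc = SGDRegressor(random_state=0)
    for _ in range(5):
        inc.partial_fit(X, y)  # one epoch over this batch per call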


If warm_start = False, each subsequent call to .fit() (after an initial call to .fit() or partial_fit()) re-initializes the trained model parameters. If warm_start = True, each subsequent call to .fit() (after an initial call to .fit() or partial_fit()) keeps the values of the trained model parameters from the previous run and uses them as the starting point. Regardless of the value of warm_start, each call to partial_fit() retains the model parameters from the previous run and starts from those.

An example using MLPRegressor:

    import sklearn.neural_network
    import numpy as np

    np.random.seed(0)
    x = np.linspace(-1, 1, num=50).reshape(-1, 1)
    y = (x * 1.5 + 2).reshape(50,)

    cold_model = sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(), warm_start=False, max_iter=1)
    warm_model = sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(), warm_start=True, max_iter=1)

    cold_model.fit(x, y)
    print(cold_model.coefs_, cold_model.intercepts_)
    # [array([[0.17009494]])] [array([0.74643783])]
    cold_model.fit(x, y)
    print(cold_model.coefs_, cold_model.intercepts_)
    # [array([[-0.60819342]])] [array([-1.21256186])]
    # after the second run of .fit(), values are completely different,
    # because they were re-initialised before the second run for the cold model

    warm_model.fit(x, y)
    print(warm_model.coefs_, warm_model.intercepts_)
    # [array([[-1.39815616]])] [array([1.651504])]
    warm_model.fit(x, y)
    print(warm_model.coefs_, warm_model.intercepts_)
    # [array([[-1.39715616]])] [array([1.652504])]
    # this time with the warm model, params change relatively little, as params were
    # not re-initialised during the second call to .fit()

    cold_model.partial_fit(x, y)
    print(cold_model.coefs_, cold_model.intercepts_)
    # [array([[-0.60719343]])] [array([-1.21156187])]
    cold_model.partial_fit(x, y)
    print(cold_model.coefs_, cold_model.intercepts_)
    # [array([[-0.60619347]])] [array([-1.21056189])]
    # with partial_fit(), params barely change even for the cold model,
    # as no re-initialisation occurs

    warm_model.partial_fit(x, y)
    print(warm_model.coefs_, warm_model.intercepts_)
    # [array([[-1.39615617]])] [array([1.65350392])]
    warm_model.partial_fit(x, y)
    print(warm_model.coefs_, warm_model.intercepts_)
    # [array([[-1.39515619]])] [array([1.65450372])]
    # and of course the same goes for the warm model

About the difference: warm_start is just an attribute of the class, while partial_fit is a method of that class. They are fundamentally different things.

About the similar functionality: yes, partial_fit will use self.coef_, because it still needs some coefficient values to update during training. And when coef_init is empty, we simply put zero values into self.coef_ and move on to the next training step.

Description:

First run: either way (with or without warm start), training starts from zero coefficients, and as a result the average coefficients can be saved.

Run N+1:

With warm start: the previous coefficients are picked up via the _allocate_parameter_mem method and used as the starting point for training. As a result, the average coefficients are kept.

Without warm start: the coefficients are set to zero (as on the first run) and training proceeds from there. As a result, the average coefficients are still written to memory.
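A small sketch of this run-to-run behaviour (using SGDClassifier for illustration; it shows the observable effect, not the actual internals):

    from sklearn.linear_model import SGDClassifier
    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(100, 3)
    y = (X[:, 0] > 0).astype(int)

    clf = SGDClassifier(random_state=0)
    clf.partial_fit(X, y, classes=np.array([0, 1]))  # first run: starts from zero coefficients
    first_coef = clf.coef_.copy()

    clf.partial_fit(X, y)  # run N+1: continues from the stored self.coef_
    print(np.allclose(first_coef, clf.coef_))  # False: updated in place, not re-initialised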

