You can also limit the effect of outliers using scipy.optimize.least_squares. In particular, take a look at the f_scale parameter, which the documentation describes as:
Value of soft margin between inlier and outlier residuals, default is 1.0. ... This parameter has no effect with loss='linear', but for other loss values it is of crucial importance.
That page compares three different fits: an ordinary least_squares call and two robust variants that use a non-linear loss together with f_scale:
res_lsq = least_squares(fun, x0, args=(t_train, y_train))
res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=(t_train, y_train))
res_log = least_squares(fun, x0, loss='cauchy', f_scale=0.1, args=(t_train, y_train))

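To make the snippet above runnable, here is a minimal self-contained sketch. The exponential model, the data generation, the parameter values, and the injected outliers are all illustrative assumptions, not part of the original answer; only the three least_squares calls mirror the snippet.

import numpy as np
from scipy.optimize import least_squares

# Illustrative model: y = a + b * exp(c * t); least_squares expects the residuals.
def fun(x, t, y):
    return x[0] + x[1] * np.exp(x[2] * t) - y

# Synthetic training data with a few artificial outliers (assumed values).
rng = np.random.default_rng(0)
a_true, b_true, c_true = 0.5, 2.0, -1.0
t_train = np.linspace(0, 3, 50)
y_train = a_true + b_true * np.exp(c_true * t_train) + 0.1 * rng.standard_normal(t_train.size)
y_train[::10] += 4.0  # inject outliers every 10th point

x0 = np.array([1.0, 1.0, 0.0])  # initial guess for (a, b, c)

res_lsq = least_squares(fun, x0, args=(t_train, y_train))
res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=(t_train, y_train))
res_log = least_squares(fun, x0, loss='cauchy', f_scale=0.1, args=(t_train, y_train))

print('linear :', res_lsq.x)      # pulled towards the outliers
print('soft_l1:', res_soft_l1.x)  # closer to the true parameters
print('cauchy :', res_log.x)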
As the comparison there shows, the ordinary least-squares fit is much more sensitive to outliers, and it is worth experimenting with different loss functions in combination with different values of f_scale. The possible loss functions are (taken from the documentation):
'linear' : Gives a standard least-squares problem.
'soft_l1' : The smooth approximation of l1 (absolute value) loss. Usually a good choice for robust least squares.
'huber' : Works similarly to 'soft_l1'.
'cauchy' : Severely weakens outliers influence, but may cause difficulties in optimization process.
'arctan' : Limits a maximum loss on a single residual, has properties similar to 'cauchy'.
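To get a feel for how these choices tame outliers, the small sketch below evaluates each rho(z) (the expressions listed in the least_squares documentation, where z is the squared scaled residual) at a few residual magnitudes. The sample values of z are arbitrary, chosen only to contrast small and very large residuals.

import numpy as np

# rho(z) for the built-in losses, as listed in the scipy.optimize.least_squares docs.
losses = {
    'linear':  lambda z: z,
    'soft_l1': lambda z: 2.0 * (np.sqrt(1.0 + z) - 1.0),
    'huber':   lambda z: np.where(z <= 1.0, z, 2.0 * np.sqrt(z) - 1.0),
    'cauchy':  lambda z: np.log1p(z),
    'arctan':  lambda z: np.arctan(z),
}

z = np.array([0.01, 1.0, 100.0, 10000.0])  # small, moderate, large, huge squared residuals
for name, rho in losses.items():
    print(f"{name:8s}", np.round(rho(z), 3))

The linear loss grows proportionally with the squared residual, so a single huge residual dominates the fit, while 'soft_l1' and 'huber' grow roughly like the absolute residual, 'cauchy' only logarithmically, and 'arctan' is bounded.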
The scipy cookbook also has a neat tutorial on robust nonlinear regression.
pingul May 04 '17 at 14:57