Squaring is often used because it penalizes large differences far more heavily than small ones. If an error (difference) is already large, squaring it makes it larger still. An optimization method based on the squared value will therefore "try" to get rid of the biggest differences (outliers) first.
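To make the effect concrete, here is a small sketch comparing how much one outlier contributes to an absolute loss versus a squared loss. The residual values are made up purely for illustration.

```python
# Hypothetical residuals (illustrative values only); the last one is an outlier.
errors = [1.0, 1.0, 1.0, 10.0]

abs_terms = [abs(e) for e in errors]   # per-point absolute error
sq_terms = [e * e for e in errors]     # per-point squared error

# Fraction of the total loss that comes from the outlier alone.
abs_share = abs_terms[-1] / sum(abs_terms)   # 10 / 13
sq_share = sq_terms[-1] / sum(sq_terms)      # 100 / 103

print(f"outlier share of absolute loss: {abs_share:.2f}")  # 0.77
print(f"outlier share of squared loss:  {sq_share:.2f}")   # 0.97
```

With the squared loss, almost the entire objective comes from the single outlier, so an optimizer gains the most by reducing that one error.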
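It is also known that squared-error (least-squares) methods are better suited to Gaussian noise, while absolute-error methods are better suited to Laplacian noise (perturbations): in each case, minimizing that error is the maximum-likelihood estimate under the corresponding distribution.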
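One way to see the difference is the simplest possible fit, a single constant. The constant that minimizes the sum of squared errors is the mean (the Gaussian maximum-likelihood location), and the one that minimizes the sum of absolute errors is the median (the Laplacian one). The sketch below checks this on made-up data; the function names are my own.

```python
from statistics import mean, median

# Hypothetical sample (illustrative values only) with one outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

def sum_squared(c):
    """Sum of squared errors when every point is predicted as c."""
    return sum((x - c) ** 2 for x in data)

def sum_absolute(c):
    """Sum of absolute errors when every point is predicted as c."""
    return sum(abs(x - c) for x in data)

m = mean(data)      # 22.0, dragged toward the outlier
med = median(data)  # 3.0, barely affected by the outlier

# The mean wins on squared error, the median wins on absolute error.
print(sum_squared(m), sum_squared(med))    # 7610.0 9415.0
print(sum_absolute(m), sum_absolute(med))  # 156.0 101.0
```

The squared-error fit moves a long way toward the outlier, which is reasonable if the noise really is Gaussian, while the absolute-error fit stays near the bulk of the data, which matches the heavier tails of a Laplacian.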
Gacek