The standard backpropagation algorithm (gradient descent) runs into serious problems when the number of layers becomes large. The probability of getting stuck in a local minimum of the error function increases with each layer. And it is not only local minima in the strict mathematical sense that cause trouble: the error surface often contains flat regions, where changing one or more weights does not lead to any significant change in the error, and gradient descent makes no progress there.
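For illustration, here is a minimal NumPy sketch (my own, not part of the original answer) of this effect: when an error signal is backpropagated through many sigmoid layers, its magnitude shrinks layer by layer, so the earliest layers effectively sit in a flat region of the error surface. The layer count, widths, and weight scale are arbitrary values chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_layers = 20
width = 50
# Random weights for each layer (hypothetical sizes chosen for the demo).
weights = [rng.normal(0, 0.5, size=(width, width)) for _ in range(n_layers)]

# Forward pass, keeping activations for backpropagation.
a = rng.normal(size=(width,))
activations = [a]
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: propagate a unit error signal and record its norm per layer.
grad = np.ones(width)
norms = []
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1 - a))   # chain rule through the sigmoid
    norms.append(np.linalg.norm(grad))

print("gradient norm near the output:", norms[0])
print("gradient norm near the input :", norms[-1])
# The norm near the input is typically many orders of magnitude smaller,
# so the weights of the early layers barely change under gradient descent.
```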
On the other hand, networks with many layers can solve more complex problems, since each layer of units can also provide a level of abstraction.
Deep learning addresses exactly this problem. The basic idea is to run an unsupervised learning procedure on each individual layer, in addition to applying gradient descent to the network as a whole. The goal of the unsupervised step is to make each layer extract characteristic features of its input that subsequent layers can build on.
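As a rough illustration of this idea (my own sketch, not code from the answer), the following PyTorch snippet pre-trains each layer as a small autoencoder on the output of the previous layer, then stacks the pre-trained layers for ordinary backpropagation. The data, layer sizes, and training loop are hypothetical placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 64)            # hypothetical unlabeled data
layer_sizes = [64, 32, 16]         # hypothetical layer widths

encoders = []
inputs = X
for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
    dec = nn.Linear(d_out, d_in)   # throwaway decoder used only for pre-training
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(200):           # train this layer to reconstruct its input
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(inputs)), inputs)
        loss.backward()
        opt.step()
    encoders.append(enc)
    inputs = enc(inputs).detach()  # feed the learned features to the next layer

# Stack the pre-trained layers and fine-tune the whole network with backprop
# on a (hypothetical) supervised task.
model = nn.Sequential(*encoders, nn.Linear(layer_sizes[-1], 1))
```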
Although the term "Deep Learning" is currently used too broadly, it is more than just marketing advertising.
Edit: A few years ago, many people, including myself, believed that unsupervised pre-training was the key ingredient of deep learning. Since then, other methods have become popular that in many cases give even better results. As mentioned in a comment by @Safak Okzan (below his own answer), they include: