The answers to all of your questions are in the document that you linked. You need to read it more carefully.
Are these numbers independent of vectors?
No. See, for example, table 21-15 in the document that you linked. Note the latency of the short-vector FADDS.
Does this mean that I can start a new FMULS operation every cycle if it does not depend on an earlier result that is not yet available?
Yes, that is the definition of throughput.
What happens if I have two FMULS instructions after each other, where one argument depends on the previous calculation?
Execution will stall until the result of the first FMULS is available. See the details in section 21.6, "Operation of the scoreboards".
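To make the stall concrete, here is a minimal sketch of a toy issue model in C. It is not taken from the linked document: the names Insn and schedule, the 32-register file, and the latency you pass in are all assumptions for illustration. It models only in-order issue of one instruction per cycle with a stall until both source registers are ready, and ignores every other hazard the real scoreboard tracks.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 32  /* assumed size of the register file in this toy model */

/* One instruction in the toy model: dst = src1 op src2. */
typedef struct {
    int dst, src1, src2;
    int latency;  /* cycles from issue until dst can be read (caller-supplied assumption) */
} Insn;

/*
 * Fills issue[i] with the cycle in which instruction i issues, assuming
 * in-order issue of at most one instruction per cycle and a stall until
 * both source registers are ready. Returns the cycle in which the last
 * result becomes available.
 */
int schedule(const Insn *prog, size_t n, int *issue)
{
    int ready[NUM_REGS] = {0};  /* cycle at which each register becomes readable */
    int next = 0;               /* earliest cycle at which the next instruction may issue */
    int done = 0;

    for (size_t i = 0; i < n; i++) {
        const Insn *in = &prog[i];
        assert(in->dst  >= 0 && in->dst  < NUM_REGS);
        assert(in->src1 >= 0 && in->src1 < NUM_REGS);
        assert(in->src2 >= 0 && in->src2 < NUM_REGS);

        /* Stall until both sources have been produced. */
        int t = next;
        if (ready[in->src1] > t) t = ready[in->src1];
        if (ready[in->src2] > t) t = ready[in->src2];

        issue[i] = t;
        ready[in->dst] = t + in->latency;
        if (ready[in->dst] > done) done = ready[in->dst];
        next = t + 1;
    }
    return done;
}
```

With an assumed latency of 8 cycles, two multiplies where the second reads the first one's destination issue at cycles 0 and 8 in this model, while two independent multiplies issue at cycles 0 and 1. Use the latency figures from the tables in the linked document rather than the placeholder value.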
What if we are in vector mode with 4 elements, and in the second FMULS instruction all input registers except one are available? What will happen?
It will stall. Same reference.
Sqrt and division: will a sqrt or division operation prevent any subsequent operation from starting for 19 cycles?
No. See section 21.10, "Parallel execution". An example is shown in table 21-15, in which an independent FADDS is executed immediately after an FDIVS.
Note that writing short-vector VFP code that runs significantly faster than scalar code can be somewhat tricky for many types of computation (though not impossible). Even if you learn how to do it, it will be of dubious value, since the NEON unit appears to be the new model for vector computation on ARM going forward. In the long term, you may be better served by ignoring short-vector operation for now and focusing on learning NEON.