The padding procedure must not create collisions. If you have a message m, it is padded into pm, whose length is a multiple of 512 bits. Now consider pm as a message m' in its own right, that is, with the padding bits already present as if they were part of the message. If padding leaves m' unchanged, as you suggest, then m and m' yield the same hash value even though they are different messages. That is a collision, also known as "not good at all."
Generally speaking, the padding procedure must be such that it can be unambiguously removed: you must be able to look at a padded message and decide, without any hesitation, which bits belong to the message itself and which were added as padding. Nothing in the hash function actually removes the padding, but it must be conceptually feasible. This is mathematically impossible if messages whose length is a multiple of 512 bits are "padded" by adding no bits at all.
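To make the argument concrete, here is a minimal sketch working on whole bytes rather than bits. The names naive_pad, always_pad and unpad are hypothetical, and naive_pad is only an assumed rendering of the "add nothing when the length is already a multiple of 512" idea, not any real hash function's padding.

```python
BLOCK = 64  # block size in bytes, i.e. 512 bits


def naive_pad(msg: bytes) -> bytes:
    """Flawed scheme (assumption): leave block-aligned messages untouched."""
    if len(msg) % BLOCK == 0:
        return msg
    padded = msg + b"\x80"  # marker byte: a '1' bit followed by seven '0' bits
    return padded + b"\x00" * (-len(padded) % BLOCK)


def always_pad(msg: bytes) -> bytes:
    """Always append the marker, then zero-fill up to the next block boundary."""
    padded = msg + b"\x80"
    return padded + b"\x00" * (-len(padded) % BLOCK)


def unpad(padded: bytes) -> bytes:
    """Conceptual inverse of always_pad: drop trailing zeros, then the marker."""
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("not a validly padded message")
    return stripped[:-1]


m = b"abc"
m_prime = naive_pad(m)  # the padded form of m, treated as a message of its own

# Under naive_pad both messages produce identical hash input: a collision.
collides = naive_pad(m) == naive_pad(m_prime)

# Under always_pad the inputs differ, and the padding can be removed again.
distinct = always_pad(m) != always_pad(m_prime)
```

With always_pad, a message that is already block-aligned gets a whole extra block, which is exactly what keeps the padding removable.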
The above applies to all hash functions. MD5 and several functions of the same general family (including SHA-1, SHA-256...), which use the Merkle-Damgård construction, additionally require the input length to be encoded in the padding (this is needed for some of the security proofs). In MD5, the length is encoded as a 64-bit number. Together with the "1" bit, that means at least 65 padding bits for any message (and at most 576: the "1" bit, up to 511 "0" bits, and the 64-bit length).
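The MD5 padding rule can be sketched as follows. This assumes byte-aligned messages, as almost all implementations do, and the names md5_pad and padding_bits are hypothetical helpers for illustration; the layout (a "1" bit, zero fill to 448 mod 512, then the bit length as a 64-bit little-endian number) follows RFC 1321.

```python
import struct


def md5_pad(msg: bytes) -> bytes:
    """Return msg with MD5-style padding appended (byte-aligned input only)."""
    bit_len = (8 * len(msg)) % (1 << 64)  # length in bits, reduced modulo 2^64
    zeros = (55 - len(msg)) % 64          # zero bytes so the length is 56 mod 64
    return msg + b"\x80" + b"\x00" * zeros + struct.pack("<Q", bit_len)


def padding_bits(msg_len_bytes: int) -> int:
    """Number of padding bits added to a message of the given byte length."""
    padded = md5_pad(bytes(msg_len_bytes))
    return 8 * (len(padded) - msg_len_bytes)


# Padding sizes for a few message lengths (in bytes).
for n in (0, 55, 56, 64):
    print(n, padding_bits(n))
```

Because the sketch only handles whole bytes, the smallest padding it produces is 72 bits (for a 55-byte message) rather than the bit-level minimum of 65. A 56-byte message receives 576 bits, and a 64-byte message, already a multiple of 512 bits, still receives a full extra block of 512 bits.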