You can use the AIDA64 software to measure instruction latencies (although you cannot choose which instructions to test: it has a hard-coded list of instructions). People publish the results at http://instlatx64.atw.hu/
Of the locked instructions, AIDA64 tests lock add and xchg [mem] (which always locks, even without an explicit lock prefix).
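For readers who reach these instructions through a compiler rather than hand-written assembly, here is a minimal sketch using C11 atomics. The function names `locked_add` and `locked_swap` are made up for illustration. The instruction selection mentioned in the comments is what GCC and Clang commonly emit on x86-64, stated as an assumption rather than a guarantee; check the generated assembly for your own compiler and flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically add `delta` to `*counter`, discarding the old value.
   Assumption: on x86-64, common compilers emit `lock add [mem], reg`
   here (or `lock xadd` when the old value is used). */
static void locked_add(_Atomic uint32_t *counter, uint32_t delta) {
    assert(counter != NULL);
    atomic_fetch_add_explicit(counter, delta, memory_order_seq_cst);
}

/* Atomically store `value` into `*slot` and return the previous contents.
   Assumption: on x86-64 this typically becomes `xchg reg, [mem]`,
   which is locked implicitly, so no lock prefix appears. */
static uint32_t locked_swap(_Atomic uint32_t *slot, uint32_t value) {
    assert(slot != NULL);
    return atomic_exchange_explicit(slot, value, memory_order_seq_cst);
}
```

Calling `locked_add(&c, 3)` on a counter holding 5 leaves 8 in it, and a following `locked_swap(&c, 42)` returns 8 and leaves 42.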
Here are some excerpts. For comparison, I also include the latencies of the following instructions:
xchg reg1, reg2, which is not locked;
add with register and memory operands.
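As you can see, the locked instructions are only about 5 times slower on Haswell-DT and only about 2 times slower on Kaby Lake-S than non-locking memory stores (going by the latency in nanoseconds: 5.96 ns vs. 1.19 ns for ADD [m], r on Haswell-DT, and 4.01 ns vs. about 1.87 ns on Kaby Lake-S).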
Intel Core i5-4430, 3000 MHz (30 x 100) Haswell-DT
LOCK ADD [m8], r8         L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
LOCK ADD [m16], r16       L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
LOCK ADD [m32], r32       L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
LOCK ADD [m32 + 8], r32   L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
LOCK ADD [m64], r64       L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
LOCK ADD [m64 + 16], r64  L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
XCHG r8, [m8]             L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
XCHG r16, [m16]           L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
XCHG r32, [m32]           L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
XCHG r64, [m64]           L: 5.96ns= 17.8c  T: 7.21ns= 21.58c
ADD r32, 0x04000          L: 0.22ns=  0.9c  T: 0.09ns=  0.36c
ADD r32, 0x08000          L: 0.22ns=  0.9c  T: 0.09ns=  0.36c
ADD r32, 0x10000          L: 0.22ns=  0.9c  T: 0.09ns=  0.36c
ADD r32, 0x20000          L: 0.22ns=  0.9c  T: 0.08ns=  0.34c
ADD r8, r8                L: 0.22ns=  0.9c  T: 0.05ns=  0.23c
ADD r16, r16              L: 0.22ns=  0.9c  T: 0.07ns=  0.29c
ADD r32, r32              L: 0.22ns=  0.9c  T: 0.05ns=  0.23c
ADD r64, r64              L: 0.22ns=  0.9c  T: 0.07ns=  0.29c
ADD r8, [m8]              L: 1.33ns=  5.6c  T: 0.11ns=  0.47c
ADD r16, [m16]            L: 1.33ns=  5.6c  T: 0.11ns=  0.47c
ADD r32, [m32]            L: 1.33ns=  5.6c  T: 0.11ns=  0.47c
ADD r64, [m64]            L: 1.33ns=  5.6c  T: 0.11ns=  0.47c
ADD [m8], r8              L: 1.19ns=  5.0c  T: 0.32ns=  1.33c
ADD [m16], r16            L: 1.19ns=  5.0c  T: 0.21ns=  0.88c
ADD [m32], r32            L: 1.19ns=  5.0c  T: 0.22ns=  0.92c
ADD [m32 + 8], r32        L: 1.19ns=  5.0c  T: 0.22ns=  0.92c
ADD [m64], r64            L: 1.19ns=  5.0c  T: 0.20ns=  0.85c
ADD [m64 + 16], r64       L: 1.19ns=  5.0c  T: 0.18ns=  0.73c
Intel Core i7-7700K, 4700 MHz (47 x 100) Kaby Lake-S
LOCK ADD [m8], r8         L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
LOCK ADD [m16], r16       L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
LOCK ADD [m32], r32       L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
LOCK ADD [m32 + 8], r32   L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
LOCK ADD [m64], r64       L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
LOCK ADD [m64 + 16], r64  L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
XCHG r8, [m8]             L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
XCHG r16, [m16]           L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
XCHG r32, [m32]           L: 4.01ns= 16.8c  T: 5.20ns= 21.83c
XCHG r64, [m64]           L: 4.01ns= 16.8c  T: 5.12ns= 21.50c
ADD r32, 0x04000          L: 0.33ns=  1.0c  T: 0.12ns=  0.36c
ADD r32, 0x08000          L: 0.31ns=  0.9c  T: 0.12ns=  0.37c
ADD r32, 0x10000          L: 0.31ns=  0.9c  T: 0.12ns=  0.36c
ADD r32, 0x20000          L: 0.31ns=  0.9c  T: 0.12ns=  0.36c
ADD r8, r8                L: 0.31ns=  0.9c  T: 0.11ns=  0.34c
ADD r16, r16              L: 0.31ns=  0.9c  T: 0.11ns=  0.32c
ADD r32, r32              L: 0.31ns=  0.9c  T: 0.11ns=  0.34c
ADD r64, r64              L: 0.31ns=  0.9c  T: 0.10ns=  0.31c
ADD r8, [m8]              L: 1.87ns=  5.6c  T: 0.16ns=  0.47c
ADD r16, [m16]            L: 1.87ns=  5.6c  T: 0.16ns=  0.47c
ADD r32, [m32]            L: 1.87ns=  5.6c  T: 0.16ns=  0.47c
ADD r64, [m64]            L: 1.87ns=  5.6c  T: 0.16ns=  0.47c
ADD [m8], r8              L: 1.89ns=  5.7c  T: 0.33ns=  1.00c
ADD [m16], r16            L: 1.87ns=  5.6c  T: 0.26ns=  0.78c
ADD [m32], r32            L: 1.87ns=  5.6c  T: 0.28ns=  0.84c
ADD [m32 + 8], r32        L: 1.89ns=  5.7c  T: 0.26ns=  0.78c
ADD [m64], r64            L: 1.89ns=  5.7c  T: 0.33ns=  1.00c
ADD [m64 + 16], r64       L: 1.89ns=  5.7c  T: 0.24ns=  0.73c
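To check the "5 times" and "2 times" figures against the excerpts, here is a small sketch that parses rows in the layout shown above and divides the latencies. The names `Timing`, `parse_timing` and `latency_ratio` are hypothetical helpers, not part of AIDA64, and the code assumes that `L:` is latency and `T:` is throughput, each given in nanoseconds and in clock cycles.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed row: latency (L) and throughput (T), in ns and in cycles.
   The L/T interpretation is an assumption about the dump format. */
typedef struct {
    double lat_ns, lat_cycles, thr_ns, thr_cycles;
} Timing;

/* Parse the timing fields of one row, starting at its "L:" marker.
   Returns 1 on success, 0 if the row does not match the expected layout. */
static int parse_timing(const char *row, Timing *out) {
    assert(row != NULL && out != NULL);
    const char *p = strstr(row, "L:");
    if (p == NULL) {
        return 0;
    }
    return sscanf(p, "L: %lfns= %lfc T: %lfns= %lfc",
                  &out->lat_ns, &out->lat_cycles,
                  &out->thr_ns, &out->thr_cycles) == 4;
}

/* Ratio of the nanosecond latencies of two rows (locked / plain).
   Returns -1.0 if either row cannot be parsed or the divisor is not positive. */
static double latency_ratio(const char *locked_row, const char *plain_row) {
    Timing locked, plain;
    if (!parse_timing(locked_row, &locked) ||
        !parse_timing(plain_row, &plain) ||
        plain.lat_ns <= 0.0) {
        return -1.0;
    }
    return locked.lat_ns / plain.lat_ns;
}
```

Fed the `LOCK ADD [m32], r32` and `ADD [m32], r32` rows, `latency_ratio` gives about 5.0 for the Haswell-DT excerpt (5.96 / 1.19) and about 2.1 for the Kaby Lake-S excerpt (4.01 / 1.87).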