It's a little difficult to answer without knowing which products you're trying to evaluate. Are you looking at UI responsiveness, throughput (e.g. emails or transactions per second), startup time, etc.? They all have different criteria for what should be measured, and different tools for testing or evaluation. But to answer some of your general questions:
Confidence in your numbers is important. Try to make sure that whatever you measure doesn't deviate much from run to run. A useful technique is to perform multiple runs of the same scenario, throw away the outliers (i.e. your lowest and highest), and look at the avg/max/min/median of what remains. If you are doing some kind of throughput test, consider making it long-running so you have a good set of samples. For example, if you are looking at something like Microsoft Exchange and using its perf counters, try to take frequent samples (once per second or every few seconds) and let the run go for 20 minutes or so. Again, discard the first few minutes and the last few minutes to eliminate start/stop noise.
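As a rough illustration (not tied to any particular product), here is a minimal Python sketch of that multiple-runs-and-trim approach; run_scenario is a hypothetical stand-in for whatever you're actually measuring:

import statistics
import time

def run_scenario():
    """Hypothetical stand-in for the scenario under test."""
    time.sleep(0.01)

def measure(scenario, runs=30, trim=2):
    """Run the scenario repeatedly, drop the lowest/highest samples,
    and report min/max/avg/median of what remains."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        scenario()
        samples.append(time.perf_counter() - start)
    samples.sort()
    trimmed = samples[trim:-trim] if trim else samples
    return {
        "min": min(trimmed),
        "max": max(trimmed),
        "avg": statistics.mean(trimmed),
        "median": statistics.median(trimmed),
    }

if __name__ == "__main__":
    print(measure(run_scenario))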
Heisenberg is tricky. On most modern systems, depending on which applications/measures you're looking at, you can minimize this impact by being smart about what and how you measure. Sometimes (as in the Exchange example) the observer effect is close to zero. Try to use the least invasive tools possible. For example, if you measure startup time, consider using xperfinfo and the events built into the kernel. If you use perfmon, don't flood the system with extraneous counters that you don't need. If you are running a very long test, consider sampling less frequently.
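To make the "least invasive" idea concrete, here is a small sketch (an assumption of mine, not perfmon or xperf): poll one counter at a modest interval and buffer the samples in memory so the sampler itself adds almost no load; read_counter is whatever probe you choose, and os.getloadavg is just an example available on Unix.

import time

def sample_counter(read_counter, interval_s=5.0, duration_s=20 * 60):
    """Poll a single counter at a modest interval, buffering samples in
    memory (no disk writes during the run) to keep overhead minimal."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), read_counter()))
        time.sleep(interval_s)
    return samples

if __name__ == "__main__":
    import os
    # Hypothetical probe: 1-minute load average, sampled once a second for 10 s.
    print(sample_counter(lambda: os.getloadavg()[0], interval_s=1.0, duration_s=10))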
Also try to eliminate any sources of environmental variability or other possible sources of noise. If you are doing something network-related, think about isolating the network. Try disabling any services or applications you don't need. Limit any disk-I/O- or memory-intensive operations, etc. If disk I/O could introduce noise into something that is CPU-bound, consider using an SSD.
When designing your tests, think about repeatability. If you are doing any kind of micro-benchmarking (for example, a perf unit test), have your infrastructure support running the same operation n times in exactly the same way (see the sketch below). If you are driving the UI, try not to physically move the mouse; instead use the underlying accessibility layer (MSAA, UIAutomation, etc.) to drive the controls directly.
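For the micro-benchmark case, Python's standard timeit module already does the "same operation n times" part; this is just one possible harness, with operation as a hypothetical placeholder:

import timeit

def operation():
    """Hypothetical operation under test."""
    sorted(range(1000), reverse=True)

# Run the exact same operation 1000 times per repeat so every run is identical,
# then take the best of 5 repeats as the least-noisy estimate.
times = timeit.repeat(operation, repeat=5, number=1000)
per_call_us = min(times) / 1000 * 1e6
print(f"best of 5 repeats: {per_call_us:.2f} microseconds per call")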
Again, this is just general advice. If you have a more specific scenario, I can try to point you to more relevant recommendations.
Enjoy!
nithins