In our book Fuzzing (Takanen, DeMott, Miller) we have several chapters on metrics and coverage in negative testing (robustness testing, reliability testing, grammar testing, fuzzing: many names for the same thing). I also tried to summarize the most important aspects in our technical documentation:
http://www.codenomicon.com/products/coverage.shtml
Excerpt from there:
Coverage can be considered as the sum of two characteristics: precision and accuracy. Precision relates to protocol coverage. Test precision is determined by how well the tests cover the different protocol messages, message structures, tags, and data definitions. Accuracy, on the other hand, measures how accurately the tests can find flaws in different areas of the protocol. Accuracy can therefore be considered a form of anomaly coverage. However, precision and accuracy are rather abstract terms, so we need more concrete metrics for measuring coverage.
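As a rough back-of-the-envelope illustration (my own sketch, not from the book; all element and anomaly names are invented), each characteristic can be expressed as a simple ratio over the specification:

    # Hypothetical sketch: coverage as the sum of precision (protocol
    # coverage) and accuracy (anomaly coverage), each a simple ratio.

    # Elements defined in the protocol specification (invented example).
    SPEC_ELEMENTS = {"header", "version", "length", "payload", "checksum"}

    # Anomaly categories the tests should exercise (also invented).
    ANOMALY_CATEGORIES = {"overflow", "underflow", "format-string",
                          "integer-boundary", "truncation"}

    def protocol_coverage(exercised_elements):
        """Precision: share of spec elements the tests actually touch."""
        return len(exercised_elements & SPEC_ELEMENTS) / len(SPEC_ELEMENTS)

    def anomaly_coverage(exercised_anomalies):
        """Accuracy: share of anomaly categories the tests inject."""
        return len(exercised_anomalies & ANOMALY_CATEGORIES) / len(ANOMALY_CATEGORIES)

    tests_touch = {"header", "length", "payload"}
    tests_inject = {"overflow", "truncation"}
    print("protocol coverage: %.0f%%" % (100 * protocol_coverage(tests_touch)))  # 60%
    print("anomaly coverage:  %.0f%%" % (100 * anomaly_coverage(tests_inject)))  # 40%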
The first aspect of coverage relates to the attack surface. Analysis of test requirements always starts with identifying the interfaces that need testing. The number of different interfaces, and the protocols they implement at various layers, sets the requirements for the fuzzers. Each protocol, file format, or API may require its own type of fuzzer, depending on the security requirements.
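As a toy illustration of this first step (host and port list are invented, and a real attack-surface analysis would also cover file formats and APIs), even a quick probe of well-known TCP ports shows how each discovered interface implies its own protocol-aware fuzzer:

    # Hypothetical sketch: enumerate part of the network attack surface by
    # probing a few well-known TCP ports on a target host.
    import socket

    CANDIDATE_PORTS = {21: "FTP", 22: "SSH", 25: "SMTP", 80: "HTTP", 443: "HTTPS"}

    def open_interfaces(host, timeout=0.5):
        """Return the subset of CANDIDATE_PORTS that accept a TCP connection."""
        found = {}
        for port, proto in CANDIDATE_PORTS.items():
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    found[port] = proto
            except OSError:
                pass  # closed, filtered, or unreachable
        return found

    for port, proto in open_interfaces("127.0.0.1").items():
        print("port %d: %s -- needs a %s-aware fuzzer" % (port, proto, proto))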
The second coverage metric relates to the specification that the fuzzer supports. This metric is easy to use with model-based fuzzers, because the tool is built from the specifications used to create the fuzzer, and those are therefore easy to enumerate. A model-based fuzzer should cover the entire specification. Mutation-based fuzzers, by contrast, do not necessarily cover the specification at all, since implementing or including one message exchange sample from the specification does not guarantee that the entire specification is covered. Usually, when a mutation-based fuzzer claims specification support, it only means that it is interoperable with targets implementing that specification.
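A small sketch of the difference (all message names invented): the model-based tool's coverage is the specification itself, while the mutation-based tool can only cover whatever its sample capture happens to contain:

    # Hypothetical sketch: specification coverage of a model-based fuzzer
    # versus a mutation-based fuzzer limited to a captured sample.

    SPEC_MESSAGES = {"HELLO", "AUTH", "DATA", "KEEPALIVE", "RENEGOTIATE", "BYE"}

    # A model-based tool enumerates the spec, so its model covers everything.
    model_based = set(SPEC_MESSAGES)

    # A mutation-based tool only mutates what its sample capture contains.
    captured_sample = {"HELLO", "DATA", "BYE"}
    mutation_based = captured_sample & SPEC_MESSAGES

    def spec_coverage(covered):
        return len(covered) / len(SPEC_MESSAGES)

    print("model-based:    %.0f%%" % (100 * spec_coverage(model_based)))     # 100%
    print("mutation-based: %.0f%%" % (100 * spec_coverage(mutation_based)))  # 50%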
Particularly with protocol fuzzing, the third and perhaps most important metric is the level of statefulness of the chosen fuzzing approach. A completely random fuzzer will typically only test the first messages in complex stateful protocols. The more intelligent the approach, the deeper the fuzzer can go into complex protocol exchanges. Statefulness is a difficult requirement to specify for fuzzing tools, since it is more a metric of the quality of the protocol model being used, and can therefore only be verified by running the tests.
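A toy example of why this matters (the protocol sequence is invented): against a server that drops the connection on any unexpected message, a fuzzer that sends anomalies blindly never gets past the first state, while a model-driven fuzzer replays valid messages and lands its anomaly deep in the exchange:

    # Hypothetical sketch: statefulness determines how deep a fuzzer gets.
    import random

    EXCHANGE = ["HELLO", "AUTH", "DATA", "BYE"]  # invented protocol sequence

    def server_depth(messages):
        """Depth a message sequence reaches before the server drops it."""
        depth = 0
        for msg in messages:
            if depth < len(EXCHANGE) and msg == EXCHANGE[depth]:
                depth += 1      # expected message: advance the state machine
            else:
                break           # unexpected message: connection dropped
        return depth

    # Random fuzzer: garbage from the very first message onward.
    random_run = [random.choice(["XXXX", "????", "\x00\x00"]) for _ in range(4)]

    # Model-driven fuzzer: valid prefix from the protocol model, then an
    # anomaly injected into the third message of the exchange.
    stateful_run = ["HELLO", "AUTH", "DATA" + "A" * 64, "BYE"]

    print("random fuzzer depth:  ", server_depth(random_run))    # 0: dropped at once
    print("stateful fuzzer depth:", server_depth(stateful_run))  # 2: anomaly lands deep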
Hope this was helpful. We also have research on other metrics, such as looking at code coverage and other more or less useless data. ;) Metrics would be a great topic for a dissertation. Email me at ari.takanen@codenomicon.com if you are interested in access to our extensive research on this topic.