How to measure reliability?

I am working on a thesis on product quality. The product in this case is a website. I have identified several quality attributes and ways to measure them.

One of these quality attributes is reliability (robustness). I want to measure it somehow, but I cannot find any useful information on how to do this objectively.

Is there any static or dynamic metric that could indicate reliability? That is, just as with unit test coverage, is there a way to express such confidence as a number? If so, is there a (free) tool that can do this?

Does anyone have experience with such tools?

And last but not least, if you have ideas about other ways to determine reliability, I'm all ears.

Thank you very much in advance.

+6
java robustness software-quality
5 answers

Well, the short answer is no. Reliability (or robustness) can mean a lot of things, but the best definition I can come up with is "does the right thing in every situation." If you send a malformed HTTP header to a robust web server, it should not crash. It should return exactly the right kind of error, and it should log the event somewhere, perhaps in a configurable way. If a robust web server runs for a very long time, its memory footprint should stay stable.
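
To make that concrete, here is a rough sketch (plain Java, no framework) of the kind of probe described above: it sends a deliberately malformed request line to a web server and checks that the reply is a clean 4xx error rather than a hang or a silently dropped connection. The host, port, and payload are placeholders for whatever you are actually testing.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Robustness probe sketch: send a deliberately broken request line and see
    // whether the server answers with a clean client error instead of hanging
    // or dropping the connection. Host/port are whatever server you are testing.
    public class MalformedRequestProbe {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;

            try (Socket socket = new Socket(host, port)) {
                socket.setSoTimeout(5000); // a robust server should answer promptly
                OutputStream out = socket.getOutputStream();
                // Not a valid HTTP request: bogus method, no version, broken header.
                out.write("BOGUS /%%%\r\nX-Broken Header\r\n\r\n"
                        .getBytes(StandardCharsets.US_ASCII));
                out.flush();

                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
                String statusLine = in.readLine();
                if (statusLine != null && statusLine.matches("HTTP/1\\.[01] 4\\d\\d.*")) {
                    System.out.println("OK: clean client error: " + statusLine);
                } else {
                    System.out.println("Suspicious: expected a 4xx status, got: " + statusLine);
                }
            }
        }
    }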

Much of what makes a system robust is its handling of edge cases. Good unit tests are part of this, but it is quite likely that there will be no unit tests for the problems the system actually has (if those problems were known, the developers would presumably have fixed them and only then added a test).
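
As an illustration of what edge-case tests look like in practice, here is a small, self-contained JUnit 4 sketch. The parsing helper is invented for the example; the point is that the interesting tests are the ones covering null, blank, malformed, and overflowing input, not the happy path.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Edge-case tests for a deliberately trivial helper. Only the first test is
    // the happy path; the rest cover the inputs that tend to break real systems.
    public class SafeParserTest {

        /** Parses an integer, falling back to a default on null, blank or garbage input. */
        static int parseOrDefault(String s, int fallback) {
            if (s == null || s.trim().isEmpty()) {
                return fallback;
            }
            try {
                return Integer.parseInt(s.trim());
            } catch (NumberFormatException e) {
                return fallback;
            }
        }

        @Test public void happyPath()     { assertEquals(42, parseOrDefault("42", 0)); }
        @Test public void nullInput()     { assertEquals(-1, parseOrDefault(null, -1)); }
        @Test public void blankInput()    { assertEquals(-1, parseOrDefault("   ", -1)); }
        @Test public void garbageInput()  { assertEquals(-1, parseOrDefault("42abc", -1)); }
        @Test public void overflowInput() { assertEquals(-1, parseOrDefault("99999999999999999999", -1)); }
    }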

Unfortunately, it is almost impossible to measure the robustness of an arbitrary program, because to do that you need to know what the program is supposed to do. If you have a specification, you can write a huge number of tests and then run them against any implementation. For example, look at the Acid2 browser test. It carefully measures how well any given web browser complies with a standard in a simple, repeatable way. That is about as close as you can get, and people have pointed out many flaws with such an approach (for example, is a program that crashes more often but implements one more part of the specification more robust?).

However, there are various checks that you can use as a rough, quantitative assessment of the health of a system. Unit test coverage is pretty standard, as are its siblings: branch coverage, function coverage, statement coverage, and so on. Other good choices are lint-style tools such as FindBugs, which can point out potential problems. Open source projects are often judged by how often and how recently commits are made or releases are issued. If the project has a bug tracker, you can measure the number of bugs fixed and the percentage that get fixed. If you are measuring a particular running instance of the program, especially one under heavy load, MTBF (Mean Time Between Failures) is a good measure of reliability (see Philip's answer).
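
As a tiny illustration of why those coverage flavours differ, the example below (invented for this answer) contains a method where a single call reaches 100% statement coverage while still missing a branch, which is exactly the kind of gap tools like Cobertura report.

    // Why branch coverage is stricter than statement coverage: the single call
    // in main() executes every statement of price(), yet the withDiscount=false
    // branch is never taken. Coverage tools such as Cobertura flag exactly this.
    public class CoverageExample {

        static double price(double base, boolean withDiscount) {
            double total = base;
            if (withDiscount) {       // two branches: true and false
                total = total * 0.9;  // executed by the call below
            }
            return total;
        }

        public static void main(String[] args) {
            // 100% statement coverage, but only 50% branch coverage.
            System.out.println(price(100.0, true));    // 90.0
            // Uncommenting the next line is what full branch coverage demands:
            // System.out.println(price(100.0, false)); // 100.0
        }
    }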

These measurements, however, do not tell you how robust the program really is; they are just ways of making an educated guess. If it were easy to determine whether a program is robust, we would probably just build that check into a compiler.

Good luck with your thesis! I hope you come up with interesting new measurements!

+13

You could look at the mean time between failures (MTBF) as a measure of reliability. The problem is that it is a theoretical quantity that is difficult to measure, especially before you deploy your product into a real-world situation with real loads. Part of the reason is that testing often fails to cover real-world scalability issues.
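
For what it is worth, the usual back-of-the-envelope estimate is total observed operating time divided by the number of failures. A minimal sketch, with invented timestamps standing in for real monitoring data and downtime ignored for simplicity:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Arrays;
    import java.util.List;

    // Crude MTBF estimate: observed operating time divided by the number of
    // failures in that window. The timestamps are invented; real ones would
    // come from monitoring or incident logs. Repair/downtime is ignored here.
    public class MtbfEstimate {
        public static void main(String[] args) {
            Instant monitoringStart = Instant.parse("2024-01-01T00:00:00Z");
            Instant monitoringEnd   = Instant.parse("2024-03-01T00:00:00Z");
            List<Instant> failures = Arrays.asList(
                    Instant.parse("2024-01-04T10:15:00Z"),
                    Instant.parse("2024-01-19T02:40:00Z"),
                    Instant.parse("2024-02-02T17:05:00Z"));

            Duration observed = Duration.between(monitoringStart, monitoringEnd);
            double mtbfHours = observed.toHours() / (double) failures.size();
            System.out.printf("%d failures over %d hours: MTBF is roughly %.1f hours%n",
                    failures.size(), observed.toHours(), mtbfHours);
        }
    }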

+4

In our book Fuzzing (Takanen, DeMott, Miller) we have several chapters on metrics and coverage in negative testing (robustness testing, reliability testing, grammar testing, fuzzing; many names for the same thing). I also tried to summarize the most important aspects in our whitepaper:

http://www.codenomicon.com/products/coverage.shtml

Excerpt from there:


Coverage can be seen as the sum of two characteristics: precision and accuracy. Precision is concerned with protocol coverage: the precision of the testing is determined by how well the tests cover the different protocol messages, message structures, tags, and data definitions. Accuracy, on the other hand, measures how accurately the tests can find bugs in the different protocol areas, so accuracy can be regarded as a form of anomaly coverage. However, precision and accuracy are rather abstract terms, so we need more specific metrics for measuring coverage.

The first coverage aspect relates to the attack surface. Test requirement analysis always begins with identifying the interfaces that need testing. The number of different interfaces, and the protocols they implement at the various layers, sets the requirements for the fuzzers. Each protocol, file format, or API may require its own type of fuzzer, depending on the security requirements.

The second coverage metric relates to the specification that a fuzzer supports. This type of metric is easy to apply to model-based fuzzers, because such a tool is built from the specifications used to create it, and those specifications are therefore easy to list. A model-based fuzzer should cover the entire specification. Mutation-based fuzzers, by contrast, do not necessarily cover the specification fully, since implementing or including one sample message exchange from the specification does not guarantee that the whole specification is covered. Usually, when a mutation-based fuzzer claims to support a specification, it means that it is interoperable with test targets that implement that specification.

Especially with regard to protocol fuzzing, the third critical metric is the level of statefulness of the chosen fuzzing approach. A completely random fuzzer typically exercises only the first messages of complex stateful protocols. The more state-aware the approach you are using, the deeper the fuzzer can go into complex protocol exchanges. Statefulness is a difficult requirement to specify for fuzzing tools, since it is really a metric for the quality of the protocol model being used, and can therefore only be verified by running the tests.


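To make the mutation-based approach from the excerpt concrete, here is a minimal sketch of such a fuzzer: it takes one valid sample message, flips a few random bytes, and feeds the result to the component under test, counting anything that fails in an unexpected way. The parseRequest target is a made-up placeholder, and, as the excerpt points out, a single-message fuzzer like this never gets past the first message of a stateful protocol.

    import java.nio.charset.StandardCharsets;
    import java.util.Random;

    // Minimal mutation-based fuzzer: clone one valid sample message, flip a few
    // random bytes, and feed it to the component under test. parseRequest() is a
    // stand-in for the real target; anything other than a clean, documented
    // rejection counts as a finding. Note that this only ever exercises the
    // first message of a protocol, which is the statefulness limitation above.
    public class MiniMutationFuzzer {

        static void parseRequest(byte[] input) {
            // Placeholder target; a real run would call the code under test here.
            String s = new String(input, StandardCharsets.ISO_8859_1);
            if (!s.startsWith("GET ")) {
                throw new IllegalArgumentException("unsupported method");
            }
        }

        public static void main(String[] args) {
            byte[] validSample = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
                    .getBytes(StandardCharsets.ISO_8859_1);
            Random random = new Random(42); // fixed seed keeps failures reproducible

            int unexpectedErrors = 0;
            for (int i = 0; i < 10000; i++) {
                byte[] mutated = validSample.clone();
                int flips = 1 + random.nextInt(4);
                for (int f = 0; f < flips; f++) {
                    mutated[random.nextInt(mutated.length)] = (byte) random.nextInt(256);
                }
                try {
                    parseRequest(mutated);
                } catch (IllegalArgumentException expected) {
                    // clean rejection of bad input is the desired behaviour
                } catch (RuntimeException crash) {
                    unexpectedErrors++; // any other failure is a robustness finding
                }
            }
            System.out.println("Unexpected errors: " + unexpectedErrors);
        }
    }
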
Hope this was helpful. We also have studies of other metrics, such as looking at code coverage and other more or less useless data. ;) Metrics are a great topic for a thesis. Email me at ari.takanen@codenomicon.com if you are interested in getting access to our extensive research on this topic.

+2

Reliability is very subjective, but you could take a look at FindBugs, Cobertura, and Hudson, which, when combined properly, can give you a growing sense of confidence over time that the software is reliable.

+1

You could look at the mean time between failures as a measure of reliability.

The problem with MTBF is that it is usually measured with positive traffic, whereas failures often happen in unexpected situations. It gives no real indication of robustness or reliability. Even if a website stays up indefinitely in a lab environment, it will still be hacked within seconds on the Internet if it has a weakness.

0
