Unit testing - algorithm or sample based?

Let's say I'm trying to test a simple Set class

    public class IntSet : IEnumerable<int>
    {
        public void Add(int i) { ... }
        //IEnumerable implementation...
    }

And suppose I'm trying to verify that there are no duplicate values in the set. My first option is to insert some sample data into the set and check for duplicates using my knowledge of the data I used, for example:

    //OPTION 1
    void InsertDuplicateValues_OnlyOneInstancePerValueShouldBeInTheSet()
    {
        var set = new IntSet();
        //3 will be added 3 times
        var values = new List<int> { 1, 2, 3, 3, 3, 4, 5 };
        foreach (int i in values)
            set.Add(i);

        //I know 3 is the only candidate to appear multiple times
        int counter = 0;
        foreach (int i in set)
            if (i == 3) counter++;

        Assert.AreEqual(1, counter);
    }

My second option is to check my condition as a whole:

    //OPTION 2
    void InsertDuplicateValues_OnlyOneInstancePerValueShouldBeInTheSet()
    {
        var set = new IntSet();
        //The following could even be a list of random numbers with a duplicate
        var values = new List<int> { 1, 2, 3, 3, 3, 4, 5 };
        foreach (int i in values)
            set.Add(i);

        //I am not using my prior knowledge of the sample data
        //the following line would work for any data
        CollectionAssert.AreEquivalent(new HashSet<int>(values), set);
    }

Of course, in this example it is convenient that I have a reference implementation to check against (HashSet<int>), as well as a helper for comparing collections (CollectionAssert). But what if I didn't? This code would definitely be more complicated than the first version - and that is exactly the situation you face when testing real business logic.

Of course, testing the expected condition in general covers more cases - but it comes close to repeating the logic under test (which is tedious and useless - you cannot use the same code to check itself!). Basically, I'm asking whether my tests should look like "insert 1, 2, 3 and then check something about 3" or "insert 1, 2, 3 and check something in general".

EDIT: to help me understand, please indicate in your answer whether you prefer OPTION 1 or OPTION 2 (or neither, or "it depends on the case", etc.). Just to clarify: it is pretty clear that in this particular case (IntSet) option 2 is better in every respect. However, my question concerns the cases where you do not have an alternative implementation to verify against, so the code in option 2 would definitely be more complicated than in option 1.
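For illustration only (this sketch is not part of the original question), here is roughly what an option-2-style check could look like without HashSet<int> or CollectionAssert, verifying "no duplicates" and "nothing lost" by hand:

    //Hypothetical sketch: OPTION 2 style without a reference implementation
    void InsertDuplicateValues_ManualEquivalenceCheck()
    {
        var set = new IntSet();
        var values = new List<int> { 1, 2, 3, 3, 3, 4, 5 };
        foreach (int i in values)
            set.Add(i);

        //every value yielded by the set must be unique
        var seen = new List<int>();
        foreach (int i in set)
        {
            Assert.IsFalse(seen.Contains(i), "duplicate value in set: " + i);
            seen.Add(i);
        }

        //every inserted value must be present in the set
        foreach (int i in values)
            Assert.IsTrue(seen.Contains(i), "missing value: " + i);
    }

This is exactly the extra complexity the question is about: the general check is noticeably heavier than the sample-based one.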

+8
language-agnostic c# unit-testing
6 answers

Basically, I'm asking whether my tests should look like "insert 1, 2, 3 and then check something about 3" or "insert 1, 2, 3 and check something in general"

I'm not a TDD purist, but it seems the consensus is that a test should fail precisely when the condition it is trying to test is broken. I.e., if you implement a test that checks a general condition, your test will fail in more than one distinct case, which is not optimal.

If I'm testing that duplicates cannot be added, then that is exactly what I would test. So in this case, I would go with the first option.

(Update)

OK, you have updated the code, so I need to update my answer.

Which one would I choose? It depends on the implementation of CollectionAssert.AreEquivalent(new HashSet<int>(values), set);. For example, IEnumerable<T> preserves order while HashSet<T> does not, which could break the test even when it shouldn't. For me, the first option still comes out on top.
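If ordering is the worry, one way to sidestep it (a sketch continuing the question's example, assuming System.Linq is available) is to sort both sides and compare them as ordered sequences:

    //Order-independent comparison: sort both sequences before comparing
    var expected = values.Distinct().OrderBy(i => i).ToList();
    var actual = set.OrderBy(i => i).ToList();
    CollectionAssert.AreEqual(expected, actual);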

+1

If you have an alternative implementation, then definitely use it.

In some situations you can avoid writing a full alternative implementation and still check the functionality in general. For example, in your case you could first generate a set of unique values and then randomly duplicate some elements before feeding them into your implementation. You can then verify that the result is equivalent to your starting set of unique values, without having to re-implement the de-duplication logic yourself.
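A sketch of that approach (assuming NUnit; the fixed random seed, which keeps the test reproducible, and the test data are illustrative):

    [Test]
    public void AddingRandomDuplicates_SetEqualsOriginalUniqueValues()
    {
        var rng = new Random(42); //fixed seed keeps the test reproducible
        var unique = Enumerable.Range(1, 100).ToList();

        //randomly duplicate some elements before inserting
        var input = new List<int>(unique);
        for (int n = 0; n < 50; n++)
            input.Add(unique[rng.Next(unique.Count)]);

        var set = new IntSet();
        foreach (int i in input)
            set.Add(i);

        //the result must be equivalent to the original unique values
        CollectionAssert.AreEquivalent(unique, set.ToList());
    }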

I try to use this approach whenever possible.

Update: actually, I'm in favor of option #2. With that approach there is only one output vector that lets the test pass, whereas with option #1 an infinite number of output vectors would be acceptable (it tests an invariant, but does not test any relation to the input data).

+2

I usually prefer to check the use cases one by one - this works well with TDD: "code a little, test a little". Of course, after a while my test cases start to contain duplicated code, so I refactor. The actual method of checking the results doesn't matter to me as long as it works reliably and doesn't get in the way of the testing. So if there is a "reference implementation" to check against, all the better.

It is important, however, that the tests are reproducible and that it is clear what each test method actually tests. For me, inserting random values into a collection is neither - of course, when there is a huge space of data and usage cases, any tool or approach that helps me handle the situation better is welcome, as long as it doesn't lull me into a false sense of security.

+2

According to xUnit Test Patterns , it is usually more beneficial to test the state of the system under test. If you want to test its behavior and the way the algorithm works, you can use Mock Object Testing.

By the way, both of your tests are of the kind known as data-driven tests. It is usually acceptable to use as much knowledge as the API provides. Remember that these tests also serve as documentation for your software, so it is important to keep them as simple as possible - whatever that means for your particular case.
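For instance, a data-driven version of the duplicate test could look like this (a sketch assuming NUnit's TestCase attribute; the test data is illustrative):

    [TestCase(new[] { 1, 2, 3, 3, 3, 4, 5 })]
    [TestCase(new[] { 7, 7, 7 })]
    [TestCase(new[] { 1, 2, 3 })] //no duplicates at all
    public void Add_AnyInput_YieldsDistinctValues(int[] values)
    {
        var set = new IntSet();
        foreach (int i in values)
            set.Add(i);

        //each value may appear at most once in the set
        CollectionAssert.AllItemsAreUnique(set.ToList());
    }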

+1

The first step is to demonstrate the correctness of the Add method to yourself, for example with an activity diagram or flowchart. The next step is to formally verify the Add method, if you have the time. Then test with specific data sets where you expect both duplication and no duplication (i.e., some data sets contain duplicates and some don't) and see whether the data structure behaves correctly. It is important to have cases that should succeed (no duplicates) and to check that the values were added to the set correctly, rather than only checking failure cases (cases where duplicates should be found) - see the sketch below. Finally, test in general. Although the practice is somewhat dated now, I would suggest constructing data sets that exercise every execution path in the method under test. Any time you make a code change, run everything again as a regression test.
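As a sketch of that "both directions" idea (hypothetical NUnit-style tests; names and data are illustrative):

    [Test]
    public void Add_DistinctValues_AllArePresent()
    {
        //success case: values without duplicates all end up in the set
        var set = new IntSet();
        foreach (int i in new[] { 1, 2, 3 })
            set.Add(i);

        CollectionAssert.AreEquivalent(new[] { 1, 2, 3 }, set.ToList());
    }

    [Test]
    public void Add_DuplicateValue_IsStoredOnlyOnce()
    {
        //failure case: the duplicate-handling path is exercised
        var set = new IntSet();
        set.Add(3);
        set.Add(3);

        Assert.AreEqual(1, set.Count(i => i == 3));
    }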

0

I would go with the algorithmic approach, but preferably without relying on an alternative implementation such as HashSet. With the HashSet equivalence you are actually testing more than just "no duplicates". For example, the test would also fail if some element never makes it into the result set at all, and you probably have other tests that check for that.

A cleaner “no duplicate” expectation check might look something like this:

 Assert.AreEqual(values.Distinct().Count(), set.Count()); 
0

Source: https://habr.com/ru/post/651165/

