It is well known that Bayesian classifiers are an effective way to filter spam. They can be quite concise (ours is only a few hundred lines of code), but essentially all of the core code has to be written before you get any results at all.
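For context on how little code such a classifier needs, here is a minimal word-level naive Bayes sketch in Python. All names (`NaiveBayesFilter`, `train`, `is_spam`) are hypothetical, and the smoothing choices are illustrative assumptions, not the implementation the question refers to:

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Tiny word-level naive Bayes spam filter (illustrative sketch)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count the document and its words under the given label.
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def is_spam(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Log prior with add-one smoothing over the two classes.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Laplace-smoothed log likelihood of each word.
                score += math.log(
                    (self.word_counts[label][word] + 1) / (total_words + len(vocab))
                )
            scores[label] = score
        return scores["spam"] > scores["ham"]

# Usage: a couple of training examples is enough to see it work.
f = NaiveBayesFilter()
f.train("cheap generic viagra", "spam")
f.train("online viagra pharmacy", "spam")
f.train("meeting agenda attached", "ham")
f.train("lunch tomorrow at noon", "ham")
print(f.is_spam("cheap viagra"))      # True
print(f.is_spam("meeting tomorrow"))  # False
```

Note that nothing in this sketch emerges naturally one test at a time: the counting, smoothing, and scoring all have to exist together before the first meaningful classification.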
However, the TDD approach says you should write only the minimum amount of code needed to pass the current test. So suppose we start with this method signature:
bool IsSpam(string text)
And the following line of text, which is clearly spam:
"Cheap generic viagra"
The minimal code I could write to pass that test is:
bool IsSpam(string text) { return text == "Cheap generic viagra"; }
Now suppose I add another test message:
"Online viagra pharmacy"
I could change the code to:
bool IsSpam(string text) { return text.Contains("viagra"); }
And so on, until at some point the code degenerates into an ad hoc pile of string checks, regular expressions, and so forth, because we evolved it one test at a time instead of thinking it through and writing it differently from the very beginning.
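To make the degeneration concrete, here is the same substring rule transliterated into Python (a sketch; the function name mirrors the C# signature above):

```python
def is_spam(text: str) -> bool:
    # The rule the tests have driven us to so far: a bare substring check.
    return "viagra" in text

# Both test messages pass...
assert is_spam("Cheap generic viagra")
assert is_spam("Online viagra pharmacy")
# ...but the rule already misfires on legitimate mail (a false positive),
# so each new counterexample forces yet another ad hoc patch.
assert is_spam("Ask your doctor whether viagra is right for you")
```

Each new counterexample can be "fixed" by one more special case, yet no sequence of such patches converges on a Bayesian classifier.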
So how is TDD supposed to work in situations like this, where growing the code from the simplest implementation that passes each test is the wrong approach? (In particular, when it is known in advance that a good implementation cannot be reached by trivial increments.)