Monday, December 28, 2009

Problems with TDD

This is the place to comment on my essay "Problems with TDD." In that essay I pointed out that there's still very little evidence that TDD is an effective development technique. I explained why TDD was not a good technique for me, and made the stronger claim that TDD is a weak development technique because it does not include essential unit tests. To show that it wasn't because I don't understand or practice TDD well enough, I pointed out worked-out examples done with TDD by two of the leading TDD experts. They contained implementation weaknesses which were neither identified by TDD nor even mentioned by the implementors.

55 comments:

Alan said...

I simply think that TDD, like many other development techniques, should be understood and tried out, and then applied only where it is useful.

I usually start with a unit test indeed, and let it fail; this is often useful to understand what should be tested and what should be mocked out, then I might go on for a while with a sketch of the implementation (in order to understand which API might be the best), then get back to the test, and so on. Once the implementation is in place and the "development test" (as I usually call it) is ok, I write some more tests to verify the code works as I expect in all situations, especially corner cases.

Of course this depends again on how much effort was put into analysis, whether I'm coding from scratch or refactoring, et cetera.

As you point out, it's important to achieve high code coverage; there's no "absolute need" to follow the fail-code-verify procedure.

Tim Golden said...

We've been using de facto TDD in our Code Dojos here in London Python meetups. And, as I've blogged elsewhere [*], I've found the rigour of test-first development to be constraining -- at least in the slightly artificial environment of a Dojo.

My issue with it seems similar to one of your later points: that it cramps my style; that the test seems to be driving the code. I like to rough out a sketch of code, possibly in the interpreter, possibly in an editor, to get a feel for how I'm going to achieve something. Having to write a test before doing that seems so artificial as to be counter-productive.

Let's hope other people's mileage varies :)

TJG

[*] http://ramblings.timgolden.me.uk/2009/10/16/2nd-london-python-dojo-tdd/

Virgil Dupras said...

worst-case scenario: If it's in your application specs to handle a worst-case scenario, then there will be a test for it. If you're writing an app TDD-style and want to support a 1 - 2**32 input range for your prime finder, then you'll add a test for it. I really don't see how you came to the conclusion that *because* you didn't use TDD, you could add a "2**31-1" test.

confidence in refactoring: Simply wrong. Since you write your test after the code, how can you be sure your tests are valid? It's very easy to accidentally write a test that never fails (thus, a useless test). With TDD, you've seen that test fail so you know it's a valid one.

API lockdown: You write your tests too close to your components. Tests should ideally be "high-level", that is, accessing your app only through its "public entry points". You give the example of a function you want to remove. If that function was there, it was because it gave your app a certain behavior. You probably don't want to remove that behavior, you just want to give the responsibility of that behavior to another function/class/whatever. If your tests are high-level, you don't have to change your tests because that behavior is tested through public entry points, which seldom change. You are thus empowered with unlimited refactoring potential.

Anonymous said...

You might find this rant about TDD funny

http://littletutorials.com/2008/09/29/thought-driven-development/

blockcipher said...

I think your reasoning actually points to a different problem that can occur with TDD: lack of a specification. If you know what the function is supposed to do and know what its best and worst cases are, then you can perform TDD, or testing in general, successfully. If you don't, then your testing will be flawed. TDD just allows you to test for these sooner rather than later, thus allowing you to ensure your code works properly earlier in the implementation process.

However, I do agree with the part about freezing the API. I've noticed at times that I've revisited an "API" that I've written after I start coding because what I thought was clean may be a pain when I actually use it. Of course it could be argued that proper design would take care of that, but who really has 100% assurance that the design is 100% correct without implementing it? You can't think of everything.

Regardless, I'm a big fan of unit testing and I think that it's important regardless of when you do it, though I firmly believe it should be done close to when each function is written. (Before/during/after)

jnicklas said...

TDD done right does lead to (near) 100% code coverage, I've seen this in practice (running code coverage on code I've written using TDD long after the fact). You're saying that refactoring can lead to gaps in coverage, however, that means that in refactoring, you changed the functionality of the code, which is *not* refactoring. Refactoring means changing the implementation *without* changing functionality.

Andrew Dalke said...

Hi all!

@Alan: That's TDD as pedagogical technique. I think my entire essay can be read as agreeing with you. I think people do learn good testing via TDD. I also think people learn good testing without TDD, and that learning TDD adds little on its own.

@blockcipher: it may be that TDD works better when there are more complete specifications. I've never worked on a project where the specifications were better than what you've seen here. I end up talking with my clients and end-users a lot to figure things out as they come up.

@Virgil: "if your application specs to handle a worst-case scenario, then there will be a test for it." The specs in both cases were incomplete. They didn't say one thing about the expected upper bound. Ambiguous specs are common, and it's one of the roles of developer to identify the problem and get it resolved, or at least document the limits. Which wasn't done here, and that's my point. Because TDD is not a testing tool, it doesn't lead to good tests.

@Virgil: How do I know my tests written (slightly after) the code are valid? Easy. If I have any doubt I insert a failure in the code and force the test to fail. I make a slight error in the test case itself and verify that it fails. If the test fails when I first write it (because the code was incomplete or didn't handle the case), then I know the test is being run.

TDD is not the only way to check that the tests are valid. And I don't know how you can say "simply wrong" when I pointed out a set of bugs in code which was developed with TDD, and pointed out one case where the tests were not good evidence that the code actually worked. I did not have good confidence that the prime factors code was correct until I did manual code inspection.

And your comment about my tests being "too close to your components" doesn't follow at all. I'm developing the public entry points, and during implementation I found that entry points I thought were needed don't actually need to exist. Why presume that I wanted to keep the behavior when I tell you I didn't?

Oh, and @Virgil - I do want tests for some of the internal interfaces in my code. In my version of the Prime Factors I had a function which generated primes. That was not part of the public interface, but I wanted to test it and double check that it produced only primes. It was not easy to test that through the prime_factors() public interface because even if the underlying function produced the sequence 2,3,4,5,6,7,..., the prime_factors code would have given the same answer, only in slower time. And writing unit tests which measure performance is not easy. If TDD excludes those sorts of internal tests, for the sake of "unlimited refactoring potential", then I say it's omitting an essential test.
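The point can be illustrated with a small sketch. This is not the code from the essay; prime_factors here takes its candidate sequence as a parameter purely for illustration, and primes() is a simple stand-in generator.

```python
import itertools


def prime_factors(n, candidates):
    """Factor n by trial division over the given candidate sequence."""
    factors = []
    for c in candidates:
        if c * c > n:
            break
        while n % c == 0:
            factors.append(c)
            n //= c
    if n > 1:
        factors.append(n)
    return factors


def every_integer():
    """A 'broken' generator: 2, 3, 4, 5, 6, 7, ... including composites."""
    return itertools.count(2)


def primes():
    """Unbounded prime generator using incremental trial division."""
    found = []
    for c in itertools.count(2):
        if all(c % p for p in found if p * p <= c):
            found.append(c)
            yield c


# Through the public interface the two generators are indistinguishable.
same = all(prime_factors(n, primes()) == prime_factors(n, every_integer())
           for n in range(1, 500))
print("same answers through prime_factors():", same)

# Only a direct test of the internal generator tells them apart.
print(list(itertools.islice(primes(), 10)))
print(list(itertools.islice(every_integer(), 10)))
```

Composite candidates never divide what is left of n once their prime factors have been removed, so the results match and only the running time differs.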

Andrew Dalke said...

@jnicklas: "Substitute algorithm" is a possible refactoring, listed in Fowler. Suppose during testing I find that Martin's implementation is too slow. A perfectly valid refactoring is to put in my 29 or so lines of code which generates prime numbers and tests only those as factors, plus stops searching after sqrt(n).

My sieve doesn't even get to the end of the major loop by the time it generates 2 and 3, yet those are the only test cases in Martin's code. Manual inspection is enough to show that there's no longer 100% code coverage.

As it is, I could have made it worse. For performance I could have precomputed the first 10 or so primes, and switched to a prime generation routine if I needed larger primes. Using only Martin's final tests, I would never have even reached the prime generation stage.

In other words, I've quite clearly done a refactoring, which changed no functionality (other than performance, which isn't usually part of unit testing), but which now no longer has 100% coverage.

I can give another case. Suppose I need to test if an input string matches any of a large set of strings. I can do "if s in (s1, s2, s3, s4, ... sN)", which is easy to understand but potentially slow because it does N tests.

I can substitute algorithm and replace the "in" test with a state machine built on switch statements: "switch s[0] { case 'H': switch s[1] { case 'i': ... } }". That generates a lot of new lines of code, and new possibilities for failure that the original wouldn't have had, like falsely matching a prefix instead of the entire string.

If you do this refactoring then you would be wise to add new test cases for those potential errors, even though again the code functionality does not change.
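As a rough sketch of that failure mode, here is the same idea in Python, with a nested dict standing in for the switch-based state machine. The word list and function names are made up for the example.

```python
END = object()  # marker stored where a complete word ends


def build_state_machine(words):
    """Build a nested-dict state machine: one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def matches_buggy(machine, s):
    """Faulty matcher: accepts any prefix of a stored word."""
    node = machine
    for ch in s:
        if ch not in node:
            return False
        node = node[ch]
    return True  # bug: never checks that a whole word ends here


def matches(machine, s):
    """Correct matcher: the input must end exactly where a word ends."""
    node = machine
    for ch in s:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


words = ("Hi", "Hello", "Hey")
machine = build_state_machine(words)

# Tests written for the original "s in words" pass for both matchers...
print(all(matches_buggy(machine, w) for w in words))
# ...and only a new prefix test exposes the difference.
print("He" in words, matches(machine, "He"), matches_buggy(machine, "He"))
```

The original tuple test has no notion of a prefix, so tests written against it would have no reason to include "He"; the new implementation creates the need for that case.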

Refactoring does not maintain code coverage, and therefore TDD as a principle does not maintain 100% coverage. That your project did have very high coverage is product of good development practices, and not directly because of TDD.

André said...

I just wanted to say thank you for having written such an informative and thoughtful post. You mentioned that it was hard to write, and that you spent a fair bit of time on it. I would say that it truly shows and was well worth it.

James said...

One way to fix the bad-case time for the Prime Factor Kata on 2**31-1 is the observation that checking candidates larger than sqrt(n) is pointless. Computing sqrt(n) before the loop and testing only candidates up to that value would mean testing stops before 2**16.

You would however need to detect that the loop was exited this way and put n on the list of prime factors.
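A minimal sketch of that change, written from the description above rather than taken from the kata. It compares candidate * candidate against the remaining n instead of computing sqrt(n) up front, which is an equivalent stopping rule that avoids floating point.

```python
def prime_factors(n):
    """Trial division that stops once candidate**2 exceeds what is left of n."""
    factors = []
    candidate = 2
    while candidate * candidate <= n:
        while n % candidate == 0:
            factors.append(candidate)
            n //= candidate
        candidate += 1
    # The loop exited early: whatever remains above 1 is itself prime.
    if n > 1:
        factors.append(n)
    return factors


print(prime_factors(12))
print(prime_factors(2**31 - 1))
```

For 2**31-1 the loop tries candidates only up to 46340, which is below 2**16, and the final check adds the number itself to the list.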

Joel P said...

I appreciate the thoughtfulness of your post, but I have to disagree with your conclusions. My experience (and I freely admit it is only that) after coming to BDD (a refinement of TDD) after many years without it is that it *dramatically* improves the quality & design of my code. The initial time lost by having to write tests is paid back many many times over during the far longer maintenance period typical of most apps.

Put simply, forcing myself to write testable code leads to cleaner designs and separation of concerns. Writing those tests first makes me think through how a client should interact with my code, as test-driven development means that my tests are the first client. When the code doesn't exist yet, making major changes to its API costs nothing. And some of my most mundane do-I-really-need-a-test-for-this? tests have repeatedly caught very subtle bugs that I'd normally discover long after they went to production.

If I HAD to choose between good practices or BDD, then I'd pick good practices, but that, too is a false choice. TDD/BDD is one of the single best ways a solo developer can enforce good practices, and I've used it to produce and maintain what is by far the highest quality software of my career with a fraction of the stress I'm used to. Need me to make a change today that goes live tomorrow? No problem. 320 green (passing) tests tell me my change probably didn't break anything.

True, it's not magic. Bad tests can give you a false sense of security, and 320 green tests don't mean I didn't forget 800 others, all of which I would currently fail. But since most IT shops operate on the "I went through the front-end and it seems to work" QA methodology, it's a million times better than the usual alternative: a cursory look, then wait for customers to complain.

Andrew Dalke said...

Hi James! That's the algorithm I implemented, when I wrote "I implemented the Sieve of Eratosthenes to generate prime factors, and only searched for factors up to sqrt(n)." If Martin had done that, his code would have completed in O(sqrt(n)) time instead of O(n) time for the worst case. Since mine only tests primes, its run-time is O(π(sqrt(n))) = O(sqrt(n)/log(n)), where π is the prime-counting function. And assuming my math is right. ;)
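To make those three growth rates concrete, here is a small sketch that counts how many trial divisors each strategy would try for the worst case 2**31-1. The sieve is a generic one written for this example, not the one from the essay, and the counts ignore the cost of generating the primes.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Clear every multiple of i starting at i*i.
            is_prime[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]


worst_case = 2**31 - 1
root = math.isqrt(worst_case)

trial_counts = {
    "every candidate up to n": worst_case - 1,
    "every candidate up to sqrt(n)": root - 1,
    "only primes up to sqrt(n)": len(primes_up_to(root)),
}
for label, count in trial_counts.items():
    print(f"{label}: {count}")
```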

Geoff said...

Thanks for a well-written post Andrew!

My main observation on TDD is that there are two different things people mean by it. On the one hand you have "big tent TDD" which I would describe as "the practice of creating automated tests as part of the development process" which I would very much subscribe to and hope that you would too.

What you are criticising is of course something much narrower than that ("Little tent TDD") and I agree with most of what you write.

The idea of BDD (Behaviour-driven Development) goes some way towards "fixing" the problem of it being very easy to create bad tests that prove little and create a maintenance burden.

But fundamentally, there is no substitute for thinking hard and refactoring well, and I don't think being religious about creating tests in a particular order or at a particular scope is productive. The right way to create those automated tests as part of your development process varies hugely depending on the context.

Anonymous said...

Thank you for writing this informative post.

Martijn Faassen said...

I think whether TDD works well depends on the problem. As an extreme example, I find it much less useful for testing user interfaces. In that case I fully agree it tends to freeze things too easily, but what it freezes is the user interface instead of the API.

For APIs I actually don't have that problem at all as I don't feel the APIs are frozen after I write some tests. I refactor tests as much as I refactor code, and that can be done pretty rapidly so I don't feel it slows me down much.

I feel that writing a test (especially a narrative doctest) actually helps me to devise a proper API, as I'm actually using the API already in the tests (and documentation). This helps me figure out what the API should be.

But I don't get the API right immediately at all. I go back and modify my tests and code all the time.

I don't much care whether I test before I implement a feature or afterwards; I do some of each. But I do mix testing with coding, in the sense that writing tests doesn't come at the beginning or the end, but throughout the process. The tests and the code fluctuate all the time.

I am wondering whether you have much experience pairing with someone who practices TDD and feels happy about it?

I suspect concluding TDD works or not is too simple: it probably works for some people and not for others.

hcarvalhoalves said...

Missing the point.

TDD is geared towards developing based on specification. You write tests first because you already reasoned how your component should work on a high-level (client interface, API, etc.). You never write tests *later*, because then you're just going to test for the behaviour you actually implemented, ignoring specification on how it *should* work.

If you don't have specifications and are just tinkering with code, of course TDD is going to be useless.

Ryan said...

TDD (more so BDD) is all about defining the requirements before the implementation. If the prime finder needs to be able to calculate 2**32 then that is a requirement and should be tested. I don't see how writing the test before or after the implementation would change this.

You mention TDD does not leave room for adding tests which may pass, but I disagree here. If there is even the slightest doubt that a test will not pass, then it should be added. Of course if the test passes it should be verified itself, but that applies to after-testing as well.

It seems like many of your points are due to incomplete requirements in the examples rather than what applies specifically to TDD/BDD.

I do agree with your point about freezing the API too quickly. I've been doing TDD for a few years now and still struggle with testing certain interfaces. Mocking/stubbing can help but that has its own set of problems.

This problem usually goes away once I've established a testing pattern. That's when TDD really flows and I get into a rhythm which is extremely productive and results in well tested, quality code.

unclebobmartin said...

Every discipline is a two edged sword. It has to be defined in order to be meaningful; but definition breeds dogma. TDD is an excellent discipline that all developers ought to learn and practice. But, as with all disciplines, you learn to apply it sensibly. (Which is not necessarily the same thing as applying it with moderation.)

Andrew Dalke said...

Ohh, several comments. Thanks for your feedback everyone!

@Joel P: you commented about how BDD dramatically improved the quality and design of your code. It would be nice if there was some sort of study to get more empirical results for that. My limited experience when we've worked on code katas as the local Python User's Group is that the groups using TDD and those not doing TDD ended up with the same quality of code in about the same time. It's much more dependent on experience than choice of testing approach.

In any case, I ask what you are comparing it to. Test later development? Or something like what I use which is neither TDD nor test later, but is much closer to TDD of those two.

As for API costs, mentioned by you and others (Hi Martijn!), that's something which is a personal complaint I have with TDD. I don't say it's a flaw or limitation in TDD as a general statement.

I use example programs which use the API as my guiding principles, and those help me get "cleaner designs and separation of concerns". These are closer to functional tests than unit tests, and I might have no idea what the output of the test programs is until I've worked on the code enough to get something that I can then use to search for good test cases.

I've also found very subtle bugs through code inspection, load testing, integration testing, performance testing, coverage analysis, and more. While TDD can be one more item in that toolbox, I don't see it as being a powerful item, for reasons I discussed in my essay.

@Geoff: isn't what you call "Big Tent TDD" the same as "unittests"? Where is the test first aspect of that?

What I've read of BDD is covered in big business organizational speak which I have a hard time caring enough to understand.

For example: "Every behaviour of a system should be there because it adds concretely to the business value of the system as a whole."

How do I use the concept of "Business Value" to my consulting work? I usually write software for computational chemists at pharmaceutical companies early in the R&D process, specifically in one case doing toxicology analysis of possible drug leads. This is about 10 years before a candidate might be a drug, and perhaps there will be no successful candidate. There is no direct coupling between what I do and the success of my client.

To me it ends up as "do what the client wants done, and do my best to see that it's aligned with the client's goals."

(Yes, I know that my clients are often big business organizations, but the people I work with are scientists.)

Andrew Dalke said...

@Martijn: I'm going to be strict here and say that TDD is as Kent Beck stated: "a technique for structuring all the activities of development." You can say it's something else, but then it isn't TDD.

Let's call that less strict version "big tent TDD" in honor of @Geoff, but with the restriction that the tests are still done first, before modifying the code.

That's still not what you are doing, since you test "sometimes before and sometimes after." Like I do.

I'm going to call that "good development and testing practices".

Do I have much experience with pairing with someone who does TDD? We've been doing it at our local Python user group during our code katas. That's perhaps 12 times there for 90 minutes each time, plus a few times at one of my client's over a couple of days. That's in addition to a project or two where I tried to do it on my own.

A True Scotsman .. I mean, TDD advocate will say I haven't spent enough time on it to judge if it's useful. I'll point out that 40 hours of meaningful practice, plus reading, plus the research I did for this writeup, is enough to distinguish between a strong and a weak technique.

As for being useful for some and not for others, one of the things I had in a previous draft of my essay was a comparison to strong/static typing vs. strong/dynamic typing. Or for that matter, functional vs. imperative programming styles.

It may well be that you are right, but I suspect the major advantage comes not from the technique but the time and effort spent on becoming a good developer.

Andrew Dalke said...

@hcarvalhoalves: I did not miss the point. I reason by writing code. Is the distinction that if I work it out on paper then I don't need to write tests, but if I use the computer to help me out then I do need to write tests?

And if I've reasoned it out enough on paper, do I throw away all my notes and start again? Certainly not. Then why should I throw away my notes that I used when I reasoned things out on the computer?

As for "TDD is geared towards developing based on specification", that's incorrect. Or if it's correct then it's useless in the face of real-world incomplete and ambiguous specifications. (Which is why people have been pointing me towards BDD, which seems to include that resolution as part of the process.)

I gave examples of two problems where the developers wrote the spec, which were very mathematically oriented and not open to much interpretation, unlike many business problems.

The spec did not say "must implement fib(0), must implement fib(1), must implement fib(2)" and so on. The spec said "must implement fib(n)", and without guidance of how large n might be.

If 0<=n<=6, then [0,1,1,2,3,5,8][n] is the best, fastest, and easiest solution to the problem. There was absolutely no reason then to implement the general algorithm for that case.

Of course the spec doesn't mean that - the code should handle larger values. But the solutions given would not work for numbers like fib(50) because of exponential growth in the compute time and because of (in the Java case) overflow of the 32 bit integer.

Even in Python, if the recursion problem was solved with memoization, there would still be a "maximum recursion depth exceeded" error for fib(1500).
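These limits can be sketched in a few lines. The functions below are illustrative stand-ins, not the solutions discussed in the essay; the recursion failure shown is a RecursionError in current Python 3, and the exact depth at which it occurs depends on the interpreter's recursion limit.

```python
import sys

SMALL_FIB = [0, 1, 1, 2, 3, 5, 8]


def fib_lookup(n):
    """Table lookup: valid only for 0 <= n <= 6."""
    return SMALL_FIB[n]


def fib_memo(n, _cache={0: 0, 1: 1}):
    """Memoized recursion: linear time, but recursion depth grows with n."""
    if n not in _cache:
        _cache[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return _cache[n]


def fib_iter(n):
    """Iterative version: no recursion depth limit."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def hits_recursion_limit(n):
    """Return True if a single call to fib_memo(n) exceeds the recursion limit."""
    try:
        fib_memo(n)
    except RecursionError:
        return True
    return False


# Smallest n whose Fibonacci number no longer fits in a signed 32-bit integer.
INT32_MAX = 2**31 - 1
first_overflow = next(n for n in range(100) if fib_iter(n) > INT32_MAX)

print("fib(50) =", fib_iter(50))
print("first n that overflows a signed 32-bit int:", first_overflow)
print("recursion limit:", sys.getrecursionlimit())
```

None of these limits appear in a spec that says only "implement fib(n)"; each one surfaces only when someone asks how large n may be.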

Were these limitations defined by the spec? No. It's the job of the developer to identify the ambiguities and point them out.

That TDD does not include this essential role in the development process is one of the reasons why I say that other techniques must come into play, in order to develop the code. And I again say that these other techniques are well enough capable of giving all the advantages of TDD only without the extra requirement of developing the tests first.

@Ryan: since you pointed out the same thing, I'll just point out that none of the specs said that even fib(3) or prime_factors(17) were needed, only to implement fib(n) and prime_factors(n).

If leading TDD advocates cannot come up with a good specification, in examples which are supposed to be exemplar cases of using TDD and which are well studied problems, then what hope have we?

You mentioned "even the slightest doubt" but I think that's too stringent a statement. There's also "reasonable doubt", as well as a cost/benefit analysis approach.

The doubt I mean is doubt in using TDD, in that it may lead to overfitting the data as I described in the essay. To allay that doubt, I add new tests which should pass.

This is part of a methodology used in scientific research, and is not the same as TDD. Doubt, I think, being the essential part of doing good science.

I don't think "Doubt-driven development" would make for a good consulting business though. Perhaps I should try? :)

Andrew Dalke said...

@unclebobmartin: Pardon? Where does dogma fit into this?

I don't even see how definitions breed dogma. Following principles without thought or doubt, perhaps, but definitions on their own? A right angle is defined as a 90 degree angle. That's a definition - where's the dogma?

I tried extremely hard to be as empirical as I could. I know this is a contentious topic, so I found concrete examples to back my statements about what I see as general limitations in TDD, and specifically marked those which I consider to be my personal views.

I know that you find TDD to be an excellent discipline. I disagree, but that by itself is a matter of opinion. I used your worked out Prime Factors Kata to show what I regard as evidence of the limitations in TDD, pointing out problems in the solution which TDD did not address but which a more complete development process should have identified.

Part was in not solidifying the specification, which as written does not specify an upper bound or acceptable performance. The solution you wrote implies that it's applicable for values up to maxint, and assumes arbitrary time is available. Try running it with 2**31-1 and see how long it takes to find that that number is prime.

The assumption of arbitrary time is unrealistic for almost every real-world case. That TDD didn't help in finding the specification problems and ended up with a solution that's unrealistic and with no indications that it's unrealistic, indicates limitations with TDD.

A development process which includes worst-case inputs would have at that very moment highlighted that the algorithm being developed was incorrect for the problem at hand. It wouldn't have driven the code for the solution, but it would have highlighted that the algorithm was being driven to a dead end and that some other approach was needed.
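One way such a worst-case check might look is sketched below. The step budget is an assumption made for this example, standing in for a wall-clock limit so the check stays fast and repeatable, and the function is written in the spirit of the kata's plain trial division rather than copied from it.

```python
class BudgetExceeded(Exception):
    """Raised when the factoring loop uses more steps than allowed."""


def naive_prime_factors(n, max_steps=None):
    """Plain trial division over 2, 3, 4, ... with an optional step budget."""
    factors = []
    candidate = 2
    steps = 0
    while n > 1:
        while n % candidate == 0:
            factors.append(candidate)
            n //= candidate
        candidate += 1
        steps += 1
        if max_steps is not None and steps > max_steps:
            raise BudgetExceeded(f"gave up after {max_steps} candidates")
    return factors


def within_budget(n, max_steps):
    """Worst-case check: does factoring n finish inside the step budget?"""
    try:
        naive_prime_factors(n, max_steps=max_steps)
    except BudgetExceeded:
        return False
    return True


print(within_budget(2**10, 100_000))       # small factors: finishes at once
print(within_budget(2**31 - 1, 100_000))   # large prime: budget runs out
```

A check like this does not say which algorithm to use; it only reports that the current one cannot handle an input it accepts as valid.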

Your solution takes unacceptably long for an input it accepts as valid. It should not be considered an acceptable solution. Some remedies would have been a bound check in the code, a documented statement about its limitations, or a refactoring to a different algorithm which removes those limitations. You mentioned the possibility of the latter in the video I linked to, but said that the more powerful solution was not as "elegant" as your solution.

That I found similar problems in Kent Beck's Fibonacci solution (also in Bernhardt's solution), strongly suggests that TDD does not address these issues.

Anyone who wants to apply TDD sensibly must learn about the limitations in TDD, and learn other ways to solve those limitations.

I happen to believe those other ways are equally effective at producing the solutions that TDD gives, and able to resolve those problems that TDD has. In my essay I highlighted a few of those techniques, which of course are nothing new.

If the problems I identified in the Prime Factors Kata are not actually limitations of TDD but in fact are addressed as part of the TDD process, then would you kindly direct me to something which elaborates upon that, and perhaps also update the Prime Factors Kata to include mention of those issues?

Kevin Baribeau said...

Hey Andrew,

Nice post. Most criticisms of TDD I've read just seem to me like arrogant twerps complaining that they don't want to learn something new. It's nice to see something well researched and thought out.

Thanks for sharing.

For the record though, I think TDD deserves credit as a practice that prevents you from breaking some (most?) of the things that were working in earlier stages of development. I think it's important enough that everyone should learn it, and if they decide they don't like it after practicing it, fine.

I agree with you that TDD is not enough in this sense, but it seems to me like it's a step in the right direction.

In my experience, it also helps getting developers to think about tests. Too many developers today are not used to thinking about testing their code.

Anyway, again, great post.

Anonymous said...

Thanks for your post. It brightened up my morning. Absolutely hilarious :-)

Carlos Ble said...

You just haven't practiced TDD enough. How long have you been test-driving your code?
You miss the point as others have said.

Andrew Dalke said...

Comment sweep - Hi to the recent arrivals!

@Kevin Baribeau: Thanks for your support. I will note though that TDD is based on unit tests, and unit tests were also a "practice that prevents you from breaking some (most?) of the things that were working in earlier stages of development." There were also regression tests, which "provide a general assurance that no other errors were introduced in the process of fixing the original problem."

The first test framework I wrote was based on Python 1.x's regression test framework. Those tests were programs which generated output, and the test runner compared the output to the saved expected output. TextTest is a modern and much more capable testing framework in that category.
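The general shape of an output-comparison test can be sketched as follows. This is a generic illustration using the standard library, not the Python 1.x regression framework or TextTest, and demo_program and the expected text are made up for the example.

```python
import contextlib
import difflib
import io


def capture_output(program):
    """Run a no-argument callable and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        program()
    return buffer.getvalue()


def compare_to_expected(program, expected):
    """Return a unified diff between saved and actual output (empty if equal)."""
    actual = capture_output(program)
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        "expected", "actual", lineterm=""))


def demo_program():
    """Stand-in for a test program whose output was saved earlier."""
    for n in (1, 2, 3):
        print(n * n)


saved_output = "1\n4\n9\n"
diff = compare_to_expected(demo_program, saved_output)
print("PASS" if not diff else "\n".join(diff))
```

In a real runner the expected text would be read from a file saved alongside the test program, and a non-empty diff would be reported as a failure.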

I think, therefore, that it's an overestimate to credit TDD for that. In addition, there are a number of tests, like performance, load testing, memory leaks, which cannot easily be done quickly and which are therefore outside the sort of tests that TDD generates. These have been known for some time, and are part of the process of making sure that new code doesn't change important characteristics of old code.

@Anonymous: "Absolutely hilarious"? Cool. Though surprising. I wonder which were the funny bits. Should I be signing up for the next comedy improv night? "Two guys walk into a bar. Third one ... ducks!" Err, I should work on that. ;)

@Carlos Ble: I did answer that question in the comments, above. I don't see how it's relevant though. Either I'm correct or I'm incorrect. Switching the topic to me as a person seems like one of those fallacies of logic.

Since I've missed the point, would you tell me where I'm wrong? I showed the basis for my reasoning precisely so people could correct any misunderstandings. Others have said I've missed the point, but no one has yet told me where I made the wrong turn.

unclebobmartin said...

My point was that you have set up TDD as a straw man by assuming that TDD is a stand-alone dogma. Any discipline that you single out like that is going to suffer the same fate. Software development is a collaboration of many different disciplines applied with judgement.

You took pedagogical examples of that single discipline and analyzed them as though they were not pedagogical examples. Any such example will suffer when treated that way because examples are necessarily constrained. They must leave some things out in order to be effective.

For example, one of your complaints about the prime factors kata was that it was hundreds of times slower than the sieve approach that you used. So what? The example was an example, not a solution. (BTW, there are other prime factors examples that use the square root check to increase performance. You just happened to pick one that didn't.)

TDD is just one discipline of many. It's a good discipline; but cannot and should not be used alone.

If you want a complete example, you can look at FitNesse. That's a whole project built and deployed with TDD as one of the central disciplines used throughout its life.

Joel P said...

Andrew, I agree empirical results would be great, but USEFUL metrics on developer productivity are extremely hard to come by. My personal experience is that for an extremely short & simple project, TDD/BDD may not be worth the overhead. For a very small project, it's probably break even, but as the project size grows, I find that the payback of comprehensive automated unit test coverage rises exponentially.

It sounds like you're used to working for pharmaceutical clients who naturally have a very long timeframe and a liability-driven requirement for extremely high quality. That's great, but I think a more typical development experience is little to no QA budget, everybody wants everything yesterday, and the requirements are often a bit vague because the client isn't even 100% sure what s/he wants. The tools you refer to (code inspection, load testing, etc.) are all good practices, but BDD has been perhaps the single most valuable technique I have found to a) disambiguate requirements, b) document my interpretation of those requirements, c) ensure I meet them, d) guide clean design, and e) protect the code (and me) from regression mistakes throughout the development & maintenance lifecycle.

I can't prove that this is inexorably true for all developers in all projects, but I think all developers should at least give it a shot on one project rather than take what either one of us says on faith.

sampenrose said...

Thank you for this outstanding post. It belongs in a book. Hypothesis I've been mulling: TDD serves a similar social-professional function for our community that the Central Dogma did for biology in the 60's or the Efficient Market Hypothesis did for economics in the 70s: it organizes and focuses communities of like-minded professionals. Because this function is so valuable, it wins fierce advocates among prominent professionals despite having fundamental flaws apparent to less qualified and accomplished people in the field.

If you're interested in this idea I can try to flesh it out with examples, though I suspect you know the history of biology much better than I.

J. B. Rainsberger said...

I don't want to be rude, because you have obviously put a lot of effort into your essay. That said, I need to point out a few things.

1. Many of your contentions seem to require, as part of the hypothesis, a kind of TDD done mindlessly. No useful process applied mindlessly leads to good results.

2. You argue that TDD freezes the API too soon. I argue the opposite: when I practise TDD, I find I can more aggressively change code, including the API, in response to learning about the problem and the solution as I design. This is the opposite side of the coin of point 1: when I practise TDD mindfully it amplifies my ability to converge on a sensible design that responds well to change.

3. Your final conclusion, that TDD equates to sensible unit testing plus writing the test first, ignores the key difference between TDD and test-first programming: evolutionary design. Even so, test-first programming reminds the practitioner to write unit tests, and given how much software we write without such tests, I find anything that helps remind us to write them at all quite valuable.

Now for the rude part. I apologize for this, but I have to say it: I have made these arguments for close to a decade now, and others before me have made them for longer. I find nothing new in your objections. If you don't want to practise TDD, then don't practise it, but also don't bother raising the same old objections that others have raised since the XP community repopularized test-first programming in the late 1990s.

Steve Howell said...

Good testing practices help make good code. Automated unit tests, written by the developer and run often during the development stage, are a good testing practice.

TDD practitioners use those sorts of tests, with a special emphasis on failing test cases that reflect missing code. These test-first tests augment other important tests in the development process.

TDD can easily be mixed with a test-good or test-last approach, and the result is even better "unit testing."

Steve Howell said...

The core value, to me, of TDD is so simple that I do not understand why it gets overblown by both its proponents and its detractors.

TDD basically comes down to this: if you are thinking about how to implement something, then you are also probably thinking about particular test cases that will validate your approach. To the extent that you are going to write a test anyway, there is a small tactical advantage to writing the test before you write the implementation.

It doesn't always make sense to automate it, as you might have a non-automated way to verify your assumptions, particularly if you have an algorithm that lends itself to equally valid testing strategies, such as visual inspection of a graph or abundant human testers.

In some cases you do not want to automate the test due to concerns that the interface will change soon anyway. Of course, nothing prevents you from automating the tests and deleting them later, but it is still a tradeoff.

In some cases you do not want to automate the test in advance, because you already have an implementation in mind, and you wanna get it from brain to keyboard while the iron is hot.

TDD does not guarantee 100% test coverage, nor does it preclude it. Sometimes 100% test coverage should not even be a goal, but if it is, then Andrew is certainly correct that you should think about test coverage holistically, with TDD-written tests as just one possible piece of the puzzle.

TDD does not guarantee that you will produce optimal code, because some elements of good design do not lend themselves directly to test-driven thinking, but instead emphasize other approaches to thinking, like reviewing similar algorithms, or breaking down the problem, or writing pseudocode, or sketching out a diagram on a whiteboard. Often other approaches fit in nicely with TDD, such as breaking down the problem, but TDD is not a prerequisite either.

Titus Brown said...

Andrew, very interesting. I thought this post on unit testing in Coders at Work:

http://www.gigamonkeys.com/blog/2009/10/05/coders-unit-testing.html

was a great discussion of how wide a variation there was in the use of unit testing as a fundamental tool; might be worth reading.

Martijn Faassen said...

@Andrew:

I do write tests very frequently *before* I implement a feature, so I still feel what I do is pretty close to official TDD. But I'm not going to be that upset if I implement a test just after I implement a feature - the difference to me is not very big, as I would have tests in mind as I implement the feature. Then I go and modify the tests and the feature again.

It's all very incremental; I don't write tests for an hour before I write a feature, I write tests for minutes before I write the feature for a few minutes.

I'd say for my library work I am close enough to TDD to be doing proper TDD. For my application work, I'd say I more frequently add tests afterwards or hand-test a user interface. I try to split off libraries from applications as much as possible so I isolate the problem, can do better tests, and gain potential reuse.

@hcarvalhoalves:

I think it's quite possible to do TDD without a clear specification. There is no API specification in most of my work. Often the specification of what should happen overall is quite vague and is basically just some ideas in my head. I frequently discover new ways to approach something doing development as well. And I do TDD.

Andrew Dalke said...

Hi all! Going backwards this time for a change of pace.

@Martijn: My view is that TDD adds little to what can be achieved by other development processes. By this I mean the "little tent TDD" description by @Geoff, where all tests are done first. If some tests are done first then this is not TDD, but it is what Geoff termed "big tent TDD."

If not all of the tests are done first then I can well understand that in the mix of development approaches, some of the techniques I would use (coverage testing) would be less important. My essay was to back my statement that writing all tests first is a weak development methodology and one should work in the big tent instead of the small one.

@Titus Brown: Thanks for that link. I read it when it first came out, but I had forgotten about it. I read it again now and enjoyed it again.

@Steve Howell: "you are also probably thinking about particular test cases that will validate your approach." Validation tests seem to be a testing effort outside of the scope of TDD, which is a development approach oriented towards adding new features.

I mentioned a couple of places which show the TDD tests were not meant for validation. The prime factors code doesn't check the right cases to remove reasonable doubt in the implementation, and the Fibonacci code does not function over what I would expect as reasonable input ranges.

@Steve: "TDD does not guarantee 100% test coverage, nor does it preclude it."

Agreed. I brought it up not because it's a problem or limitation in TDD, but because claims like Beck's "TDD followed religiously should result in 100% statement coverage" are not correct.

His is not an uncommon statement. I easily found these: "100% code coverage is typically a result of using the Test Driven Development methodology." "If you are following TDD, than I assume you must and will be having 100% coverage." "For a theoretical perspective, this should give your code 100% code coverage. ... If you don’t have 100% test coverage, you aren’t correctly doing TDD." "Well, following TDD to the letter (very hard to do at first) you will always have 100% coverage because not a single line of production code should be written that doesn't respond a question asked by the test code."

Because I think this is a common misconception of TDD, I brought it up and pointed out that refactoring does not maintain 100% test coverage.

@Steve: "TDD does not guarantee that you will produce optimal code... " Well, I don't think any one development process makes that strong assertion, and if they do I've got some NP-hard problems I'd like them to work on. ;) My thesis is that those other approaches to thinking are in general more powerful than TDD as a way to implement code. While TDD may be a useful technique for some, it's not one of the first approaches I would reach for, and its weaknesses should be described.

@Steve: "These test-first tests augment other important tests in the development process." I can understand the intent of that sentence. Are there specific tests which wouldn't have been added by the other processes? I don't think so, but that's not the only benefit TDD could have. It could enrich the test ordering so the most important tests are first, except feature prioritization is handled elsewhere in the process. It could improve developer confidence, except that I've argued that the TDD tests aren't designed to validate the software and the developer shouldn't at that point have confidence that the code works, other than the limited test of "it passes the unit tests."

That's why I say that TDD adds little to the development process.

Andrew Dalke said...

@J. B. Rainsberger: I never said that TDD was done mindlessly. I said that other development practices must be done to handle the limitations in TDD, and that those other practices are powerful enough on their own that doing all developer tests first adds few benefits. If the goal includes 100% coverage, then do coverage testing. Good API development? Do interface testing. Algorithm implementation? Do worst case analysis. Strong validation? Include edge case analysis, fuzz testing, etc.

I did write that freezing the API was a personal complaint with TDD. I prefer techniques from user interface development - mockups, and prototypes and user-testing - to do multiple iterations of the API before I write testable code. I know others will have different styles in how to do this.

@J.B. Rainsberger: "the key difference between TDD and test-first programming: evolutionary design." I had to think about this before I gained some insight.

Evolutionary systems are complex, haphazard systems: gloriously marvelous and at times throw-your-hands-in-the-air incomprehensible. Viewed evolutionarily, TDD uses mutational changes by the developer (which must be relatively small for the evolutionary argument to work), and the selective pressures of the test cases which check for viability.

I complained that the resulting programs were not solved in my preferred way, which would have required more forethought and global analysis. But evolution does not allow global optimization, only local optimizations and therefore my preferences are irrelevant. Evolutionary design also doesn't guarantee 100% code coverage, as our DNA shows.;)

Using evolution as the basic design principle, I'll recast my complaints. Is the code which results from TDD only viable inside of the specific environment of the unit tests? How well do the selective pressures of the test environment reflect those of the deployment environment? With an evolutionary approach these questions must be answered before you can deploy, but they weren't even addressed in the TDD cases I mentioned. I still hold that the evolutionary pressures of TDD are for the most part covered in other techniques.

Somehow I suspect you mean "evolution" in the sense of "improvement over time" and not the biological sense of "changes over time." I deliberately chose the biological one because the result was more insightful to me. "Improvement over time development" would be circular logic, since we both want to improve our software.

@J.B. Rainsberger: "I find nothing new in your objections." I thought I added two or three new things to the topic. One was to point to a summary of the empirical research on the effectiveness of TDD done since the 1990s, but that justification is a bit trite. Another was a specific objection to the widespread view that "TDD followed religiously should result in 100% statement coverage." It's simply and provably not true, but I haven't seen that mentioned in my readings of TDD.

The third is a written evaluation of TDD's effectiveness at the code level by comparing what should be exemplar TDD-based solutions to solutions done through another means. Effectiveness here includes lines of code and fitness to the spec. This sort of analysis is not new. I can only say I've never seen it done with TDD. Could you point me to some examples?

Perhaps I only mentioned the "same old objections that others have raised." I don't know the literature enough to be definite, but in any case you're setting a very high bar. The next time I see a blog post or essay which describes the basis of TDD, and adds nothing new to the description, should I comment that it mentions the same old design method that others have described years previous?

Andrew Dalke said...

@sampenrose: Thank you for the compliment, but there's no way I'm writing a book about this.

I didn't realize until I researched it now that using the word "dogma" gave Crick a lot of problems. From Wikipedia's article on the Central Dogma:

"I just didn't know what dogma meant. And I could just as well have called it the 'Central Hypothesis,' or — you know. Which is what I meant to say. Dogma was just a catch phrase."

I just read Crick's "Central Dogma of Molecular Biology" from Nature (1970). It was a pleasant read and much clearer than my memories of learning it the first time, before I knew much about molecular biology.

The paper does "make four points about the formulation of the central dogma which have occasionally produced misunderstandings", but I don't know that it had, as you say "fierce advocates among prominent professionals despite having fundamental flaws apparent to less qualified and accomplished people in the field."

Honestly, other than that paper I don't know much about the social history of the idea at all. I think the term (by accident) captured people's attention, and casting the problem in terms of information transfer between three different classes of sequence-encoding molecule made it a simple reduction of the problem.

As for the Efficient Market Hypothesis - having just finished listening to a wonderful public lecture podcast from the University of Bath by Paul Ormerod titled "Have Economists gone mad?" on the complete failure of macroeconomic modeling, I think I'll just as well not learn more about the viewpoints of economics.

Andrew Dalke said...

@Joel P: I agree that automated unit tests, code reviews, regression tests, coverage tests, load tests and so on are important. There's a cost/benefit ratio to them of course, but I think that's obvious to anyone. What I've tried to show is that TDD as a process is not meant to produce those sorts of unit tests. It does not lead to 100% code coverage and the produced tests do not do a good enough job on their own to avoid reasonable doubt that the code works.

@Joel P: "pharmaceutical clients who naturally have a very long timeframe and a liability-driven requirement for extremely high quality."

Oh, you should hear the laughing on this side. The work I do is all early lead discovery. The liability issues you're talking about are 5-10 years downstream of what I deal with. The software I work with was primarily developed by scientists with some programming training, and with almost no idea of version control, testing, software design, or usability.

Testing is mostly done empirically - does the output look right, are there good correlations with the expected results, are the outliers understandable? These are the same techniques used to spot if other scientific equipment isn't working. Plus, the algorithms themselves are often statistical in nature, making it hard to know if the answer is truly correct in the first place.

I'm conjecturing here, but I think it works most of the time because many of the algorithms are supposed to produce mathematically continuous results. Most programming errors are of a discrete nature and will break continuous surfaces. One notable bug on software friends worked on caused a very occasional problem when an angle was 180 degrees. That bug was detectable because the simulation should have preserved energy but ended up gaining jumps in energy over simulation time.

One of my favorite bugs I tracked down was in software which stored the formal charge (which goes from -2 to +2) in a C 'char' field, since it only needs a few bits. What went unnoticed is that on IRIX the char type was unsigned, so the atoms had charges of 0, +1, +2, +254, and +255. That wasn't caught until the regression tests when I ported the code over to Linux, which uses a signed char, where the differences were only visible in about 1 out of every 100 molecules. That's going to be below the error rate in the downstream models, in a seldom used calculation, so it wasn't caught earlier.
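The effect is easy to reproduce. The following is a minimal Python sketch, not the original C code: it uses the standard struct module to store each formal charge as a signed byte and then read the same byte back as unsigned. That reinterpretation is an assumption about how the values ended up on an unsigned-char platform, and the function name is made up for illustration.

```python
import struct

def read_back_as_unsigned(charge):
    """Store a small signed value in one byte, then reinterpret that byte as unsigned.

    Hypothetical helper: mimics what happens when code written for a
    signed 'char' runs where 'char' is unsigned.
    """
    (value,) = struct.unpack("B", struct.pack("b", charge))
    return value

charges = [-2, -1, 0, 1, 2]
as_unsigned = [read_back_as_unsigned(c) for c in charges]

for charge, seen in zip(charges, as_unsigned):
    print(f"stored {charge:+d}, read back {seen}")
```

Only the negative charges change, which fits the bug showing up in a small fraction of molecules.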

In other words, I haven't worked on a group with a dedicated QA person in a long time. The environment is more "no QA budget, everybody wants everything yesterday, and the requirements are often a bit vague because the client isn't even 100% sure what s/he wants."

My personal methodology derives from user-centered development (when I first learned about XP I complained that it was entirely too developer oriented), and from the best practices described in McConnell's "Rapid Development" (1996). Reviewing the Wikipedia page for Rapid_Application_Development, I can see that I'm strongly in the "RAD" camp, to the point that the cons listed there look like pros to me.

When I look at BDD I see similar ideas, only with corporate business words that I don't understand. I'm told I'm looking at the wrong sources to understand BDD. Can you point out one which is clear, since the terms "stakeholder", "business value", "outside-in" and "feature injection" either don't make much sense to me (such that I don't know how to apply them in my work) or they seem like a relabeling of existing concepts with new words, apparently because using different words is important?

Oh my. I just researched that. 'Behaviour Driven Development (BDD) grew out of a thought experiment based on Neuro Linguistic Programming (NLP) techniques.' If that's the case then I think that aspect is based in a load of malarky, which will make it harder for me to accept the rest of BDD.

Andrew Dalke said...

@unclebobmartin: I did not say that TDD was a stand-alone dogma. I said that other techniques must be used to address the limitations of TDD, and that these techniques on their own are sufficient to supply most of the advantages that TDD gives.

I used pedagogical examples because if they don't end up with good results then why not choose other techniques which would give better results, if only in documenting the limitations or clarifying the spec?

As a point of pedagogy, I would show or at least mention how other techniques would fit into the process. For example, show how a new test case would require a more complicated "substitute algorithm" refactoring, or how and when to add tests which are expected to pass, in order to verify that the algorithm was driven in the right direction and got to the end.

You mention you've also implemented a solution using sqrt(). If you have it written up then I would like to read about the process, precisely because nothing in the TDD process seems to drive that choice. The three-line algorithm does work across the entire input range, excepting considerations of time, and timing tests do not seem to be a strength of unit tests of TDD.

(As an estimate, adding the sqrt() for a max signed int32 means 46,000 tests while sqrt() and checking only primes means roughly 4,800 tests. Most people won't worry over that factor of 10, and I would have only been chagrined because my Python code used an unbounded integer type which would have scaled over a range that was likely excessive and also suffered from implementation-related slowdowns for larger numbers.)
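Those two estimates can be checked with a short Python sketch. It assumes the worst case is a prime input near the signed 32-bit maximum (2**31 - 1 is itself prime), so trial division has to try every candidate up to the integer square root. The helper name primes_up_to is made up for this illustration.

```python
import math

INT32_MAX = 2**31 - 1

def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit (hypothetical helper)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Clear every multiple of i starting at i*i.
            is_prime[i * i::i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if is_prime[i]]

root = math.isqrt(INT32_MAX)

# Trial division by every integer 2..root.
candidates_all = root - 1
# Trial division by primes only.
candidates_primes = len(primes_up_to(root))

print("square root bound:", root)
print("all candidates:", candidates_all)
print("prime candidates:", candidates_primes)
```

The two counts land close to the rounded figures above, which is where the factor of about 10 comes from.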

You say you left out the details to be more effective, but in the video I linked to, you made no mention of limitations in the prime factors solution. You did comment therein that the three line solution was more elegant than using a sieve to generate and test only primes. For someone who knows the limitations of the algorithms you chose, that solution and comment without extra details is detrimental as a means of showing how TDD is a useful development methodology.

I do see that I've been saying "solution" here and you clarified it as "example", by which I think you don't mean "an example solution" but "an example of using TDD." I did get the impression that it was meant to be considered sufficiently complete as to meet the spec requirements, which would make it complete enough for my purposes.

I took a brief look at the FitNesse code but it will take a while to digest, and as my preferred language by far is Python, I'll have to think about how to make an appropriate comparison. For example, it includes code to implement an HTTP server, which I would rather base on existing code.

As a start, I looked at the HTTP handling protocol, which is the part of the code I would know the best. I noticed that FitNessExpediterTest isn't following the HTTP/1.1 spec. It sends as a complete message "GET /root HTTP/1.1\r\n\r\n".getBytes() but the spec says "A client MUST include a Host header field in all HTTP/1.1 request messages". It then checks for a 200 result but the spec says that in this case "All Internet-based HTTP/1.1 servers MUST respond with a 400 (Bad Request) status code to any HTTP/1.1 request message which lacks a Host header field."

Viewing fitness/http/Request.java I see that that check isn't done, and I also noticed it doesn't handle multi-line headers. I do suspect that the latter is rare in actual use, and I can't think of any realistic cases which would arise for your server.
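To make the two spec points concrete, here is a minimal sketch in Python rather than Java. It is not FitNesse code, and the function names parse_headers and status_for are invented for the example. It joins folded (multi-line) headers, then applies the quoted rule: an HTTP/1.1 request without a Host header gets a 400. The 200 is only a placeholder for "carry on with normal handling."

```python
def parse_headers(raw_request):
    """Split a raw request into (request_line, headers).

    Header names are lowercased. A line starting with a space or tab is
    treated as a continuation of the previous header.
    """
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    request_line, header_lines = lines[0], lines[1:]
    headers = []
    for line in header_lines:
        if line[:1] in (" ", "\t") and headers:
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip().lower(), value.strip()))
    return request_line, headers

def status_for(raw_request):
    """Return 400 for an HTTP/1.1 request lacking Host, else a placeholder 200."""
    request_line, headers = parse_headers(raw_request)
    version = request_line.rsplit(" ", 1)[-1]
    has_host = any(name == "host" for name, _ in headers)
    if version == "HTTP/1.1" and not has_host:
        return 400
    return 200

print(status_for(b"GET /root HTTP/1.1\r\n\r\n"))
print(status_for(b"GET /root HTTP/1.1\r\nHost: example.com\r\n\r\n"))
```

The first request is the one sent by the test mentioned above; under this reading of the spec it would be rejected rather than answered with a 200.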

It is quite true that these are detail points which say nothing about the larger code and the intent of FitNesse. I started there just because that's the part of the code I could most quickly understand, and it has an external specification I can refer to.

Steve Howell said...

Andrew, your critiques of the Fibonacci and Prime Numbers examples do not really strengthen your case against TDD, as many TDD practitioners will just view the examples as TDD poorly done.

As for test coverage, your assertion that refactoring can undermine 100% coverage in a TDD regime is correct on one level, but that is easily fixed by refactoring the tests themselves. Refactoring tests while you refactor implementation would be a good technique to add to TDD. It is such a good idea that people have already thought about it.

Using TDD 100% of the time is probably wasteful. Using TDD 0% of the time is just downright foolish.

All your arguments that the benefits of TDD are easily replicated in other paradigms just point out that TDD strives for benefits that are mutually agreed upon.

Instead of bashing TDD in a long, rambling essay, you should just write a shorter, more concise essay about the techniques that work well for you.

Steve Howell said...

Andrew wrote:

@Steve: "These test-first tests augment other important tests in the development process." I can understand the intent of that sentence. Are there specific tests which wouldn't have been added by the other processes? I don't think so, but that's not the only benefit TDD could have. It could enrich the test ordering so the most important tests are first, except feature prioritization is handled elsewhere in the process. It could improve developer confidence, except that I've argued that the TDD tests aren't designed to validate the software and the developer shouldn't at that point have confidence that the code works, other than the limited test of "it passes the unit tests."

My response:

TDD does not provide any tests that could not be replicated by another process. The main benefit from TDD is not what tests you write, but when you write them.

TDD can be helpful to drive feature prioritization, but when it fails at that or only augments better techniques, all you are saying is that TDD is not the best tool for feature prioritization. Which is fine--the core values of TDD are not feature prioritization.

Your assertion that TDD tests are not designed to validate the software is just downright ludicrous.

Your assertion that developers should only be confident in code to the extent that it passes the current tests is absolutely correct. But it does not contradict the TDD methodology in any way. TDD practitioners who are not confident in their code should write additional tests. There is nothing in the TDD bible that says you should stop writing tests for an implementation that you do not like. In particular, if performance is your problem, that can be easily addressed by writing a failing test in TDD and making it pass.

I think you are making some valid points about software development in your essay, but they are all obscured by your strawman arguments against TDD that just seem naive to anybody who has practiced it longer than you have.

TDD is not a panacea. It's just a technique/discipline that you can add to your mix and be a better programmer. If you use it in only 5% of your development, but you choose that 5% wisely (find TDD's sweet spots, understand the problems it is meant to help with, etc.), then you will realize the benefits of TDD without any of the baggage.

Andrew Dalke said...

@Steve Howell: "as many TDD practitioners will just view the examples as TDD poorly done."

Then why is the Fibonacci example used in Kent Beck's book on the topic? If he can't do TDD properly then how can anyone else?

@Steve Howell: "Refactoring tests while you refactor implementation would be a good technique to add to TDD."

To point out, my essay is about current TDD, not about changes to its definition.

@Steve Howell: "Using TDD 0% of the time is just downright foolish."

That's an opinion, and from what I can tell it's empirically unsupported. Some well-known programmers with widely used software disagree with your opinion.

Lack of testing is foolish. There are many types of testing. Are you someone like Beck who says that TDD is not a testing methodology but only a development methodology? In that case TDD isn't even meant to end up with good tests, which is one of my complaints. Or do you consider TDD to be a testing methodology, in which case I would like to know how to introduce the other sorts of test which must be done to verify program correctness.

@Steve Howell: "TDD strives for benefits that are mutually agreed upon"

I never disagreed with that. I said that it's more important to focus on those benefits than on TDD, and that other techniques are more powerful than TDD in achieving those benefits.

Take for example the ability to refactor with good baseline confidence. That's a listed benefit of TDD, and also a benefit of coverage analysis. Watch the Google Tech Talk with the author of SQLite from some years ago and you'll clearly see him attribute high code coverage to the ability to replace large chunks of code and have a good baseline expectation that the new code works. Not high confidence - coverage development says that the new code also needs coverage analysis. TDD says nothing about coverage testing of refactored code, so it can lead to code that does not have 100% unit test coverage.

Which is why I say that code coverage is a more powerful concept than TDD, at least in regards to the benefits of being able to refactor.
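Here is a small, contrived Python sketch of how a refactoring can leave code untested while every existing test still passes. The Fibonacci functions and the helper names are made up for this illustration and are not the code from Beck's book. The sketch uses sys.settrace to record which lines run, then reports lines that only execute once an input outside the original test set is added.

```python
import sys

def fib_original(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

_SMALL = (0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

def fib_refactored(n):
    # Refactoring: answer small inputs from a table, loop only for larger ones.
    if n < len(_SMALL):
        return _SMALL[n]
    a, b = _SMALL[-2], _SMALL[-1]
    for _ in range(n - len(_SMALL) + 1):
        a, b = b, a + b
    return b

def executed_lines(func, inputs):
    """Return the set of line numbers inside func that run for the given inputs."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            seen.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for n in inputs:
            func(n)
    finally:
        sys.settrace(previous)
    return seen

def newly_executed_lines(func, baseline_inputs, extra_input):
    """Lines that run only when extra_input is added to the baseline inputs."""
    before = executed_lines(func, baseline_inputs)
    after = executed_lines(func, list(baseline_inputs) + [extra_input])
    return after - before

tdd_inputs = [0, 1, 2, 3]  # assumed to be the inputs the test-first tests used
missed_original = newly_executed_lines(fib_original, tdd_inputs, 10)
missed_refactored = newly_executed_lines(fib_refactored, tdd_inputs, 10)

print("original, lines the tests never reached:", len(missed_original))
print("refactored, lines the tests never reached:", len(missed_refactored))
```

Both versions pass the same small-input tests, but only a coverage check shows that the loop in the refactored version is never exercised by them.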

@Steve Howell: "Instead of bashing TDD in a long, rambling essay, you should just write a shorter, more concise essay about the techniques that work well for you."

You should see all the things I threw out from the earlier drafts! Out of curiosity, I checked the essay lengths in Joel Spolsky's "The Best Software Writing I" based on the page starts. (I make no claims about my essay being best of anything!) Bearing in mind that the page counts round up to the next integer, the average essay length is 10 pages, and the median is 8. My 8 pages should not be considered a long essay.

I also don't think I bashed TDD. I said there were problems with it and it was a weak methodology given that other methodologies which are needed to complement TDD's weakness are sufficiently powerful on their own. That's a mild statement which is hardly bashing.

The techniques which work well for me are nothing new and have been around for decades. Simply listing them would not be helpful.

Instead, I've mentioned a few of them in previous essays on my site in the context of how to use them to solve problems. My essay titled "Parsing by hand" shows how I used coverage analysis to improve my unit tests. Interestingly, I did start by writing 6 unit tests first, but I did that as acceptance tests and not as TDD. Elsewhere I wrote essays that include usability considerations, performance tuning, and algorithm analysis.

(Blogger has a 4K limit in comments, so I'll continue in the next comment.)

Andrew Dalke said...

@Steve Howell: "The main benefit from TDD is not what tests you write, but when you write them."

I looked through the rest of your comment but didn't see what the benefits of TDD are supposed to be. Timing of when to write code is not a benefit in its own right.

The benefits of TDD are supposed to be 1) better test cases, 2) easier to refactor, 3) increased confidence that the resulting code matches the specifications, 4) shorter development time, 5) more maintainable, 6) faster iteration cycles, 7) YAGNI/reduced gold-plating, and things like that.

I happen to believe that 1) is not in TDD (I gave examples), 2) is better done with code coverage (I gave examples), 3) is not part of TDD (I gave examples, and yesterday pointed out that FitNesse's web server doesn't match the HTTP spec), 4) and 5) have very little empirical evidence, and 6) is a project management goal, and not a goal from TDD.

( 7) seems to be outside the scope of TDD, since it's up to the developer to interpret the spec and come up with the cases. I thought the prime factors spec implied largish numbers, and supported up to about 2**34 or so, but had I been really into the mathematics I could have written code to handle beyond 2**64, while the customer may have only wanted to support numbers up to 100. So I don't see YAGNI as part of TDD. )


@Steve Howell: Your assertion that TDD tests are not designed to validate the software is just downright ludicrous.

In my essay I gave examples of three TDD programs, meant as exemplar descriptions of TDD done right, and I showed how they were incomplete from a validation standpoint. Even if you exclude timing considerations, the test cases for the prime factors program were good enough to drive development but not good enough to validate the result.

I quoted Beck and can quote others that TDD is a development methodology and not a testing methodology, in which case validation testing is by definition outside the scope of TDD.

In my essay I pointed out that in a TDD-based process there's no place which discusses how to add validation tests which are expected to pass, given that TDD says the new tests must fail, then the code must be changed. I also pointed out that some tests must be done after development is complete, in order to safeguard against over-fitting to the input data. And if those tests fail then they must become part of the development test suite. Yet this type of validation testing is not part of the TDD cycle.

I don't at all see how my conclusion is ludicrous.
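To make the point about post-development validation concrete, here is a minimal sketch in Python. It is not the code from the essay or from any of the cited TDD examples: `prime_factors`, `is_prime`, and `validate_range` are hypothetical names, and trial division is only an assumed implementation. The checks in `validate_range` are expected to pass against finished code, so they have no natural place in a cycle that requires each new test to fail first.

```python
from math import isqrt, prod


def prime_factors(n):
    """Return the prime factors of n (n >= 1) in ascending order.

    Assumed implementation: plain trial division.
    """
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def is_prime(n):
    """Independent primality check used only by the validation code."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


def validate_range(limit):
    """Check properties of prime_factors for every n in 1..limit.

    These are validation checks written after development: they are
    expected to pass, and they cover inputs the driving tests never used.
    Returns the number of values checked.
    """
    for n in range(1, limit + 1):
        factors = prime_factors(n)
        assert prod(factors) == n, n                 # factors multiply back to n
        assert all(is_prime(f) for f in factors), n  # every factor is prime
        assert factors == sorted(factors), n         # ascending order
    return limit


if __name__ == "__main__":
    print("checked", validate_range(10000), "values")
    # Largish inputs of the kind mentioned above (2**34 and a 31-bit prime).
    assert prime_factors(2**34) == [2] * 34
    assert prime_factors(2**31 - 1) == [2**31 - 1]
```

If any of these checks failed, the failing input would then be added to the development test suite, which is the step described above as falling outside the TDD cycle.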

@Steve Howell: "TDD is not a panacea" ... "If you use it in only 5% of your development, but you choose that 5% wisely (find TDD's sweet spots, understand the problems it is meant to help for, etc.),"

Just to assure you, I never said that anyone claimed that TDD is a panacea or a dogma or anything else. My statement is only that it's a weak methodology, and the other methodologies which are needed to complement it are sufficiently powerful in their own right to get the advantages attributed to TDD.

As to your "find TDD's sweet spots" comment, my essay was meant entirely to point out places which were not TDD's sweet spots. How else do you learn where to focus that 5%?

I've seen a few places which point out application domains where TDD is not useful, but these are too high level. Just because the application domain isn't good for TDD doesn't mean that no part of it is.

Do you have pointers to some place which describes implementation aspects which are not appropriate for TDD? For example, I think it's fair to say that achieving 100% coverage is not a sweet spot for TDD. What other place makes that observation?

Kariem M. Ab El-Fattah said...

IMHO, using TDD is building a framework of trust for long-term maintenance: you make modifications trusting that all the cases -YOU THOUGHT OF BEFORE- won't break.
It was never meant to give me bug-free code (because bugs come from not thinking about a condition, or ignoring it because it isn't likely to happen, and that is human :)).
Also, to me, writing tests first is for us developers who at the end of the project face one of two situations: either the stress of finishing in time for the deadline, or the hell of supporting the project. Neither condition will ever let you write good tests, and you may even make a last-second modification that breaks other code because you did not write the tests first :)

Thanks

Andrew Dalke said...

Hi Kariem M. Ab El-Fattah!

I can't tell from what you've written whether you think the acceptance tests should be written first, which is a different idea than TDD, where the tests are specifically written to drive the development process.

For example, Acid2 can be seen as an acceptance test written well before any code is written, but it can't be used as part of TDD.

You also wrote about how at the end of the project there's too much pressure to deliver code instead of developing tests. I did stress that the opposite of test-first is not test-last. It includes test-during.

It seems that your purpose in writing the tests first is to give yourself the time to write the tests at all. That feels like a passive-aggressive response to the real problem. If you need time to write necessary tests then part of being a professional is making sure that you have the time, and fitting those needs into the project goals, and not by refusing to write code until the tests are ready.

Finally, I pointed out a set of tests which TDD does not address and which must be done after the development is done. Are you avoiding range coverage, acceptance testing, load testing, and integration testing?

Anonymous said...

First I have to thank you for putting so much thought into the topic. As a young and rather inexperienced developer I got attracted and addicted to TDD some while ago. As you state that TDD is an educational technique, that is definitely true for me. Your essay made me stop and think anew and I clearly see the point in your arguments against TDD. Yet I'm not willing to just abandon TDD but rather try to extract an essence where it can be still helpful (at least for me).

Your point is that TDD is educational and I see this as an argument for introducing TDD into a team. Discipline under pressure works best for most people by following artificial boundaries set from above or better, set 'first'. Thought Driven Development is a high moral standard and I would not put too much trust in morality when it comes to worst case scenarios.

You say that TDD makes your code 'overfit' to the tests. I experienced this to be much better than code being too hard to test so that it has to be refactored first (without the confidence of having it tested of course). This can be avoided by having 'testing in mind' while writing the code. But that's the discipline thing mentioned above.

Code you write in the first place is often better structured than later attached fixes and extensions, especially when there is time shortage. When I start with design by writing a test, that test gets itself refactored naturally into a DSL for describing the specific unit being tested. And even if that test does not give me any 'real' confidence in my code's behaviour, it helps me a great deal in applying new tests easily and conveniently when I come to exploring the whole problem space of my solution.

Lance Walton said...

It's not clear to me whether some commenters here (including the original author) think that TDD advocates writing *all* tests before *any* production code is written. If that is the case, it is incorrect.

Also, the 'big tent' versus 'little tent' TDD makes no sense to me.

TDD advocates writing a single test, then implementing sufficient production code to make it pass, then refactoring, then repeating the sequence. This cycle is of the order of minutes long.

Some people like this. Some don't. I know this is surprising since there's so much agreement about how to do everything else in the software development world, but that's just the way it is...

Regards,

Lance

Andrew Dalke said...

Thank you for your responses, Anonymous and Lance!

Anonymous? I've never heard TDD called "Thought Driven Development" before. If you think about it, couldn't any approach be thought driven? ;)

More seriously, I didn't say that TDD is educational. The closest I said was that it's a weak approach and the other techniques which you MUST do cover most of the advantages of TDD; those being problem analysis and code coverage. Those are the techniques I would rather have on a team.

And Lance? I certainly don't think, and I doubt that anyone else here thinks, that TDD advocates writing all tests first. I even defined what I mean by TDD by quoting Wikipedia's article: "writes *a* failing automated test case".

I know full well there are people who love, or hate, or go "meh" about TDD. My main point is that TDD is incomplete as a development approach. The (I assume) exemplar examples I referenced from Beck, Martin, etc. contain problems which are not found by the tests used during TDD.

I assert that the analysis to identify those bugs and test for their absence is more effective than just doing TDD, so that TDD is a weak contribution to a development project.

Geoff's comments about big tent/little tent are ways to distinguish between "pure" TDD, where all tests are written first, vs. a style where tests are part of the development process, and not deferred for a later, post-development ("test-last") stage. I see it as a way to soften the strict requirements of TDD as used by Beck and some others.

Cheers!

KolA said...

Virgil Dupras, you said:
"Since you write your test after the code, how can you be sure your tests are valid? ... With TDD, you've seen that test fail so you know it's a valid one."
Of course you saw it fail using TDD, because the code that could have made it pass simply wasn't there.
I can "prove" virtually any test written after the code to be useful in the same manner as you do: modify the already implemented method under test to return a wrong value, run the test and see it turn red - voila, it's useful! Clever, huh? :)

KolA said...

@Ryan,
"It seems like many of your points are due to incomplete requirements in the examples rather than what applies specifically to TDD/BDD"

Aren't incomplete/ambiguous/ever-changing requirements a reality of 95% of software projects? Isn't this something TDD claims to address?

No methodology can replace a brain. TDD/BDD is no exception.

Andrew Dalke said...

@KolA, after 4 years I suspect most aren't interested in following up, no matter how valid your points may be.

KolA said...

Sorry, I didn't mean to spam and don't expect a reply. Your article is at the top in Google for "TDD criticism" so I just left my 2 cents for those who will read it later (if you have no objections)
