Shortcut for chapter specific information

Sunday, May 15, 2011

Jon Bentley's binary search: should we test a program before/after you write a program?

A must read of every programmer is the book Programming Pearl.   If you care, you might have known that Programming Pearl's Chapter 4 and 5 have been a great reminder to every programmers that writing correct code is very hard.    His example is the seemingly simple binary search.  The concept is as simple as children's game : Twenty Questions.  But to write it correctly, it's actually very hard.    There are many correct variations of the algorithms which mainly lie on how the bounds were set (inclusive, exclusive) and termination condition.    For anyone who has look down this algorithm, they will suffer the same of writing incorrect code.

Saying so, I got to admit that I am not such an exceptional programmer and I don't stand on higher moral ground.  In fact, just like many algorithms I have practiced on,  I need to write binary search for 4 times (including pseudocode and once in C) before I got it correct.   In fact, not until I read Chapter 4 of Programming Pearl, I have no way to make sure my implementation (which at the end followed Bentley's implementation) is correct.   This turns me from what Bentley said the 90% group of programmers who got binary search incorrect to the 10% who got the code correct.


Now here's the twist of the story, Joshua











This argument still has issues - one might argue that this is only true in this particular instance of programming.  In other situations, what if there's another difference between the actual program language with the pseudo code?   How do we know?  My take is that this should be ok, because one can imagine that these differences of languages should have only finite number and one can enumerate them before they make a proof. 

I think in this way because binary search gives a very good example where the testing tenet of programming doesn't work well.    In the case if binary search was meant to search N numbers within range R : [p,q]? What would be the test case?    The number of possible combination would be r ^N= (q-p+1) ^N (this allows repetition of the items).    It is impossible to create test cases which cover the range of the program.

This exposed a fundamental issue of the whole tenet of programming by testing paradigm.   Some extremist in the camp suggest that programming can be done by writing a program and then repeatedly test it until it generates the required I/O pairs.   But in real life programming, there are just too many problems one can't handle in this way.

There is another argument against the testing paradigm which I consider it as a good practical reason but a bogus theoretical reason.   This argument goes like this,  even if one can test a program in the range R, what if we later decide that enlarge our range R to R'?  If that's the case, then a program could fail because someone can simply store an R- entry table and because the new range of R' doesn't exist in the table, the program then fail.

As I said this is a bogus theoretical argument, because when one decided to enlarge R, then we are writing a different program.    That doesn't mean the original R-entry table implementation is incorrect.  It means that it is correct with the assumption R is the range. 


In practice though, those who wield this argument will likely to have the last laugh because inputs/outputs of a particular function *always* change in real-life programming.  No one can avoid it and no one can say no to it.



No comments:

Post a Comment