Kevin D. Weeks
508 Valparaiso Road
Oak Ridge, TN 37830
(615)483-0416


Glass Box Testing:
Techniques for Preventing Software Bugs


     The issue of software quality is reaching crisis
proportions. Two notable failures come to mind: the AT&T
incident, in which the phone system in the Northeast was shut
down for nine hours, costing businesses millions of dollars, and
the Canadian incident, in which a computer-controlled radiation-
therapy machine killed two people. Although it is
probably impossible, given (most) real-world constraints, to
completely eliminate bugs from a computer program, it is possible
to significantly reduce the number of bugs shipped in a program.
This article presents a number of simple rules and techniques for
identifying and eliminating many of the most common software bugs
during both the development and the maintenance phases of a
program's life.

I first developed my interest in software quality several years
ago when I was assigned the task of writing a control program for
a Cervical Manipulation Therapeutic Bed. This was a device
intended to replace a physical therapist; its specific function
was to move a patient's head along any of three axes, thus
manipulating the patient's neck (the cervical spine). Keep in
mind that such patients are undergoing treatment because they've
already been injured. Theoretically it was possible for this
device to permanently paralyze a patient! I did everything I
could think of to make sure the software was bug-free but you
can't imagine how relieved I was when the company making the
things went out of business before they sold any. Looking back
now, some four years later, I can see that my testing technique
had more holes in it than a sheet of fan-fold paper.

Typically the programmer who writes a piece of code is a poor
choice for testing that code. "...it is extremely difficult,
after a programmer has been constructive while designing and
coding a program, to suddenly, overnight, change his or her
perspective and attempt to form a completely destructive frame of
mind toward the program."[1] It is the purpose of the tester to
demonstrate that a body of code does not work. A successful test
is one that uncovers an error, thus improving the code's quality.

The question, then, is: Given our natural bias as programmers,
how can we successfully test our own code? My solution is to make
the testing process as mechanistic as I can wherever I can. To do
so, I simply follow a set of rules for writing test code and thus
take my own attitudes out of the equation. I also realize that
I'm simply not psychologically equipped to perform some forms of
testing and so, whenever possible, I rely on others for that.


Error sources:
There are five primary sources of software error. These are:

     *) External factors (OS/Compiler/Hardware)
     *) Syntax errors
     *) Logic errors
     *) Design errors (System and Implementation design)
     *) Analysis errors

Each of these, with the possible exception of syntax errors, is
worthy of discussion, but I want to concentrate on Logic Errors
since they are the most amenable to a mechanistic approach. I define
a Logic Error as:

     A failure, by the software, to perform in the manner
     intended by the programmer.

Please note that this definition implies that it is quite
possible for a function or module to perform exactly as the
programmer intended and still fail to perform as required.
However, as the implementor I'm not responsible for errors
resulting from an incorrect specification. The purpose in making
these distinctions between error sources is not to assign blame
but to refine techniques for ferreting out particular classes of
errors. Logic errors are particularly subject to glass box
testing.

We're all familiar with the term "black box." It refers to a
device that receives input and produces output without the user
knowing what processes take place in between. For most of us a
photo-copier is a black box. A glass (or white) box is a device
where the processing is intimately known. No one knows the
"innards" of a function better than the programmer who wrote it.


Code format:
I am extremely distrustful of embedded, in-line test code. First,
pointer errors are often position sensitive and I would rather
not have them shift after I've decided the code works. Second,
embedded test code makes the target source code harder to read.
Third, decisions (relational tests) are a prime source of errors
in their own right. A statement such as:
     #if !defined(PARTIAL_TEST)
could accidentally remain enabled following a final, hurried test
just prior to release.

To avoid the above problems I place test code at the bottom of a
module with a single conditional - #if defined( TEST ) - on which
all other conditionals depend. By including the test code in the
module I am testing I have complete access to all static
variables and functions. This reduces the need for in-line test
code. Eliminating embedded test code means that pointers don't
move just because they're being observed. (My favorite example of
Heisenberg's Uncertainty Principle.) If you dislike the added
bulk of including the test code with the target code you might do
as a friend of mine does. He writes a separate test code module
which he then conditionally #includes in the source module.
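
To make the layout concrete, here is a minimal sketch of such a
module. The file name, SetCursor(), and the module variables are
only illustrative; the point is the single TEST conditional at
the bottom.

     /* cursor.c - target code first, test scaffold at the bottom */

     #include <stdio.h>

     static int row;         /* module-private state, visible to  */
     static int col;         /* the test code below               */

     void SetCursor(int r, int c)
     {
         row = r;
         col = c;
     }

     #if defined( TEST )
     /* Every other test conditional depends on this single one. */

     int main(void)
     {
         SetCursor(0, 0);
         if (row != 0 || col != 0)
             printf("FAILED: SetCursor(0, 0)\n");
         return 0;
     }

     #endif  /* TEST */

Compiling with TEST defined produces a stand-alone test program;
compiling without it produces the normal object file.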


Statement Coverage:
Robert Frost once wrote a poem entitled, "The Road Not Taken." In
testing one wants to be sure every road is taken. This is
referred to as "statement coverage". Since, obviously, an
unexecuted line of code is an un-tested line of code, statement
coverage is where we begin.

I write test statements designed to exercise each path through a
function. Then I use a source-level debugger and simply walk
through the test code and its target using the debugger to
visually verify coverage. This is the way most programmers debug
these days anyway, so it imposes no additional burden. However,
during this walk-through I have an ulterior motive. I want to
spot areas of the target code for which I may have failed to
develop effective test cases.

Listing 1 demonstrates simple statement coverage. I wrote test
code to execute both possible paths in the target function. This
example is certainly trivial but there are cases that will test
your ingenuity. Listing 2 is one such example. In this case
there are two difficulties. The first is that the first branch in
target() depends on the return value from another function call -
calloc() in this case. We can overcome this difficulty by
creating a dummy function whose return value we can control. The
second is that the function is called more than once, with later
calls depending on the results of earlier ones. To solve this I
created a wrapper function called testCalloc() (Listings 4 and 5)
which will call calloc() the number of times specified in a
previous call to SetCalloc() and then fail.
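
Listings 4 and 5 are not reproduced here, but a wrapper along the
following lines captures the idea; treat it as a sketch rather
than the actual listings.

     /* testcall.c - calloc() wrapper that fails on demand */

     #include <stddef.h>
     #include <stdlib.h>

     static int callsLeft = -1;     /* -1 means "never fail" */

     /* The test code says how many calls should succeed. */
     void SetCalloc(int goodCalls)
     {
         callsLeft = goodCalls;
     }

     /* The target calls this in place of calloc() under TEST. */
     void *testCalloc(size_t number, size_t size)
     {
         if (callsLeft == 0)
             return NULL;           /* simulate allocation failure */
         if (callsLeft > 0)
             callsLeft--;
         return calloc(number, size);
     }

One way to route the target's calls through the wrapper without
touching the production code is a line such as
     #define calloc testCalloc
inside the module's TEST conditional; because the wrapper lives
in its own file, its internal call still reaches the real
calloc().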

Once I've completed a module I use a third-party tool such as
Borland's Turbo Profiler to verify independently that my test
code does indeed execute every line of target code.
 
Decision Coverage:
Obviously statement coverage, although essential, is
insufficient. The complexDecision() function in Listing 3
contains a complex decision and so we must make sure we execute
each path through the decision itself. Figure 1 shows the cases
we must test. (The last five test cases may seem redundant but
they're useful for finding erroneous parenthetic groupings.)

As you can see, the first statement in the compare() function
requires eleven test cases. In a real-world situation the number
of test cases can grow nearly exponentially, especially when we
add in boundary tests (discussed next.) I've found that one way
of simplifying the effort is to create a test structure,
testParameters, and then an array of test cases that can simply
be looped through. Again Listing 3 provides an example. I've
defined a structure that contains the input values, the expected
results, and even an error message which serves the double duty
of documenting a particular test case. The use of a test
structure also simplifies adding and deleting test cases as the
function evolves.
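
Listing 3 itself is not shown here, but the structure-and-loop
arrangement might look something like the sketch below. The field
names, the test values, and the three-argument complexDecision()
are purely illustrative.

     /* Table-driven test cases - a sketch, not Listing 3 itself */

     #include <stdio.h>

     int complexDecision(int a, int b, int c); /* defined earlier */

     struct testParameters {
         int   a, b, c;     /* input values                       */
         int   expected;    /* result the test expects            */
         char *message;     /* doubles as test-case documentation */
     };

     static struct testParameters cases[] = {
         { 0, 1, 5, 5, "a < b should return c unchanged" },
         { 1, 0, 5, 0, "a >= b should return zero"       },
         /* ... further decision and boundary cases ...  */
     };

     #define NUM_CASES (sizeof(cases) / sizeof(cases[0]))

     static void runCases(void)
     {
         int i;

         for (i = 0; i < (int)NUM_CASES; i++)
             if (complexDecision(cases[i].a, cases[i].b, cases[i].c)
                     != cases[i].expected)
                 printf("FAILED: %s\n", cases[i].message);
     }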

Our new requirement, then, is to execute every line of code and
to exercise every decision.

Boundary conditions:
In examining the test cases in Listing 3 you may have noticed
that there are more of them than I listed in Figure 1. This is
because I've combined decision coverage test cases with boundary
condition test cases. A boundary condition is the point at which
the rules governing a parameter's behavior change. For instance,
natural boundary conditions occur at 0 for all integers and at
127, 32767, and 2147483647 for 8-, 16-, and 32-bit signed
integers. Because of this
change in behavior, boundary conditions tend to be weak points in
a program. This is why the test cases in Listing 6 make so much
use of MAX_INT, 0, and -1.

There are, of course, other boundaries. On an IBM PC there's an
address boundary at every 64KB segment (offset 65535). Many
computers can only address multi-byte objects at even addresses.
On top of that, the application itself may
impose boundaries. In the example, 15 is such a boundary. To test
a boundary requires three test cases. One case within the range,
one on the boundary itself, and one outside the range. In the
case of an integer's zero boundary we need test cases for -1, 0,
and +1. In the case of the number 15 we're interested in 14, 15,
and 16. Fortunately in testing the boundaries we can usually
presume that all values in the range included and excluded by a
particular boundary pair will behave the same way as our test
cases. In other words, if 1 and 14 work then it's reasonable to
assume that 2 through 13 will also.
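
In terms of the hypothetical test table sketched earlier, the
application's boundary at 15 would contribute three rows along
these lines (the values follow that sketch's invented semantics):

         { 14, 15, 5, 5, "14: just below the boundary at 15" },
         { 15, 15, 5, 0, "15: on the boundary itself"        },
         { 16, 15, 5, 0, "16: just above the boundary at 15" },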

Before disposing of Listing 3 there's one additional point.
Although complexDecision() returns what should be the "c"
variable's current value, I still explicitly confirm it. I never
believe anything a function being tested reports. All operations
and side-effects should be independently verified if at all
possible. If a function repositions the cursor then I check the
hardware for confirmation. If a function writes to disk then the
test code reads from disk whatever was written. When testing you
must always be explicit about the results you expect and then
make absolutely sure those are the results you got.
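
As an illustration, a test for a hypothetical WriteRecord()
routine might read the file back rather than trust the routine's
return value:

     /* Verify a disk write by reading the data back - a sketch.
        WriteRecord() is hypothetical; assume it writes its second
        argument, with no newline, to the named file. */

     #include <stdio.h>
     #include <string.h>

     extern int WriteRecord(const char *fileName, const char *text);

     static void testWriteRecord(void)
     {
         char  buffer[64];
         FILE *fp;

         WriteRecord("test.dat", "hello");    /* routine under test */

         fp = fopen("test.dat", "r");         /* independent check  */
         if (fp == NULL
             || fgets(buffer, sizeof(buffer), fp) == NULL
             || strcmp(buffer, "hello") != 0)
         {
             printf("FAILED: test.dat does not hold what we wrote\n");
         }

         if (fp != NULL)
             fclose(fp);
     }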

Tools:
There are a number of tools that can be of great help in testing
your code. I mentioned Borland's Turbo Profiler above for
verifying statement coverage, and I've seen ads for other
products that provide similar coverage analysis.

In the October '91 issue of CUJ Robert Ward had an article
entitled, "Debugging Instrumentation Wrappers For Heap
Functions."[2] Please, use a memory monitor of some sort. If you
don't like Robert's, others have been published elsewhere. If you
don't want to code your own, or you want more sophisticated
capabilities, products such as MemCheck from StratosWare are
available. I started using such a tool several
years ago. In that time I've twice performed maintenance on
programs I'd written prior to getting a memory checker. In both
cases I found out-of-bounds memory writes and memory leaks.
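
Even a very crude monitor catches leaks. The following sketch is
not Robert Ward's code and is far simpler than MemCheck; it
merely counts outstanding allocations:

     /* Minimal allocation counter - a crude memory monitor sketch */

     #include <stdio.h>
     #include <stdlib.h>

     static long liveBlocks = 0;

     void *dbgMalloc(size_t size)
     {
         void *p = malloc(size);

         if (p != NULL)
             liveBlocks++;
         return p;
     }

     void dbgFree(void *p)
     {
         if (p != NULL)
             liveBlocks--;
         free(p);
     }

     /* Call at program exit; a non-zero count means a leak. */
     void dbgReport(void)
     {
         if (liveBlocks != 0)
             printf("LEAK: %ld block(s) never freed\n", liveBlocks);
     }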

The biggest drawback to the type of testing I've described is
providing user interaction for the user interface portions of the
code. Just on general principle it's a good idea to isolate such
code to a few modules and this helps. Ultimately, though, you
need to test that code also. This is problematic. One of my goals
is to automate testing as much as possible, but if I'm required
to provide input and confirm output manually then I'll eventually
get lazy and not do it. The best bet here is something like
"Ghost" from Vermont Creative Software or "Test" from Microsoft.
These tools are also invaluable later during the integration
phase for automating regression testing.


Summing up:
Effective glass box testing depends to a large degree on proper
software construction. Design your code, don't hack it. If, in
addition, you design with testing in mind then you'll find the
job easier yet. I know you've heard it before, but let me
reiterate: don't use global variables! When a module accesses a
global variable, the number of potential paths through that
module goes up significantly. Keep your module cohesion high and
the coupling
between modules low. I highly recommend an object oriented
approach even in C.

Implement and test the module incrementally. I write a function,
then write the test scaffold, and then test the function. Once
I'm satisfied with the first function I move on to the next. If I
have a function pair such as SetCursor() and GetCursor() then
I'll implement and test them together (keeping in mind that I
don't trust either function to verify the other). An incremental
approach lightens the burden of writing the tests and also
allows one to build the module on a solid foundation.

As you add functions continue to run the tests for earlier
functions. This is known as regression testing and will allow you
to immediately spot any bugs your newest code may have introduced
into already tested code. It seems like half the errors I see
result from side-effects in previously tested code that wasn't
thoroughly re-tested.

Test each module in isolation. Provide dummy functions for calls
outside of the module so that you can control the results of the
calls. Once the module has been checked out in isolation, link in
the outside functions and run the tests again. This usually
requires some conditional compilation in your test code but
better there than in the target code.
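
As a sketch of such a dummy, suppose the module calls an outside
routine GetDiskFree() (a hypothetical name). Under an
illustrative TEST_ISOLATED conditional, the test build supplies
its own version whose result the test code controls:

     /* Stand-in for an outside function, compiled only when the
        module is being tested in isolation */

     #if defined( TEST_ISOLATED )

     static long fakeDiskFree = 0;

     void SetDiskFree(long bytes)      /* test code sets the result */
     {
         fakeDiskFree = bytes;
     }

     long GetDiskFree(void)            /* target code calls this    */
     {
         return fakeDiskFree;
     }

     #endif  /* TEST_ISOLATED */

When TEST_ISOLATED is not defined, the real GetDiskFree() is
linked in and the same tests run again.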

Glass box testing sounds like a lot of work but it's actually not
that bad. The statistics I've run on my own efforts show that in
a completed module somewhere between 55% and 60% of the
statements are test code. (These numbers are close to those noted
by Marc Rettig in his article, "Testing Made Palatable," in the
May 1991 "Communications of the ACM".[3]) However, much of the
test code is the same thing over and over with just the
parameters changed, so I only spend about 30% of my time writing
it, and that 30% produced an estimated 50% reduction in the time
spent integrating the modules. I've only written one complete,
non-trivial program using these techniques and don't yet have any
numbers on post-release bugs.

As professionals we need to address the problem of software
quality proactively, not reactively. Test, don't debug.



References:
1. Myers, Glenford J. The Art of Software Testing. New York, NY:
John Wiley & Sons, Inc., 1979.

2. Ward, Robert. "Debugging Instrumentation Wrappers for Heap
Functions," The C Users Journal, October 1991, pp 40-47 and 71-
72.

3. Rettig, Marc. "Testing Made Palatable," Communications of the
ACM, May 1991, pp 25-29.

Bio:

Kevin D. Weeks has been programming, primarily (and preferably)
on micros, for over ten years. He has written programs ranging
from a Radioactive Waste Inventory Management System to a control
program for a therapeutic bed. He is currently employed as a
software engineer by Electrotec Concepts, Inc. in Knoxville, TN.
He can be contacted on CompuServe where his account number is
70262,2051.
