                        LOC - Lines Of Code

                        A Utility by

                        Matthew J. W. Ratcliff

Among the many buzzwords popular in the circles of "software
engineering" is the term "software metrics": some mystical,
magical number that measures the quantity and quality of
software.  No such number really exists, however.  Until
someone comes up with the software equivalent of the Gunning
"Fog Index" (a readability score for English prose), there
is no quick and easy way to measure the quantity and quality
of software.

Some would measure a programmer's productivity by the number
of lines of code he generates in some period of time.  This
can be entirely invalid, however.  Suppose your programming
team has been developing a major operating system release
that must fit in exactly 256K of ROM.  As the integration
team leader, it is your job to make sure it fits.  You find
that, due to the poor programming techniques of some of your
colleagues, the integrated result simply won't fit.  So you
spend two weeks optimizing and rewriting many major
algorithms, adding many enhancements while generating less
code to get the job done.  The end result is that your
operating system now fits in 256K of ROM, and your "software
productivity" was NEGATIVE, because you produced fewer lines
of code than you started with!  Simply counting lines of
executable code, or total lines of text in a source file,
just won't cut it.  Invariably, the better programmer will
require fewer lines of code to do the same job as a poor
programmer.  A software metric based purely on a count of
lines of code is simply invalid.

However, tight code doesn't always indicate good code
either.  Some programmers still adhere to the hacker ethic:
bang out code using the tersest constructs and the most
obscure, cycle-trimming instructions possible.  Although the
code, once finally debugged, runs like a demon, it may be
virtually impossible to support.  You know you are in
trouble when you are asked to "maintain" a program and find
no comments anywhere in the source file!

What makes quality software?  Studies have shown that the
major cost of software over its lifetime is maintenance
after its initial release.  Seldom is the programmer who
maintains a mature system the same one who wrote it.
Structured programming methods are important; modular code,
with source files containing logically grouped routines,
helps a lot.  Frequent and explanatory comments go a long
way toward helping the maintainer keep his sanity.

In my studies of programming, helping other people with
their software, and maintaining programs from many different
sources, I have found the most readable code to have:

Frequent Useful Comments:
  Tell me what the program does and how it gets the job
  done.  Explain unique algorithms, especially tricks that
  are not obvious.  Readable code does NOT HAVE USELESS
  COMMENTS that state the obvious, such as:

                inc     ax      ; Add 1 to ax register

Few Global Variables:
  Better yet, NO global variables.  We all learn how to use
  and abuse them by learning BASIC or FORTRAN as a first
  language.  The key to writing horrid spaghetti code is
  liberal use and abuse of global variables.  Well-written
  programs that are easy to modify invariably have very few
  global variables, if any.

Consistent Technique:
  There are "different strokes for different folks".
  Everyone has a different programming technique, even
  within the same group of programmers working as a team
  under some agreed "standard". Consistent programming and
  comment style makes the code easier to follow, once a
  newcomer has adjusted to the different technique.  Using
  in-line comments in one module, block comments in another,
  and no comments somewhere else creates a confusing jumble.
  Indentation and variable naming conventions should be
  consistent.  Don't use upper case for module variables in
  one file and lower case in another.  Be consistent, and
  your code will be easier for someone else to read and
  easier for you to update when you come back to it later.

Prototyping:
  Ada and Pascal require that you prototype your code, that
  is, declare a function or procedure before it is used.
  This tells the compiler what inputs the function requires
  and what data type it will return, if any.  Once the
  compiler "knows" this information from a prototype, if you
  make an invalid call to one of those routines, the
  compiler will catch the glitch at compile time.  This is
  MUCH better than experiencing a total system crash at RUN
  time!  In C, prototyping is optional.  However, it is
  strongly recommended that you use full prototyping and
  enable full warnings on the compiler.  Turbo C and
  Microsoft C both conform to the current ANSI standard,
  which includes extensive prototyping and compile-time
  checking of all user-defined function calls.  A program
  that is not "prototyped" is difficult to decipher.  With
  all the prototypes in separate include files, those files
  act as ACCURATE software documentation.  When programming
  for the DoD, such files are REQUIRED as part of the final
  product when delivering Ada software.  Unlike software
  documentation MANUALS, the prototype files must be
  correct, or the software won't compile and run properly.
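To make the global-variable and prototyping advice concrete,
here is a small C sketch.  It is an illustration only: the
struct and function names are invented for this example and
are not taken from LOC.C.  The line counters live in a
structure passed by pointer instead of in global variables,
and every routine is prototyped before it is defined.

```c
/* Counters for one source file, or for the grand total.  Passing a
   pointer to this struct replaces a set of global counters. */
struct loc_counts {
    long code;            /* lines containing code */
    long comment;         /* comment-only lines */
    long blank;           /* blank lines */
    long commented_code;  /* code lines that also carry a comment */
};

/* Prototypes: in a larger program these would live in a header file,
   where they double as accurate documentation of each routine. */
long total_lines(const struct loc_counts *c);
void add_counts(struct loc_counts *sum, const struct loc_counts *file);

/* Code, comment, and blank lines.  Commented code is already counted
   as code, so it is not added again. */
long total_lines(const struct loc_counts *c)
{
    return c->code + c->comment + c->blank;
}

/* Fold one file's counts into a running grand total. */
void add_counts(struct loc_counts *sum, const struct loc_counts *file)
{
    sum->code += file->code;
    sum->comment += file->comment;
    sum->blank += file->blank;
    sum->commented_code += file->commented_code;
}
```

With these prototypes in scope, a call that passes a struct
loc_counts by value where a pointer is declared is a
constraint violation, so an ANSI C compiler must issue a
diagnostic instead of letting the mistake surface at run
time.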

The utility presented here, LOC, is a Lines Of Code counting
tool.  Although it cannot analyze your code and verify that
it conforms to any particular technique that might ensure
some level of "quality", it does provide some good
indicators that help gauge software quality and quantity.

The utility LOC, Lines Of Code, is designed to give a
detailed report for one or more source files about total
lines of source code, total lines of comments, total blank
lines, and total commented lines of code, along with grand
totals and percentages for each.  All of LOC's parameters
can be passed on the DOS command line or entered
interactively.  The language's comment start and end
delimiters must be specified, along with a filename (with
wild cards, if desired).  For example, to count all the
comments and code in all your .C and .H files in the
current directory, use the command:

        LOC /* */ *.C *.H

The '/*' is the start and '*/' the end comment delimiter
for the C programming language.  Following those
specifications is a list of file specifications.  The only
restriction on the files to be analyzed is that they must
all be in the same language.  To analyze all Ada source
(.ADA) and specification (.PKG) files, you would use the
command:

        LOC -- () *.ADA *.PKG

In Ada a comment begins with a double dash and ends at the
end of the line.  (That is, the carriage return at the end
of the line implicitly ends the comment.)  The "()" end
delimiter tells LOC that comments end at the end of the
line.  To analyze all your BASIC files (which must be saved
in ASCII format), use the following:

        LOC REM () *.BAS

If you use the tick mark (') instead of REM, then the
command would be:

        LOC ' () *.BAS

For assembly language programs, whose comments begin with a
semicolon, use:

        LOC ; () *.ASM
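How might a line be sorted into the code, comment, blank, or
commented-code column?  The source of LOC.C is not reproduced
here, so what follows is only a sketch of one plausible
approach, with names chosen for illustration.  It scans each
line character by character, treats an end delimiter of "()"
as "the comment runs to the end of the line", and carries
block-comment state from one line to the next.  It does not
recognize string literals, so a delimiter inside quotes is
miscounted.

```c
#include <ctype.h>
#include <string.h>

enum line_kind { LINE_BLANK, LINE_CODE, LINE_COMMENT, LINE_COMMENTED_CODE };

/* Classify one line of source text.
   start, end: comment delimiters as given on the command line; an end of
               "()" means the comment stops at the end of the line.
   in_comment: block-comment state carried between calls (start at 0).
   Delimiters are assumed to be non-empty.  String literals are not
   recognized, so a delimiter inside quotes is treated as a real one. */
enum line_kind classify_line(const char *line, const char *start,
                             const char *end, int *in_comment)
{
    int eol_comment = strcmp(end, "()") == 0;
    size_t slen = strlen(start), elen = strlen(end);
    int saw_code = 0, saw_comment = 0;
    const char *p = line;

    while (*p != '\0') {
        if (*in_comment) {
            if (strncmp(p, end, elen) == 0) {      /* block comment closes */
                saw_comment = 1;
                *in_comment = 0;
                p += elen;
            } else {
                if (!isspace((unsigned char)*p))
                    saw_comment = 1;
                p++;
            }
        } else if (strncmp(p, start, slen) == 0) { /* comment opens */
            saw_comment = 1;
            if (eol_comment)
                break;                             /* rest of line is comment */
            *in_comment = 1;
            p += slen;
        } else {
            if (!isspace((unsigned char)*p))
                saw_code = 1;
            p++;
        }
    }

    if (saw_code && saw_comment)
        return LINE_COMMENTED_CODE;
    if (saw_code)
        return LINE_CODE;
    if (saw_comment)
        return LINE_COMMENT;
    return LINE_BLANK;                             /* whitespace only */
}
```

A caller would reset its in_comment flag to 0 before each
file and bump the matching counter for every line it reads.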


Below is a sample output for some of my programs:

LOC - Lines of Code counter, by Mat*Rat
File            Lines of  Comment   Blank     Commented Total
                Code   %  Lines  %  Lines  %  Code   %  Lines
===============|=====|===|=====|===|=====|===|=====|===|=====
          LOC.C   274  70    75  19    40  10    14   3   389
       EXPAND.C    60  58    25  24    18  17     3   2   103
       EXPAND.H     1   5    14  73     4  21     0   0    19
===============|=====|===|=====|===|=====|===|=====|===|=====
        Totals:   335  65   114  22    62  12    17   3   511
===============|=====|===|=====|===|=====|===|=====|===|=====
File            Lines of  Comment   Blank     Commented Total
                Code   %  Lines  %  Lines  %  Code   %  Lines

LOC right justifies the filenames, and will precede them
with a path specifier if one was used on the command line.
LOC is easily ported to other systems.  EXPAND.C, which
handles the expansion of file specifiers with wild cards, is
DOS specific.  In Unix, wild cards are expanded by the
shell, and the file names are handed to the program
directly.  Rewriting the function Getfilename() in EXPAND.C
for another system should be trivial.  (I ported it to a
Data General system in about 10 minutes.)
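On a POSIX system, the job Getfilename() does in EXPAND.C
could be handed to the standard glob() library function.
The sketch below is not the author's Data General port, and
expand_spec() and its callback are hypothetical names.  Since
a Unix shell normally expands wild cards before LOC sees
them, this only matters when a file specification reaches
the program unexpanded, for example because it was quoted.

```c
#include <glob.h>
#include <string.h>

/* Expand one file specification such as "*.C" with POSIX glob() and
   call fn (if not NULL) once for each matching path.  With
   GLOB_NOCHECK, a specification that matches nothing is passed through
   unchanged, so the caller can report "file not found" itself.
   Returns the number of paths found, or -1 on a glob() error. */
int expand_spec(const char *spec,
                void (*fn)(const char *path, void *ctx), void *ctx)
{
    glob_t g;
    size_t i;
    int rc, n;

    memset(&g, 0, sizeof g);            /* makes globfree() safe on error */
    rc = glob(spec, GLOB_NOCHECK, NULL, &g);
    if (rc == 0 && fn != NULL)
        for (i = 0; i < g.gl_pathc; i++)
            fn(g.gl_pathv[i], ctx);
    n = (rc == 0) ? (int)g.gl_pathc : -1;
    globfree(&g);                       /* release the path list */
    return n;
}
```

The system-specific part of LOC would then shrink to a loop
that calls expand_spec() for each file specification on the
command line.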

The total number of lines of code is shown at the bottom of
the first column.  A percentage, relative to the total
number of lines (code, comment, and blank), is computed and
shown in the second column.  Total comment lines and their
relative percentage are shown in the next two columns,
followed by the blank line count and its percentage.  The
percentages for Lines of Code, Comment Lines, and Blank
Lines should add up to very nearly 100%.  (All arithmetic is
done in integers so the program runs quickly; the resulting
round-off error in the percentages is minor.)  The next two
columns show the Commented Code lines as a count and as a
percentage.  These lines are also counted in the Code
columns.  They are shown to give you an indication of how
much in-line commenting you use.  If you have a low Comment
Lines count but a high Commented Code count, this simply
reflects a different programming style.  The last column
gives the total number of code, comment, and blank lines in
each file, with the grand total for all the source files
LOC inspected at the bottom.  Use this grand total to
impress your boss and friends.
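The integer percentage arithmetic can be checked against the
sample output.  The table suggests that LOC truncates rather
than rounds: 335 of 511 lines is 65.56%, shown as 65.  A
minimal sketch under that assumption follows; the name pct()
is invented for the example.  Using long rather than int
matters on 16-bit DOS compilers such as Turbo C, where an int
tops out at 32767 and part * 100 would overflow for any count
above 327.

```c
/* Integer percentage of part relative to whole, truncated toward zero.
   long keeps part * 100 from overflowing a 16-bit int. */
int pct(long part, long whole)
{
    if (whole <= 0)
        return 0;                  /* empty file: avoid dividing by zero */
    return (int)(part * 100L / whole);
}
```

For the Totals row, pct(335, 511), pct(114, 511), and
pct(62, 511) give 65, 22, and 12, which sum to 99 rather
than 100: that is the small round-off error described above.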

I have run LOC on some very old software that I consider
obsolete, and would rewrite from scratch if ever asked to
support it, and found 2 to 10% comments.  In some more
recent software that conforms to the "software standards"
implemented by our programming team, the comment percentage
varies between 30 and 45%.  When whipping up utilities that
I don't worry much about maintaining, my comment percentage
drops to about 20%, as shown above for the LOC program
itself.  I generally (but not always) find that software
containing 50% or more comments was created by someone who
didn't know what he was doing.  In some cases, especially in
very small, tight, or "elegant" code, having more comments
than code may be necessary, but not often.

As a rule of thumb, run LOC on your source files and shoot
for 30 to 45 percent comment content.  Typically, when you
are optimizing code that already works, you will find lines
of code decreasing as comments increase.  (At the very
least, you should comment your terse algorithms.  Generally,
the fewer lines of code a function requires as it is
optimized, the more difficult the code is to decipher.)

LOC won't guarantee that your code is modular, or even that
it is commented well; it only tells you HOW MUCH it is
commented.  These measurements do provide valuable feedback
about how much actual code you have written, and how
liberally the code is commented.


Matthew J. W. Ratcliff
32 S. Hartnett Ave.
St. Louis, MO  63136



