
          Mark Lawrence Answers Questions About the BrainMaker Software
                                   April, 1990

    We are frequently asked many technical questions about our products and
    about neural network technology in general.  Here, we answer these
    questions in depth.  The bulk of this material was printed in the
    May/June issue of PCAI in the Vendor's Forum.

    QUESTION: How did you get started in the neural network industry?

    We've been writing and selling engineering software since January, 1985.
    Half of our people are Caltech graduates.  We got started in neural
    networks when Terry Sejnowski of NetTalk fame came to Caltech as a
    visiting professor.  Doctors Sejnowski and Hopfield taught courses and
    gave lectures where they made compelling demonstrations of neural
    network technology.  We decided to write BrainMaker to bring this
    technology out of the lab and into the office and home.

    QUESTION: Why is BrainMaker the most popular neural network development
    system?

    BrainMaker is the #1 selling neural network development system in the
    world, with over a 50% market share of users.  We believe this is
    primarily because our programs are application-oriented and easy to use.
    Our documentation is the most complete and comprehensive - we have spent
    about 5 man-years on BrainMaker software, and about 4 man-years on the
    manuals.  Also, BrainMaker is 10 to 100 times faster than any other
    neural network simulator and converges where others fail.  We have a
    complete line of products from the introductory level to the
    professional level as well as neural network chip support.  We offer two
    completely self-contained versions of our neural network program at
    competitive prices - BrainMaker ($195) and BrainMaker Professional
    ($795).  BrainMaker Professional handles larger networks (8,000 neurons
    per layer versus 512), more memory (Professional supports EMS),
    includes a run-time system with source code, and has built-in graphing
    capabilities for on-screen display, printing, and plotting.  Both
    versions support Lotus(R), dBase(R), binary, and ASCII files for data
    input.  We also offer a variety of optional design tools, including our
    proprietary Hypersonic Trainer.  That's it for the marketing hype.

    QUESTION: What makes BrainMaker so fast?  How fast is fast?

    Although application development time is generally shorter with a neural
    network than with traditional programming techniques, training a neural
    network sometimes takes a considerable amount of time.  Some of our
    customers' applications have been known to take from 3 hours to 5 days
    to train.  Yet compared to other neural network software working on the
    same problems and taking from 1 day to 2 months (if ever) to train,
    BrainMaker is fast.  This is for two reasons: our back-propagation
    engine runs in integer arithmetic and we train with fewer iterations.
    This advantage has nothing at all to do with the hardware, although you
    can get even greater speed performance by using our optional C25
    Accelerator board.  Unadorned BrainMaker will run at up to 500
    connections per second on a 386 PC.  The Hypersonic Trainer (a
    proprietary training algorithm) often converges in a fraction of the
    time of back-prop.

    QUESTION: Why is integer arithmetic so hot?

    Most of the speed improvement comes from using integer arithmetic for
    all our calculations instead of floating point.  In converting back
    propagation to integer, we realized several benefits.  Integer
    arithmetic is considerably faster than floating point arithmetic.  Each
    training pass is 10 to 100 times faster than a comparable floating point
    network.  Integer arithmetic is exact - there is no rounding of results.
    Finally, converting back-prop to integer arithmetic brought out
    fundamental problems with the back-prop algorithm which, when fixed,
    provided a better algorithm.
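
    The speed trade-off can be sketched with a toy fixed-point
    calculation.  The scale factor and bit width below are illustrative
    assumptions, not BrainMaker's actual internal format:

```python
# Toy fixed-point dot product: reals are scaled to integers with 8
# fractional bits (scale = 256), multiplied in pure integer arithmetic,
# then the doubled scale factor is divided back out at the end.
SCALE = 256  # illustrative choice, not BrainMaker's actual format

def to_fixed(x):
    """Quantize a real number to a scaled integer."""
    return int(round(x * SCALE))

def fixed_dot(weights, inputs):
    """Integer multiply-accumulate of two fixed-point vectors."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc += to_fixed(w) * to_fixed(x)  # integer multiply and add
    return acc / (SCALE * SCALE)          # undo the double scaling

weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, -4.0]
exact = sum(w * x for w, x in zip(weights, inputs))
print(fixed_dot(weights, inputs), exact)  # both -0.5 for these values
```

    Quantizing costs a little precision up front, but every operation in
    the inner loop is then a fast integer multiply and add.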

    The first time we ran back-prop in integer, the algorithm did not
    converge.  Upon investigation, we found that the multi-dimensional
    gradient descent technique used in back-prop had been applied
    incorrectly.  The resulting problems are masked by floating point
    arithmetic so that it appears to work, but in integer the problems are
    not subtle.  When we fixed these problems in integer, we found our
    back-propagation engine converged with great reliability.  When we used
    the same techniques but ran in floating point, we found the networks
    converged much more quickly than the original floating point method.
    Because of our techniques, our back-propagation algorithm uses 3 to 30
    times fewer passes through the data to train a given network.
    BrainMaker's speed comes not only from the inherent speed of integer,
    but also from using far fewer passes.
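
    The gradient descent idea underlying back-propagation can be
    sketched on a single weight.  This toy quadratic error surface is
    for illustration only, not the actual BrainMaker update rule:

```python
# Minimal gradient descent on a single weight: error E(w) = (w - 3)**2,
# so the gradient is dE/dw = 2*(w - 3).  Each pass steps downhill.
def train(w=0.0, learning_rate=0.1, passes=50):
    for _ in range(passes):
        grad = 2.0 * (w - 3.0)     # slope of the error surface at w
        w -= learning_rate * grad  # move against the slope
    return w

print(train())  # approaches the error minimum at w = 3
```

    A real network does the same thing simultaneously for thousands of
    weights, with the gradient computed by back-propagating errors.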

    QUESTION: What do you know about the "local minima" phenomenon?

    Back-propagation is often accused of getting trapped in local minima.
    We don't believe in local minima.  Of course, sometimes BrainMaker
    cannot succeed at learning a particular problem perfectly, but it never
    gets stuck in a local minimum.  Let us explain this assertion.  Imagine
    skiing downhill - you keep twisting and turning until you get to the
    bottom of the hill.  Back-propagation works via gradient descent, which
    is analogous to the downhill skier.  A local minimum would correspond to
    a dried up lake bed in the mountains.  If you ski into the lake bed, you
    will stop coasting at the lowest point of the lake, and never make it to
    the bottom of the mountains.  We notice four things about being stuck
    there: we are in the mountains;  we aren't going anywhere;  there's no
    direction in which we can ski downhill;  and we're in the wrong place.

    So how does this relate to a neural network?  In back-propagation, the
    corresponding situation would be: the weight matrix is filled with
    reasonable looking values like .5, 2, -6, etc;  the weight matrix isn't
    changing;  back-prop cannot find a suitable weight to change;  and the
    network is performing poorly with many errors.  If you graphed the
    distribution of weight values, a local minimum would be seen as a lump
    of values centered more-or-less about zero (i.e., nominal), the network
    would be getting many associations wrong, and the weights would not be
    changing.  BrainMaker makes such graphs, called histograms (see
    photograph), and neither our customers nor we have ever seen this
    happen.  It may be that other algorithms have been trapped by this "Loch
    Ness Monster", but not BrainMaker.
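
    A weight histogram of the kind described can be sketched in a few
    lines.  The weight values and bin width here are made up for
    illustration:

```python
from collections import Counter

def weight_histogram(weights, bin_width=1.0):
    """Count how many weights fall in each bin of width bin_width."""
    return Counter(int(w // bin_width) for w in weights)

# Reasonable-looking trained weights, like the .5, 2, -6 mentioned above.
weights = [0.5, 2.0, -6.0, 3.5, -2.2, 4.1]
hist = weight_histogram(weights)
print(sorted(hist.items()))  # values spread well away from zero
```

    The stuck-network signature would be the opposite picture: a single
    lump of bins clustered around zero while the error stays high.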

    QUESTION: Why do you use back-propagation?

    We initially made the decision that BrainMaker would be
    application-oriented.  During early development we investigated many
    network paradigms: Hopfield nets, BAM's, counter-propagation,
    back-propagation, Kohonen nets, adaptive resonance (ART), restricted
    coulomb energy (RCE), neo-cognitrons, etc.(1) We found that
    back-propagation held the most promise for making usable applications.
    The other network types have many interesting research properties, and
    in many cases do a much better job of modeling biological neural
    networks than back-prop does.  However, they currently have severe
    real-world application limitations.

    Feedback networks such as Hopfield, BAM, and RABAM have a very limited
    capacity (less than linear) of n/(4 ln n), where n is the number of
    neurons.  Also, the storage required and computation time for the
    network go up as n^2 (2).  To place this into perspective, if we want
    to store 1000 records in a feedback network without errors, we need
    10,000 neurons, and 100 million connections (400 megabytes in floating
    point).  To access an individual record, we need to perform 100 million
    multiplies and 100 million additions, times as many cycles as are needed
    to converge the network.  This is why Ashton-Tate is not terrified of
    this technology.  Additionally, we found in our investigations that as
    the number of neurons increases, the required precision for the weights
    increases.  The above network, with 10,000 neurons, would probably
    require quad precision (128 bit floating point) to converge.  Any
    Hopfield network can be replaced by a linear look-up table (a
    dictionary, for example) which will hold more information and retrieve
    it faster.
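
    The arithmetic behind that perspective is easy to reproduce,
    assuming 4-byte single-precision weights:

```python
# Cost of a fully connected feedback network with n neurons,
# assuming 4-byte single-precision floating point weights.
n = 10_000
connections = n * n                       # every neuron connects to every neuron
megabytes = connections * 4 / 1_000_000   # 4 bytes per weight
multiplies_per_pass = connections         # one multiply per connection per pass

print(connections, megabytes, multiplies_per_pass)
```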

    Kohonen networks, RCE, and ART networks have a linear storage problem -
    each of these networks has a layer with nearest-neighbor neuron
    activations.  These neurons compare their stored weights to the input
    pattern and the neuron with the closest match is the winner;  only the
    winner fires.  Thus, these networks can store at most 1 pattern per
    neuron.  Our 1000 record data base above would now require 1000 neurons
    - an improvement of a factor of 10.  This is a big improvement, but not
    big enough.  Since counter-propagation uses a Kohonen stage in its
    forward path, it shares this storage problem.  The nearest-neighbor
    networks have various complicated learning rules, but the rules can be
    summarized: if you don't already have the pattern stored, store it.  We
    emphasize here that though these networks are non-linear and use
    non-linear neurons, they still have linear storage in the best case.
    Thus, the Minsky-Papert arguments against Perceptrons (3) hold against
    these architectures also.
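
    The summarized learning rule - if you don't already have the pattern
    stored, store it - together with winner-take-all recall can be
    sketched as a toy nearest-neighbor memory (not any particular
    commercial implementation):

```python
def distance2(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class WinnerTakeAllMemory:
    """Toy nearest-neighbor memory: at most one stored pattern per neuron."""

    def __init__(self, match_threshold=0.25):
        self.stored = []  # each entry plays the role of one neuron's weights
        self.match_threshold = match_threshold

    def learn(self, pattern):
        # The summarized rule: if the pattern isn't already stored, store it.
        if all(distance2(pattern, s) > self.match_threshold
               for s in self.stored):
            self.stored.append(list(pattern))

    def recall(self, pattern):
        # Winner-take-all: only the closest-matching neuron fires.
        return min(self.stored, key=lambda s: distance2(pattern, s))

mem = WinnerTakeAllMemory()
for p in [[0, 0, 1], [1, 0, 0], [0, 0, 1]]:  # the duplicate is not re-stored
    mem.learn(p)
print(len(mem.stored), mem.recall([0.9, 0.1, 0.0]))
```

    The linear-storage limit is visible directly: two distinct patterns
    consume two "neurons", and a noisy probe recalls the nearest one.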

    Back-propagation has not been mathematically analyzed for storage
    capacity.  However, there are many examples of back-prop networks which
    do much better than 1 pattern per neuron.  Sejnowski's original NetTalk
    held 1,000 words (about 7,000 letter-phoneme combinations), and showed
    no signs of being at full capacity with 80 hidden neurons and 56 output
    neurons (4).  If this network were a feedback network such as a BAM, it
    would require nearly 100,000 neurons and 10 billion connections.  If it
    were a linear network, it would require about 7,000 neurons with about
    1,500,000 connections.  Back-prop did this job with 136 neurons and
    about 20,000 connections.  In terms of storage and compute performance,
    back-prop is the clear winner.

    We concluded that these other network types are of great interest for
    research purposes, but are not very useful for applications.  To include
    them meant complicating our software, our user interface, and our
    manuals without any real benefit for customers wanting to get a job
    done.

    QUESTION: Is the Hypersonic Trainer another form of back-prop?

    The Hypersonic Trainer is a new neural network training algorithm,
    offered only by California Scientific Software.  Most other algorithms
    (including back-propagation) reiterate through the training set using
    gradient descent, a rough successive approximation of the weights.
    Hypersonic training instead uses the time-honored approach of least
    squared error approximation. It can find a neural network solution in a
    fraction of the time that back-prop takes.

    Neural networks in general can be thought of as semilinear matrix
    mathematics problems.  Hypersonic Training takes over after an
    initial guess for the weight matrices is made with a brief period of
    back-prop training (until about 10% of the facts are learned).  Then the
    Hypersonic Trainer uses pseudoinversions and matrix multiplies to
    produce the desired weight matrices.  The Hypersonic Trainer does not
    work in all cases and seems to be best applied to networks where the
    output values are more digital in nature.
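
    The least-squares idea can be sketched for a single output layer.
    This is a generic normal-equations solve, not California
    Scientific's proprietary algorithm, and the two-hidden-unit network
    is a made-up example:

```python
# Least-squares solve for one output neuron's weights w, given hidden
# activations H and target outputs t: w = (H^T H)^-1 H^T t.  With two
# hidden units the 2x2 normal equations can be inverted by hand.
def solve_output_weights(H, t):
    a = sum(h[0] * h[0] for h in H)  # H^T H, entry (0, 0)
    b = sum(h[0] * h[1] for h in H)  # H^T H, off-diagonal entry
    d = sum(h[1] * h[1] for h in H)  # H^T H, entry (1, 1)
    u = sum(h[0] * y for h, y in zip(H, t))  # H^T t, entry 0
    v = sum(h[1] * y for h, y in zip(H, t))  # H^T t, entry 1
    det = a * d - b * b
    return [(d * u - b * v) / det, (a * v - b * u) / det]

# Hidden activations for four training facts; the targets are an exact
# linear blend (2, 3) of them, so the solve recovers those weights.
H = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.4]]
t = [2 * h[0] + 3 * h[1] for h in H]
w = solve_output_weights(H, t)
print(w)  # close to [2.0, 3.0]
```

    Rather than iterating toward the answer, the weights drop out of one
    matrix computation - which is why it can be so much faster when the
    problem suits it.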

    QUESTION: What problems can BrainMaker solve?

    The kinds of problems best solved by neural networks are those that
    people do well such as association, evaluation and pattern recognition.
    Problems that are difficult to compute and do not require perfect
    answers, just quick very good answers, are also handled well by neural
    networks.  This is especially true in real-time robotics or industrial
    controller applications.  Prediction of behavior and analysis of large
    amounts of data, such as stock market forecasting and consumer loan
    analysis, are also good applications.  Here's what some of our
    customers' applications do:

    o make money on stocks, bonds, commodities, and futures
    o determine the presence of an object obstructing the path of a vehicle
    o make money on football, horses, and greyhounds
    o diagnose babies as vulnerable to sudden infant death syndrome
    o predict the probability of rain tomorrow
    o find flaws in structural concrete
    o optimize yields from computer chip fabrication lines
    o simulate psychological behavior
    o predict the outcome of enzyme experiments
    o analyze the quality of beer using volatiles profiles
    o price and specify paint formulas based on customer needs
    o classify spectrum analysis of unknown materials
    o evaluate financial credibility of individuals
    o evaluate medical imaging
    o correlate plastics processing conditions
    o recognize fuzzy, noisy patterns for pacemakers
    o troubleshoot automatic assembly equipment for building cars
    o recognize submarines from sonar data

    Neural networks can be thought of as statistical analysis tools.
    BrainMaker is a fully automated non-linear multi-dimensional regression
    analysis tool.  We considered naming it "auto-multi-reg", but decided
    BrainMaker was a more interesting name.  Many people use linear
    regression for analysis, which is basically a plot of data points with a
    line drawn through them.  BrainMaker offers more - it thinks in terms of
    curves (it's non-linear), works on more than two variables (it's
    multi-dimensional), and fits the curves to the problem without human aid
    (it's automatic).
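
    For contrast with the line-through-data-points method, here is
    ordinary linear regression in closed form, using the standard
    least-squares slope and intercept formulas on made-up data:

```python
def linear_fit(xs, ys):
    """Closed-form least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
print(linear_fit(xs, ys))  # recovers slope 2.0, intercept 1.0
```

    A neural network generalizes this: the one fitted line becomes a
    learned curved surface over many input variables.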

    Neural networks are frequently used to solve an interesting class of
    very difficult problems called NP, which stands for nondeterministic
    polynomial time.  The traveling salesman problem, a classic NP
    problem, is to find the shortest route to
    drive to visit a group of cities, visiting each city exactly once.  This
    problem grows exponentially - ten times as many cities takes ten billion
    times as long to solve.  Back-propagation will generally find a solution
    in n^3 time.  This is not a mathematical breakthrough or a numerical
    analysis revolution.  The reason back-prop is so fast is that it doesn't
    always work;  occasionally it won't find a solution.  Fortunately, it works
    very well almost all of the time.
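
    The explosive growth of exhaustive search is easy to see.  With n
    cities there are (n-1)!/2 distinct round trips, counting each tour
    once regardless of direction:

```python
from math import factorial

def distinct_tours(n_cities):
    """Distinct closed tours visiting every city exactly once."""
    return factorial(n_cities - 1) // 2

for n in (5, 10, 15):
    print(n, distinct_tours(n))  # 12; 181,440; about 43.6 billion
```

    Against growth like that, an n^3 heuristic that occasionally fails
    is an excellent trade.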

    QUESTION: What kind of graphics does BrainMaker provide?

    After careful consideration, we decided that an approach such as drawing
    little circles and lines on a computer screen to represent neurons and
    synapses is not all that tremendously useful.  We built BrainMaker with
    the speed necessary to build large networks;  large enough that pictures
    of the connections are far too complicated to be meaningful.  We decided
    that displaying the inputs and the outputs of the network with
    informative graphs, symbols, pictures, numbers, and thermometers is a
    lot more meaningful.

    BrainMaker provides the ability to graph the effect of an input neuron
    on the outputs (see photograph).  In simple terms, these graphs provide
    a visual interpretation of the idiom "all else being equal...".  All but
    one input neuron is held at a pattern-defined or user-defined level.
    One input neuron is varied and the resultant effects upon the outputs
    are observed.  For example, a network trained to predict the price of GM
    stock could make a graph of how the dollar/yen ratio affects the stock.
    By making several graphs, using a different input neuron each time, we
    can quickly build up a lot of information about the input/output
    behavior of the system.  Some inputs may have little or no effect on the
    outputs, whereas others may be quite significant.  Still others may turn
    out to have efficacies which are highly dependent on the values to which
    you elect to clamp the other inputs.
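
    The "all else being equal" graph amounts to a one-input sweep.  The
    three-input function below is a purely hypothetical stand-in for a
    trained network:

```python
def net(dollar_yen, interest_rate, volume):
    """Hypothetical stand-in for a trained network's output."""
    return 0.6 * dollar_yen - 0.3 * interest_rate + 0.1 * volume

def sweep(varied_values, clamp_interest=0.5, clamp_volume=0.2):
    """Clamp every input but one, vary that one, record the outputs."""
    return [net(v, clamp_interest, clamp_volume) for v in varied_values]

outputs = sweep([0.0, 0.5, 1.0])
print(outputs)  # rising outputs: this input has a strong positive effect
```

    Repeating the sweep with different clamp levels reveals the
    interactions mentioned above, where one input's efficacy depends on
    the others.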

    Similarly, NetMaker Professional allows you to graph any file of rows
    and columns of numbers.  The NetMaker Professional Graphics allow you to
    examine your data prior to making a network file, to make phase-plane
    plots to determine the existence of chaotic behavior, and even graph
    BrainMaker Professional statistics files to analyze the progress of your
    training.  In addition, you can inspect your data for cyclic properties
    using the included Fast Fourier Transform, make phase plane plots of
    your data, and spot overall trends in a column.
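
    Spotting a cycle with a Fourier transform can be sketched with a
    plain discrete Fourier transform - a slow textbook O(n^2) version,
    not the optimized FFT the product includes:

```python
import cmath
import math

def dft_magnitudes(series):
    """Magnitude of each discrete Fourier component of a real series."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series)))
            for k in range(n)]

# A column of data that repeats every 8 samples: 4 full cycles in 32 points.
n = 32
series = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
mags = dft_magnitudes(series)
peak = max(range(1, n // 2), key=lambda k: mags[k])
print(peak)  # strongest component at k = 4: a cycle every 32/4 = 8 samples
```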

    QUESTION: Do I need BrainMaker Professional to do my job?

    Our $195 BrainMaker system has been used by thousands of people to solve
    problems and make money.  BrainMaker is a real system, not an
    exploration program, which can and will solve real problems.  We chose
    the limits for BrainMaker carefully - 512 neurons per layer is a good
    match for the compute, memory, and disk resources of PC's.  However,
    some people need larger networks, and are willing to pay the price in
    terms of required EMS memory and training time.  BrainMaker Professional
    includes all of the neural network programs and tools we've ever
    written, all integrated into one system.  Most people can solve their
    problem with BrainMaker, but it will be easier to build and use your
    neural network if you use Professional.

    QUESTION: How smart are neural networks?

    Looking at the neural network "intelligence" graph, you see plotted the
    log of the number of neural connections versus the log of the compute
    speed in connections per second.  We have plotted a worm in the lower
    left-hand corner and a person in the upper right hand corner.  Floating
    point networks have about the performance of a worm.  BrainMaker makes
    it about halfway to cockroach.  The latest chip technology has about
    the compute power of a cockroach, but unfortunately only the capacity
    of a
    worm.  If we project that neural network performance will double every
    three years as memory and micro-processors have, we're about 120 years
    away from silicon-person capacity, and about 130 years away from
    building the replacement for people.  Since cats have such an enviable
    life hanging out with people, maybe we'll be content to hang out with
    our silicon successors.  More seriously, there are quantum-mechanical
    arguments which make it seem highly unlikely that we'll ever reach such
    a performance level using silicon, gallium, or other room-temperature
    semiconductors.  Apparently there's a good reason why God did not make
    silicon critters.  Perhaps in twenty or thirty years we'll be making our
    computers using breeding programs instead of assembly lines.  It has
    been pointed out that people are the most highly programmable and
    adaptable devices known, and they can be mass-produced by unskilled
    labor.

    QUESTION: Where are neural networks headed in the future?

    Of course, we can only give our best guess for what the future holds,
    but here it is anyway.  Back-prop networks solve many interesting
    classes of problems, and software systems like BrainMaker have adequate
    performance for many people.  However, there are many abilities that
    people, or even dogs, have that we cannot currently duplicate.  Some
    people think that these intelligent properties spontaneously emerge when
    you build much larger networks - millions or billions of neurons.
    Perhaps this is so, but we doubt this.  The human brain consists of
    layers.  The bottom layer is a very old "reptilian brain" which
    monitors your basic body functions.  The next layer is an evolutionarily
    newer limbic system which provides for more complicated behaviors such
    as emotion, fear, and anger.  The top-most layer is our cortex, thought to be
    the mechanism for rational and logical thought.  Each layer is
    constructed of different types of neurons connected in different
    architectures.  So, to make a person you need many types of neurons,
    connected in many different fashions.  We think this means that we
    probably need different types of neural systems, depending on which
    human property we want to simulate.  We also note that many of the most
    impressive abilities which mammals have, such as locomotion, sight,
    hearing, tracking, and identification, seem to be handled by the older
    and simpler parts of the brain.  Even without singing, horses can do
    some pretty impressive things.

    Back-prop networks are enough for financial forecasting and simple
    control systems, but not enough for self-organized pattern recognition.
    Some other known architectures can self-organize, but have performance
    and capacity limitations which keep them in the laboratory as research
    tools.  We think that the design of new neural systems with new learning
    algorithms and new architectures will be the key to modeling more
    complicated intelligent behaviors.

    To discover these new neural systems, we will probably have to
    boot-strap ourselves by building software as best we can, then hardware
    which gives us the performance to train and experiment with larger
    networks.  It is likely that they will exhibit interesting properties
    which are not apparent in small networks.  Therefore, we believe that
    future breakthroughs in neural systems will be built as a result of
    software firms, hardware companies, and researchers working in concert
    to design new systems.  For this reason, we have formed a relationship
    with the Intel neural devices group, and are actively interested in
    supplying systems with the highest possible performance.  Probably in
    five to fifteen years, there will be neural systems available which will
    be so inexpensive and useful that they will be as commonly used as PC's
    are today.

    Sometimes we hear conjectures that neural systems will evolve to the
    point that they will replace conventional computers.  We think this is
    obviously incorrect.  People have pretty advanced neural networks
    built-in, and we still have a clear use for a PC running Lotus or dBase.
    Neural systems have many extremely interesting properties which will
    make them a very important part of technology, but they will never be
    all things to all people.


    footnotes:

    (1) These networks are described in detail in a number of places, for
    example:

         Neurocomputing, Foundations of Research, edited by James A.
         Anderson and Edward Rosenfeld, The MIT Press, Cambridge, Mass.,
         1988

         Artificial Neural Networks: Theoretical Concepts, edited by V.
         Vemuri, IEEE Computer Society Press, Washington, DC, 1988.

    (2) The Capacity of the Hopfield Associative Memory, Robert McEliece,
    IEEE Transactions on Information Theory V IT-33 #4 7/87

    (3) Perceptrons, Marvin Minsky and Seymour Papert, MIT Press.  This book
    nearly succeeded in killing off neural network research.

    (4) NetTalk: A Parallel Network that Learns to Read Aloud, Terrence
    Sejnowski and Charles Rosenberg, Johns Hopkins University Electrical
    Engineering and Computer Science Technical Report JHU/EECS-86/01
                                                                                                       