Subject: changes to "comp.ai.neural-nets FAQ" -- monthly posting
Date: Fri, 29 Aug 1997 03:00:38 GMT

==> nn1.changes.body <==
*** nn1.oldbody.Mon Jul 28 23:00:15 1997
--- nn1.body.Thu Aug 28 23:00:09 1997
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part1
! Last-modified: 1997-07-28
  URL: ftp://ftp.sas.com/pub/neural/FAQ.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part1
! Last-modified: 1997-08-25
  URL: ftp://ftp.sas.com/pub/neural/FAQ.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 134,137 ****
--- 134,140 ----
  
     Neural Network hardware?
+    How to learn an inverse of a function?
+    How to get invariant recognition of images under translation, rotation,
+    etc.?
     Unanswered FAQs
  
***************
*** 460,464 ****
  
   o The Pacific Northwest National Laboratory web pages at 
!    http://www.emsl.pnl.gov:2080/docs/cie/neural/neural.research.html and 
     http://www.emsl.pnl.gov:2080/docs/cie/neural/products/ 
   o The Stimulation Initiative for European Neural Applications web page at 
--- 463,468 ----
  
   o The Pacific Northwest National Laboratory web pages at 
!    http://www.emsl.pnl.gov:2080/docs/cie/neural/ including a list of
!    commercial applications at 
     http://www.emsl.pnl.gov:2080/docs/cie/neural/products/ 
   o The Stimulation Initiative for European Neural Applications web page at 
***************
*** 784,790 ****
     says that the final neighborhoods "may" contain only the single cluster.
     But in the latter case, as Kohonen points out, the SOM is basically a
!    very fancy initialization algorithm for batch k-means, and I would be
!    concerned that you could lose the topological mapping properties of the
!    SOM. 
  
     In a SOM, as in VQ, it is necessary to reduce the learning rate during
--- 788,793 ----
     says that the final neighborhoods "may" contain only the single cluster.
     But in the latter case, as Kohonen points out, the SOM is basically a
!    very fancy initialization algorithm for batch k-means, and you could lose
!    the topological mapping properties of the SOM (Kohonen, 1995, p. 111). 
  
     In a SOM, as in VQ, it is necessary to reduce the learning rate during
***************
*** 816,819 ****
--- 819,830 ----
     relatively insensitive to exact form of h_ij. 
  
+ Kohonen (1995, p. VII) says that SOMs are not intended for pattern
+ recognition but for clustering, visualization, and abstraction. Kohonen has
+ used a "supervised SOM" (1995, pp. 160-161) that is similar to
+ counterpropagation (Hecht-Nielsen 1990), but he seems to prefer LVQ (see
+ below) for supervised classification. Many people continue to use SOMs for
+ classification tasks, sometimes with surprisingly (I am tempted to say
+ "inexplicably") good results (Cho, 1997). 
+ 
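As an illustration of that practice, here is a minimal sketch (in Python
with numpy; the toy data, map size, and training schedules below are
arbitrary assumptions, not Kohonen's supervised SOM): train an ordinary
SOM, label each codebook vector by majority vote of the training cases it
wins, and classify new cases by the label of the winning unit.

   # Sketch: a 1-D SOM used as a classifier by majority-vote labeling
   # of the codebook vectors (assumed toy data and schedules).
   import numpy as np

   rng = np.random.RandomState(0)
   # Two Gaussian classes in 2-D.
   X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + [4, 4]])
   y = np.array([0] * 100 + [1] * 100)

   m = 10                                   # map units on a 1-D grid
   W = X[rng.choice(len(X), m)]             # initialize codebooks from data

   for t in range(2000):                    # online SOM training
       i = rng.randint(len(X))
       c = np.argmin(((W - X[i]) ** 2).sum(axis=1))     # winning unit
       lr = 0.5 * (1.0 - t / 2000.0)                    # decreasing rate
       sigma = max(0.5, 3.0 * (1.0 - t / 2000.0))       # shrinking kernel
       h = np.exp(-(np.arange(m) - c) ** 2 / (2 * sigma ** 2))
       W += lr * h[:, None] * (X[i] - W)

   # Label each unit by the majority class among the cases it wins.
   win = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
   labels = np.array([np.bincount(y[win == j], minlength=2).argmax()
                      if np.any(win == j) else -1 for j in range(m)])

   # Classify a new case by its winning unit's label.
   x_new = np.array([3.5, 4.2])
   print(labels[np.argmin(((W - x_new) ** 2).sum(axis=1))])
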
  o LVQ: Learning Vector Quantization--competitive networks for supervised
  classification (Kohonen, 1988, 1995; Ripley, 1996). Each codebook vector is
***************
*** 875,878 ****
--- 886,893 ----
     Psychometrika, 59, 509-525. 
  
+    Cho, S.-B. (1997), "Self-organizing map with dynamical node-splitting:
+    Application to handwritten digit recognition," Neural Computation, 9,
+    1345-1355. 
+ 
     Desieno, D. (1988), "Adding a conscience to competitive learning," Proc.
     Int. Conf. on Neural Networks, I, 117-124, IEEE Press. 
***************
*** 1252,1258 ****
     Oxford University Press. 
  
-    Chatfield, C. (1993), "Neural networks: Forecasting breakthrough or
-    passing fad", International Journal of Forecasting, 9, 1-3. 
- 
     Cheng, B. and Titterington, D.M. (1994), "Neural Networks: A Review from
     a Statistical Perspective", Statistical Science, 9, 2-54. 
--- 1267,1270 ----
***************
*** 1269,1272 ****
--- 1281,1288 ----
     Hand, D.J. (1981) Discrimination and Classification, NY: Wiley. 
  
+    Hill, T., Marquez, L., O'Connor, M., and Remus, W. (1994), "Artificial
+    neural network models for forecasting and decision making," International
+    J. of Forecasting, 10, 5-15. 
+ 
     Kuan, C.-M. and White, H. (1994), "Artificial Neural Networks: An
     Econometric Perspective", Econometric Reviews, 13, 1-91. 
***************
*** 1357,1360 ****
--- 1373,1379 ----
  http://divcom.otago.ac.nz:800/COM/INFOSCI/SMRL/people/andrew/publications/faq/hybrid/hybrid.htm
  also has links to information on neuro-genetic methods. 
+ 
+ For general information on GAs, try the links at 
+ http://www.shef.ac.uk/~gaipp/galinks.html 
  
  ------------------------------------------------------------------------

==> nn2.changes.body <==
*** nn2.oldbody.Mon Jul 28 23:00:21 1997
--- nn2.body.Thu Aug 28 23:00:15 1997
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part2
! Last-modified: 1997-07-26
  URL: ftp://ftp.sas.com/pub/neural/FAQ2.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part2
! Last-modified: 1997-08-07
  URL: ftp://ftp.sas.com/pub/neural/FAQ2.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 2080,2096 ****
  competitive learning with dimensionality reduction by smoothing the clusters
  with respect to an a priori grid (see "How many kinds of Kohonen networks
! exist?") for more explanation). But the original SOM algorithm does not
! optimize an "energy" function (Erwin et al., 1992; Kohonen 1995, pp. 126,
! 237) and so is not simply an information-compression method like most other
! unsupervised learning networks. Neither does Kohonen's SOM have a clear
! interpretation as a density estimation method. Convergence of Kohonen's SOM
! algorithm is allegedly demonstrated by Yin and Allinson (1995), but their
! "proof" assumes the neighborhood size becomes zero, in which case the
! algorithm reduces to VQ and no longer has topological ordering properties
! (Kohonen 1995, p. 111). Thus, there seems to be no definite answer to the
! question of what a Kohonen SOM learms. However, there are other approaches
! to SOMs that have more theoretical justification using mixture models with
  Bayesian priors or constraints (Utsugi, 1996, 1997; Bishop, Svens\'en, and
! Williams, 1997) 
  
  References: 
--- 2080,2103 ----
  competitive learning with dimensionality reduction by smoothing the clusters
  with respect to an a priori grid (see "How many kinds of Kohonen networks
! exist?") for more explanation). But Kohonen's original SOM algorithm does
! not optimize an "energy" function (Erwin et al., 1992; Kohonen 1995, pp.
! 126, 237). The SOM algorithm involves a trade-off between the accuracy of
! the quantization and the smoothness of the topological mapping, but there is
! no explicit combination of these two properties into an energy function.
! Hence Kohonen's SOM is not simply an information-compression method like
! most other unsupervised learning networks. Neither does Kohonen's SOM have a
! clear interpretation as a density estimation method. Convergence of
! Kohonen's SOM algorithm is allegedly demonstrated by Yin and Allinson
! (1995), but their "proof" assumes the neighborhood size becomes zero, in
! which case the algorithm reduces to VQ and no longer has topological
! ordering properties (Kohonen 1995, p. 111). Thus, there is no definite
! answer to the question of what a Kohonen SOM learns. 
! 
! A variety of energy functions for SOMs have been proposed (e.g., Luttrell,
! 1994), some of which show a connection between SOMs and multidimensional
! scaling (Goodhill and Sejnowski 1997). There are also other approaches to
! SOMs that have clearer theoretical justification using mixture models with
  Bayesian priors or constraints (Utsugi, 1996, 1997; Bishop, Svens\'en, and
! Williams, 1997). 
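
To make the trade-off concrete, here is a sketch (in Python) of one
commonly studied candidate energy of the general form discussed by
Luttrell (1994) and others -- a neighborhood-weighted quantization error,
E = sum_i sum_j h(j, c(i)) ||x_i - w_j||^2 -- in which the kernel width
controls the balance between quantization accuracy and smoothness; the
data, 1-D grid, and Gaussian kernel are assumptions for illustration:

   # Sketch of a neighborhood-weighted quantization error for a 1-D SOM.
   # Small sigma ~ plain vector quantization error; larger sigma also
   # penalizes codebooks whose grid neighbors are distant in data space.
   import numpy as np

   def som_energy(X, W, sigma):
       """X: (n,d) data; W: (m,d) codebooks on a 1-D grid."""
       d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)  # (n,m)
       c = d2.argmin(axis=1)                                # winners
       grid = np.arange(len(W))
       h = np.exp(-(grid[None, :] - c[:, None]) ** 2 / (2 * sigma ** 2))
       return (h * d2).sum()

   rng = np.random.RandomState(0)
   X = rng.rand(200, 2)
   W = rng.rand(5, 2)
   print(som_energy(X, W, 0.1), som_energy(X, W, 2.0))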
  
  References: 
***************
*** 2126,2129 ****
--- 2133,2139 ----
     Compression, Boston: Kluwer Academic Publishers. 
  
+    Goodhill, G.J., and Sejnowski, T.J. (1997), "A unifying objective
+    function for topographic mappings," Neural Computation, 9, 1291-1303. 
+ 
     Hecht-Nielsen, R. (1990), Neurocomputing, Reading, MA: Addison-Wesley. 
  
***************
*** 2149,2152 ****
--- 2159,2165 ----
     Kosko, B.(1992), Neural Networks and Fuzzy Systems, Englewood Cliffs,
     N.J.: Prentice-Hall. 
+ 
+    Luttrell, S.P. (1994), "A Bayesian analysis of self-organizing maps,"
+    Neural Computation, 6, 767-794. 
  
     McLachlan, G.J. and Basford, K.E. (1988), Mixture Models, NY: Marcel

==> nn3.changes.body <==
*** nn3.oldbody.Mon Jul 28 23:00:25 1997
--- nn3.body.Thu Aug 28 23:00:19 1997
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part3
! Last-modified: 1997-07-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ3.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part3
! Last-modified: 1997-07-29
  URL: ftp://ftp.sas.com/pub/neural/FAQ3.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 199,206 ****
  error can never be less than the variance of the noise, no matter how much
  training data you have. But you can estimate the mean of the target
! values, conditional on a given set of inputs, to any desired degree of
  accuracy by obtaining a sufficiently large and representative training set,
  assuming that the function you are trying to learn is one that can indeed be
! learned by the type of net you are using. 
  
  Noise in the target values is exacerbated by overfitting (Moody 1992). 
--- 199,207 ----
  error can never be less than the variance of the noise, no matter how much
  training data you have. But you can estimate the mean of the target
! values, conditional on a given set of input values, to any desired degree of
  accuracy by obtaining a sufficiently large and representative training set,
  assuming that the function you are trying to learn is one that can indeed be
! learned by the type of net you are using, and assuming that the complexity
! of the network is regulated appropriately (White 1990). 
  
  Noise in the target values is exacerbated by overfitting (Moody 1992). 
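
These two facts can be checked numerically. Below is a small sketch (in
Python; the sinusoidal target, noise level, and binned estimator are
arbitrary assumptions): as the sample grows, a simple binned estimate of
the conditional mean gets better and better, while the average squared
error for predicting individual noisy targets stays near the noise
variance (here 0.3^2 = 0.09).

   # Sketch: the conditional mean is learnable to any desired accuracy,
   # but per-case squared error is bounded below by the noise variance.
   import numpy as np

   rng = np.random.RandomState(0)

   def experiment(n, noise_sd=0.3, bins=10):
       x = rng.rand(n)
       y = np.sin(2 * np.pi * x) + noise_sd * rng.randn(n)
       edges = np.linspace(0, 1, bins + 1)
       centers = (edges[:-1] + edges[1:]) / 2
       idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
       est = np.array([y[idx == b].mean() for b in range(bins)])
       mean_err = np.mean((est - np.sin(2 * np.pi * centers)) ** 2)
       case_err = np.mean((y - est[idx]) ** 2)
       return mean_err, case_err

   for n in (100, 10000, 1000000):
       mean_err, case_err = experiment(n)
       print(n, round(mean_err, 5), round(case_err, 5))
   # mean_err shrinks toward the small binning bias; case_err stays ~0.09
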
***************
*** 236,239 ****
--- 237,247 ----
     Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge:
     Cambridge University Press. 
+ 
+    White, H. (1990), "Connectionist Nonparametric Regression: Multilayer
+    Feedforward Networks Can Learn Arbitrary Mappings," Neural Networks, 3,
+    535-550. Reprinted in White (1992b). 
+ 
+    White, H. (1992), Artificial Neural Networks: Approximation and Learning
+    Theory, Blackwell. 
  
  ------------------------------------------------------------------------

==> nn4.changes.body <==
*** nn4.oldbody.Mon Jul 28 23:00:29 1997
--- nn4.body.Thu Aug 28 23:00:23 1997
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part4
! Last-modified: 1997-07-28
  URL: ftp://ftp.sas.com/pub/neural/FAQ4.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part4
! Last-modified: 1997-08-12
  URL: ftp://ftp.sas.com/pub/neural/FAQ4.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 1720,1728 ****
  
     System requirements are a 5.25" CD-ROM drive with software to read
!    ISO-9660 format. For any further information, including how to order the
!    database, please contact: Jonathan J. Hull, Associate Director, CEDAR,
!    226 Bell Hall State University of New York at Buffalo, Buffalo, NY 14260;
!    hull@cs.buffalo.edu (email) 
  
  6. AI-CD-ROM (see question "Other sources of information")
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--- 1720,1730 ----
  
     System requirements are a 5.25" CD-ROM drive with software to read
!    ISO-9660 format. For further information, see 
!    http://www.cedar.buffalo.edu/Databases/CDROM1/ or send email to Ajay
!    Shekhawat at <ajay@cedar.Buffalo.EDU>. 
  
+    There is also a CEDAR CDROM-2, a database of machine-printed Japanese
+    character images. 
+ 
  6. AI-CD-ROM (see question "Other sources of information")
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
***************
*** 1818,1821 ****
     ------------------------------------------------------------------------
  
!    Next part is part 5 (of 7). Previous part is part 3. 
  
--- 1820,1823 ----
     ------------------------------------------------------------------------
  
!    Next part is part 5 (of 7). Previous part is part 3. @
  

==> nn5.changes.body <==

==> nn6.changes.body <==
*** nn6.oldbody.Mon Jul 28 23:00:43 1997
--- nn6.body.Thu Aug 28 23:00:30 1997
***************
*** 456,461 ****
     expect to do a Mac port and maybe NT or OS/2
  
!    Copies of NuTank cost $50 each. Contact: Richard Keene; Keene Educational
!    Software; Dick.Keene@Central.Sun.COM
  
     NuTank shareware with the Save options disabled is available via
--- 456,463 ----
     expect to do a Mac port and maybe NT or OS/2
  
!    Copies of NuTank cost $50 each.
!    Contact: Richard Keene; Keene Educational Software
!    URL: http://www.xmission.com/~rkeene Email: rkeene@parkcity.com or
!    rkeene@xmission.com
  
     NuTank shareware with the Save options disabled is available via
***************
*** 1528,1531 ****
  ------------------------------------------------------------------------
  
! Next part is part 7 (of 7). Previous part is part 5. @
  
--- 1530,1533 ----
  ------------------------------------------------------------------------
  
! Next part is part 7 (of 7). Previous part is part 5. 
  

==> nn7.changes.body <==
*** nn7.oldbody.Mon Jul 28 23:00:46 1997
--- nn7.body.Thu Aug 28 23:00:34 1997
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part7
! Last-modified: 1997-07-18
  URL: ftp://ftp.sas.com/pub/neural/FAQ7.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part7
! Last-modified: 1997-08-11
  URL: ftp://ftp.sas.com/pub/neural/FAQ7.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 25,28 ****
--- 25,31 ----
  
     Neural Network hardware?
+    How to learn an inverse of a function?
+    How to get invariant recognition of images under translation, rotation,
+    etc.?
     Unanswered FAQs
  
***************
*** 362,365 ****
--- 365,519 ----
  ------------------------------------------------------------------------
  
+ Subject: How to learn an inverse of a function? 
+ ================================================
+ 
+ Ordinarily, NNs learn a function Y = f(X), where Y is a vector of
+ outputs, X is a vector of inputs, and f() is the function to be learned.
+ Sometimes, however, you may want to learn an inverse of a function f(),
+ that is, given Y, you want to be able to find an X such that Y = f(X).
+ In general, there may be many different Xs that satisfy the equation Y =
+ f(X). 
+ 
+ For example, in robotics (DeMers and Kreutz-Delgado, 1996, 1997), X might
+ describe the positions of the joints in a robot's arm, while Y would
+ describe the location of the robot's hand. Computing the location of the
+ hand from the positions of the joints, the "forward kinematics" problem,
+ can be done with simple formulas. But there is no simple formula for the
+ "inverse kinematics" problem of computing positions of the joints that
+ yield a given location for the hand. Furthermore, if the arm has several joints,
+ there will usually be many different positions of the joints that yield the
+ same location of the hand, so the forward kinematics function is many-to-one
+ and has no unique inverse. Picking any X such that Y = f(X) is OK if
+ the only aim is to position the hand at Y. However, if the aim is to
+ generate a series of points to move the hand through an arc, this may be
+ insufficient. In this case, the series of Xs needs to be in the same
+ "branch" of the function space. Care must be taken to avoid solutions
+ that yield inefficient or impossible movements of the arm. 
+ 
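As a concrete sketch of this many-to-one behavior (in Python; the
two-joint planar arm and the link lengths below are assumptions for
illustration), the "elbow-down" and "elbow-up" joint settings reach
exactly the same hand position:

   # Sketch: forward kinematics of an idealized planar arm with two
   # revolute joints. Two different joint configurations give the same
   # hand position, so f is many-to-one and has no unique inverse.
   import numpy as np

   L1, L2 = 1.0, 0.8            # assumed link lengths

   def hand_position(t1, t2):
       """Joint angles (radians) -> hand (x, y): the forward mapping f."""
       return np.array([L1 * np.cos(t1) + L2 * np.cos(t1 + t2),
                        L1 * np.sin(t1) + L2 * np.sin(t1 + t2)])

   t1, t2 = 0.3, 0.9            # an "elbow-down" solution
   g = np.arctan2(L2 * np.sin(t2), L1 + L2 * np.cos(t2))
   t1b, t2b = t1 + 2 * g, -t2   # the mirrored "elbow-up" solution
   print(hand_position(t1, t2))    # same point both times
   print(hand_position(t1b, t2b))
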
+ As another example, consider an industrial process in which X represents
+ settings of control variables imposed by an operator, and Y represents
+ measurements of the product of the industrial process. The function Y =
+ f(X) can be learned by a NN using conventional training methods. But the
+ goal of the analysis may be to find control settings X that yield a product
+ with specified measurements Y, in which case an inverse of f(X) is
+ required. In industrial applications, financial considerations are
+ important, so not just any setting X that yields the desired result Y will
+ be acceptable. Perhaps a function can be specified that gives the cost of X
+ resulting from energy consumption, raw materials, etc., in which case you
+ would want to find the X that minimizes the cost function while satisfying
+ the equation Y = f(X). 
+ 
+ The obvious way to try to learn an inverse function is to generate a set of
+ training data from a given forward function, but designate Y as the input
+ and X as the output when training the network. Using a least-squares error
+ function, this approach will fail if f() is many-to-one. The problem is
+ that for an input Y, the net will not learn any single X such that Y =
+ f(X), but will instead learn the arithmetic mean of all the Xs in the
+ training set that satisfy the equation (Bishop, 1995, pp. 207-208). One
+ solution to this difficulty is to construct a network that learns a mixture
+ approximation to the conditional distribution of X given Y (Bishop, 1995,
+ pp. 212-221). However, the mixture method will not work well in general for
+ an X vector that is more than one-dimensional (e.g., inverting Y = X_1^2 +
+ X_2^2), since the number of mixture components required may increase
+ exponentially with the dimensionality of X. And you are still left with the
+ problem of extracting a single output vector from the mixture distribution,
+ which is nontrivial if the mixture components overlap considerably. Another
+ solution is to use a highly robust error function, such as a redescending
+ M-estimator, that learns a single mode of the conditional distribution
+ instead of learning the mean (Huber, 1981; Rohwer and van der Rest 1996).
+ Additional regularization terms or constraints may be required to persuade
+ the network to choose appropriately among several modes, and there may be
+ severe problems with local optima. 
+ 
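The failure can be seen without training any network, since least squares
converges to the conditional mean, which can be computed directly. A
sketch (in Python; the quadratic f and uniform data are assumptions): for
f(X) = X^2 on [-1, 1], each Y has the two preimages +sqrt(Y) and
-sqrt(Y) in equal proportion, so the least-squares "inverse" is near zero
for every Y, which is not a valid preimage at all:

   # Sketch: the conditional mean E[X|Y] that least-squares training of
   # Y -> X approximates is useless when f is many-to-one.
   import numpy as np

   rng = np.random.RandomState(0)
   x = rng.uniform(-1, 1, 100000)
   y = x ** 2

   edges = np.linspace(0, 1, 11)
   idx = np.clip(np.digitize(y, edges) - 1, 0, 9)
   for b in range(10):
       yc = (edges[b] + edges[b + 1]) / 2
       print("y ~ %.2f : E[x|y] = %+.4f  (preimages +/- %.3f)"
             % (yc, x[idx == b].mean(), np.sqrt(yc)))
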
+ Another approach is to train a network to learn the forward mapping f()
+ and then numerically invert the function. Finding X such that Y = f(X)
+ is simply a matter of solving a nonlinear system of equations, for which
+ many algorithms can be found in the numerical analysis literature (Dennis
+ and Schnabel 1983). One way to solve nonlinear equations is to turn the
+ problem into an optimization problem by minimizing sum_i (Y_i - f_i(X))^2.
+ This method fits in nicely with the usual gradient-descent methods for
+ training NNs (Kindermann and Linden 1990). Since the nonlinear equations will
+ generally have multiple solutions, there may be severe problems with local
+ optima, especially if some solutions are considered more desirable than
+ others. You can deal with multiple solutions by inventing some objective
+ function that measures the goodness of different solutions, and optimizing
+ this objective function under the nonlinear constraint Y = f(X) using
+ any of numerous algorithms for nonlinear programming (NLP; see Bertsekas,
+ 1995, and other references under "What are conjugate gradients,
+ Levenberg-Marquardt, etc.?"). The power and flexibility of the nonlinear
+ programming approach are offset by possibly high computational demands. 
+ 
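Here is a sketch of such an inversion by gradient descent on the inputs
(in Python; the smooth function g() and its gradient below stand in for a
trained network and backpropagation, and the step size and starting point
are arbitrary):

   # Sketch, in the spirit of Kindermann and Linden (1990): hold the
   # "network" g fixed and do gradient descent on x to make g(x) = y.
   import numpy as np

   def g(x):                        # stand-in for a trained network
       return x[0] ** 2 + x[1] ** 2

   def grad_g(x):                   # backprop would supply this gradient
       return np.array([2.0 * x[0], 2.0 * x[1]])

   y_target = 2.0
   x = np.array([0.3, 1.7])         # different starts can end up at
                                    # different solutions (local optima)
   for _ in range(500):
       err = g(x) - y_target
       x = x - 0.05 * 2.0 * err * grad_g(x)   # descend on (g(x)-y)^2

   print(x, g(x))                   # an x with x1^2 + x2^2 = 2
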
+ If the forward mapping f() is obtained by training a network, there will
+ generally be some error in the network's outputs. The magnitude of this
+ error can be difficult to estimate. The process of inverting a network can
+ propagate this error, so the results should be checked carefully for
+ validity and numerical stability. Some training methods can produce not just
+ a point output but also a prediction interval (Bishop, 1995; White, 1992).
+ You can take advantage of prediction intervals when inverting a network by
+ using NLP methods. For example, you could try to find an X that minimizes
+ the width of the prediction interval under the constraint that the equation 
+ Y = f(X) is satisfied. Or instead of requiring that Y = f(X) be satisfied
+ exactly, you could try to find an X such that the prediction interval is
+ contained within some specified interval while minimizing some cost
+ function. 
+ 
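A sketch of one such constrained search (in Python with scipy; the
stand-in "network" f(), the nominal operating point, and the quadratic
cost are all assumptions -- the cost could equally well be an energy bill
or the width of a prediction interval):

   # Sketch of the NLP formulation: minimize an assumed cost over x
   # subject to the equality constraint f(x) = y.
   import numpy as np
   from scipy.optimize import minimize

   def f(x):                                  # stand-in for a trained net
       return x[0] ** 2 + x[1] ** 2

   y_target = 2.0
   nominal = np.array([0.9, 1.2])             # assumed preferred setting

   res = minimize(lambda x: np.sum((x - nominal) ** 2),   # cost
                  x0=np.array([0.3, 1.7]),
                  method="SLSQP",
                  constraints=[{"type": "eq",
                                "fun": lambda x: f(x) - y_target}])
   print(res.x, f(res.x))                     # f(res.x) ~ 2, cost minimal
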
+ For more mathematics concerning the inverse-function problem, as well as
+ some interesting methods involving self-organizing maps, see DeMers and
+ Kreutz-Delgado (1996, 1997). For NNs that are relatively easy to invert, see
+ the Adaptive Logic Networks described in the software sections of the FAQ. 
+ 
+ References: 
+ 
+    Bertsekas, D. P. (1995), Nonlinear Programming, Belmont, MA: Athena
+    Scientific. 
+ 
+    Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford:
+    Oxford University Press. 
+ 
+    DeMers, D., and Kreutz-Delgado, K. (1996), "Canonical parameterization
+    of excess motor degrees of freedom with self-organizing maps," IEEE
+    Trans. Neural Networks, 7, 43-55. 
+ 
+    DeMers, D., and Kreutz-Delgado, K. (1997), "Inverse kinematics of
+    dextrous manipulators," in Omidvar, O., and van der Smagt, P., (eds.) 
+    Neural Systems for Robotics, San Diego: Academic Press, pp. 75-116. 
+ 
+    Dennis, J.E. and Schnabel, R.B. (1983) Numerical Methods for
+    Unconstrained Optimization and Nonlinear Equations, Prentice-Hall. 
+ 
+    Huber, P.J. (1981), Robust Statistics, NY: Wiley. 
+ 
+    Kindermann, J., and Linden, A. (1990), "Inversion of Neural Networks by
+    Gradient Descent," Parallel Computing, 14, 277-286,
+    ftp://icsi.Berkeley.EDU/pub/ai/linden/KindermannLinden.IEEE92.ps.Z 
+ 
+    Rohwer, R., and van der Rest, J.C. (1996), "Minimum description length,
+    regularization, and multimodal data," Neural Computation, 8, 595-609. 
+ 
+    White, H. (1992), "Nonparametric Estimation of Conditional Quantiles
+    Using Neural Networks," in Page, C. and Le Page, R. (eds.), Proceedings
+    of the 23rd Symposium on the Interface: Computing Science and Statistics,
+    Alexandria, VA: American Statistical Association, pp. 190-199. 
+ 
+ ------------------------------------------------------------------------
+ 
+ Subject: How to get invariant recognition of images under
+ translation, rotation, etc.?
+ =========================================================
+ 
+ See: 
+ 
+    Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford:
+    Oxford University Press, section 8.7. 
+ 
+    Masters, T. (1994), Signal and Image Processing with Neural Networks: A
+    C++ Sourcebook, NY: Wiley. 
+ 
+    Soucek, B., and The IRIS Group (1992), Fast Learning and Invariant Object
+    Recognition, NY: Wiley. 
+ 
+ ------------------------------------------------------------------------
+ 
  Subject: Unanswered FAQs
  ========================
***************
*** 372,376 ****
   o What error functions can be used? 
   o What are some good constructive training algorithms? 
-  o How can on-line/incremental training be done effectively? 
   o How can I invert a network? 
   o How can I select important input variables? 
--- 526,529 ----
***************
*** 480,483 ****
--- 633,637 ----
   o Paolo Ienne <Paolo.Ienne@di.epfl.ch> 
   o Paul Keller <pe_keller@ccmail.pnl.gov> 
+  o Peter Hamer <P.G.Hamer@nortel.co.uk> 
   o Pierre v.d. Laar <pierre@mbfys.kun.nl> 
   o Michael Plonski <plonski@aero.org> 
-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
* Do not send me unsolicited commercial, political, or religious email *
