

"Library Automation and the National Research Network," by Clifford A.
Lynch, Director of the Division of Library Automation, University of
California Office of the President.

First published in "EDUCOM Review," Volume 24, Number 3, Fall 1989, pp.
21-26. Editor: Sheldon B. Smith, EDUCOM, SMITH@EDUCOM

***************************************************************************
As a result of progress in library automation in the last decade, great
changes have occurred in access to information within institutions.
Emphasis has shifted from automating library operations to providing
computer-based access to library collections. This transition has raised
user expectations, and as a result, libraries face challenges in the coming
decade that will be tremendously costly and technically difficult to meet.
        Libraries have traditionally cooperated in operational activities
such as cataloging through the national utilities like the Online Computer
Library Center (OCLC) and Research Libraries Information Network (RLIN),
and the formation of consortia for resource sharing through interlibrary
loan. Only now are libraries becoming involved in movement toward national
end-user resource sharing that is represented by the development of the
national research network. They are just beginning to explore the ways in
which national networks interact with interinstitutional resource sharing
to support public access to information resources.
        Drawing heavily on experience at the University of California, this
article describes the current situation of automation at academic
institutions.  It outlines some of the major issues that face library
planners today and emphasizes the use of technologies such as nationwide
computer networks to improve access to information.

Institutional Library Automation
        From the late 1970s through the 1980s, libraries applied computer
technology to create great changes in public access to their collections.
While automation had been used much earlier to streamline the operational
aspects of libraries (acquisitions, cataloging, and circulation of
material, for example), by the beginning of the 1980s most libraries had
created a critical mass of machine-readable data describing the contents of
their collections that could be made available to the public.
        At about the same time, the cost of the computing cycles and
storage media necessary to support public-access information systems came
within the reach of most library budgets.
        Today, public-access online catalogs (of widely varying capability
and quality) exist at most major research libraries. These systems provide
access primarily to book collections; although coverage of journal titles is
not uncommon, access at the level of individual journal articles generally is
not provided.  Since World War II, libraries largely have ceded direct
responsibility for providing access to the journal literature to other
organizations (collectively called abstracting and indexing services). While
limited in coverage,
online catalogs permit collection searching that is vastly superior to the
paper-based card catalogs they replaced. Collections that are
geographically scattered can be searched in entirely new ways through
online catalogs. Ten years ago, to identify all the books by a given author
held in the roughly 100 libraries of the nine-campus University of
California system, a researcher would have had to travel the state,
consulting multiple card catalogs on most of the UC campuses. Locating all
of the books published in the seventeenth century in the Portuguese
language even at a single library would have been impossible, since the
traditional card catalog was not designed for such searches. Today, using
UC's MELVYL union catalog, a researcher can obtain a summary of all 127
seventeenth-century publications in Portuguese held at any of the UC
campuses with a single command.
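        The union catalog idea is easy to make concrete in code. The sketch
below is purely illustrative and assumes nothing about MELVYL's actual
command language or implementation: a union catalog is, at bottom, a set of
holdings records that can be filtered on fields the card catalog never
indexed, such as language and date of publication. All names and records
below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Holding:
    """One holdings record in a hypothetical union catalog."""
    title: str
    author: str
    language: str   # language code, e.g. "por" for Portuguese
    year: int       # year of publication
    campus: str     # which library in the system holds a copy

# A tiny stand-in for the millions of records in a real union catalog.
catalog = [
    Holding("Os Lusiadas", "Camoes, Luis de", "por", 1651, "Berkeley"),
    Holding("Sermoes", "Vieira, Antonio", "por", 1679, "Los Angeles"),
    Holding("Principia", "Newton, Isaac", "lat", 1687, "San Diego"),
]

def find(catalog, language=None, year_range=None):
    """Filter holdings the way a card catalog never could:
    by language and date rather than by author or title heading."""
    results = []
    for record in catalog:
        if language and record.language != language:
            continue
        if year_range and not (year_range[0] <= record.year <= year_range[1]):
            continue
        results.append(record)
    return results

# One "command" answers a question that once required visiting every campus.
for record in find(catalog, language="por", year_range=(1600, 1699)):
    print(record.campus, record.year, record.title)
```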
        Online catalogs are now major computer systems in their own right.
The MELVYL system contains records for about 10 million holdings (about 5
million different titles; many works are held at more than one library in
the UC system). The MELVYL system processed about 1.25 million queries in
May 1989, displaying over 11 million citations to its users. One can now
search the entire UC library system from one's home or office at any time of
the day or night. The online catalog has raised user expectations for
library service; for example, there is growing pressure for delivery
services linked to online catalogs that would allow users to obtain the
material they have located, either electronically or through campus mail,
and for access to the journal literature on an equal basis with the
monographic literature. In many research libraries, about half of the
acquisitions budget is spent on journals every year, and for students,
practitioners, and scholars in many fields, the journal literature is
perhaps a more vital resource than the monographic literature.
        In the past two years, a number of major universities have begun to
explore the provision of access to selected journal literature as part of
the online catalog by licensing abstracting and indexing databases produced
by commercial firms or by the government. The results of these experiments
have staggered library planners by revealing the magnitude of the unmet
demand for information. Prior to the mounting of these databases in library
catalogs (where they are licensed on an institutional basis and typically
made available to the user community at no direct cost), researchers could
obtain access to the journal literature through commercial services such as
DIALOG or BRS, but this access was (and is) very expensive (charges of $100
an hour for searching are quite common) and difficult to achieve since
these services were designed for use by trained searchers. There was no
funding mechanism at most universities to support such access, except for
an often halfhearted program to provide a few mediated searches each year
for faculty members willing to go to the library and ask a librarian to
conduct a search.
        With support from the National Library of Medicine, the University
of California has made the last three years of the MEDLINE database (about
750,000 citations in the biomedical and health sciences) available to its
user community as part of the MELVYL catalog. In May 1989, this system
processed about 175,000 queries and displayed over 2 million records. These
statistics highlight several important changes. We have moved into a mass
market for information. A typical MEDLINE search on a system like DIALOG
might cost $10 or more, but the license fee for the MEDLINE database
amounts to only a few cents per search (plus the cost of the computing
resources necessary to support the database). The availability of easy
access to information resources like MEDLINE is changing the way these
databases are used to support research and instruction. They are becoming
integrated into the academic programs, although there are serious questions
arising about who should take primary responsibility for teaching people to
use the resources and how the costs of the databases should be financed.
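        The arithmetic behind "a few cents per search" is worth making
explicit. In the sketch below, the monthly query volume comes from the MELVYL
MEDLINE figures above, but the annual license fee is an assumed round number,
not UC's actual contract terms, and May is assumed to be a typical month.

```python
# Illustrative cost-per-search arithmetic. The 175,000 queries/month
# figure is from the article (May 1989); the license fee is a
# hypothetical round number, not UC's actual contract terms.
QUERIES_PER_MONTH = 175_000
ANNUAL_LICENSE_FEE = 50_000          # dollars; assumed, for illustration
COMMERCIAL_COST_PER_SEARCH = 10.00   # typical commercial charge cited above

annual_queries = QUERIES_PER_MONTH * 12
license_cost_per_search = ANNUAL_LICENSE_FEE / annual_queries

print(f"Searches per year:        {annual_queries:,}")
print(f"License cost per search:  ${license_cost_per_search:.3f}")
print(f"Commercial cost ratio:    "
      f"{COMMERCIAL_COST_PER_SEARCH / license_cost_per_search:,.0f}x")
```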
        Extrapolating from the MEDLINE experience, which has provided
access to only a single, narrow part of the total journal literature, it is
clear that licensing, mounting, and providing support for access to the
full journal literature will be a massive task that is probably beyond the
capabilities and resources of any single institution. The difficulty here
is not simply in finding the money for license fees and for computing
hardware to support the databases as they are acquired (although both of
these are problems); huge human resources also are necessary to build
high-quality database implementations and to support them through training
and user services programs once they are built.
        The development of additional journal databases will probably
proceed slowly at UC.  We are presently working on the Current Contents
database from the Institute for Scientific Information, which should be
available around the end of 1989. At our present level of effort, we can
mount only two major databases per year, which means that it will be a long
time before we have coverage of any major part of the journal literature.
The process of database selection is complicated by the lack of information
about the relationships among available databases, user needs, and the
materials in the UC collections, although we are beginning to study these
questions.
        The dramatic expansion of public access that began with online
catalogs and has continued with journal databases leads to a number of
additional developments that have major implications not only for improved
quality of access by library users but also for institutional operations
and costs.
        Delivery of Actual Information. Technical questions arise such as
the form in which information is stored and transmitted, and heavy demands
are made on both storage and transmission facilities. Information delivery
affects the core of the relationships between libraries and publishers and
raises troublesome issues in copyright law. Initiatives in this area also
force an examination of the current system of scholarly publication and its
reinvention in an electronic environment, and of the increasingly painful
costs that the current journal publishing system levies on library
acquisition budgets.
        Preservation, Conservation, and Storage. Libraries must deal with
deteriorating collections. One possible solution is to employ electronic
imaging technology to convert these collections into electronic form, which
both facilitates information delivery and protects the collection.
Libraries cannot afford the building programs required to provide miles of
shelf space annually for new paper acquisitions. Conversion of holdings
into electronic form, and acquisition of new materials in electronic form,
can help libraries cope with this problem too.
        Image Databases and Other Data Resources. Traditionally,
information management has dealt with the printed word. While libraries do
house nonprint collections, such as slides and photographs, these
collections have been considered specialized and difficult to use, and they
have been subjected to restricted access. It is now possible to convert
these collections to electronic form, and to deliver them over networks.
However, massive and expensive indexing efforts will be required to allow
users to search these collections effectively. For example, to convert
images concerning Renaissance Venice into electronic form, each image must
be described (who painted or drew it, where it was done, when it was done,
what artistic techniques were used) and indexed by content (what is in the
picture). The enormous cost of such indexing raises a serious question of
values: Which images are worth indexing to this depth?
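        To see why such indexing is costly, consider what even a minimal
record for a single image must capture. The record structure below is
hypothetical (real image-description standards are far richer); the point is
that every field must be supplied by a trained human indexer.

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    """A hypothetical minimal catalog record for one digitized image.
    Every field here represents human indexing effort."""
    image_id: str
    creator: str            # who painted or drew it
    place: str              # where it was done
    date: str               # when it was done (often approximate)
    techniques: list = field(default_factory=list)  # artistic techniques used
    subjects: list = field(default_factory=list)    # what is in the picture

record = ImageRecord(
    image_id="venice-0042",
    creator="Gentile Bellini",
    place="Venice",
    date="ca. 1496",
    techniques=["tempera on canvas"],
    subjects=["procession", "Piazza San Marco", "relic of the True Cross"],
)
print(record.subjects)
```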
        In addition, large new databases are emerging in areas such as
satellite imaging, social-science research, and the geographic sciences.
Most of these databases are inherently multidisciplinary in their appeal,
and most require computer processing as part of their use. These are
increasingly important information resources; the role of the library in
maintaining, managing, and providing access to them remains unclear.
        Institutions face the growing problem of selectively applying the
plethora of current technologies to an abundance of data. They cannot
afford to do everything, but their choices are difficult, because
initiatives in different areas apply unevenly to different scholarly
disciplines.


Networks and Interinstitutional Library Automation
        Since the late 1960s, libraries have worked cooperatively through
specialized networks centered around two major bibliographic utilities (OCLC
in Ohio and RLIN in California) to reduce the costs of computerized
cataloging.  However, the connection of public access library information
systems to the networks used by researchers is still a very new
consideration. Only in the past year have many libraries offered access to
their online catalogs to their campus user communities outside of the
library. Evidence of the novelty of the concept can be found today in the
deplorably poor support of remote access by most commercially available
library automation systems, where the state of the art lags years behind
the capabilities of typical general-purpose computing systems.
        In 1989, UC began an experimental program offering access to remote
systems for the MELVYL user community. The first system offered was that of
the Colorado Alliance of Research Libraries (CARL); by the time this
article reaches print, about ten additional systems should be available.
Access to these systems is via remote login using the TELNET protocol used
across the Internet. We expected that interest in remote systems would be
limited by users' reluctance to learn a new interface for each system. What
we did not expect, as we examined the various systems on the Internet that
were available, was the extreme difficulty of making remote login viable.
Many systems assumed a specific terminal type and performed cursor addressing
for that type without ever prompting the user for one. Many systems
required the user (or the MELVYL catalog on behalf of the user) to navigate
rather complex dummy login sequences. A surprising number of systems did
not offer any way to log off. More than one system did not appear to work
properly in a TCP/IP (Transmission Control Protocol/Internet Protocol)
network environment, leading to data loss and frozen terminals.  Based on
UC's experience, the library community on a national level needs to make a
great deal of progress before reciprocal remote login will be practicable as
a
means of obtaining access to resources.
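        The terminal-type problem, at least, has a well-defined technical
fix: the TELNET protocol includes an option (TTYPE) through which a host can
ask the client what kind of terminal it is. The sketch below is a minimal
illustration, not production code; it uses Python's standard-library
telnetlib module (present through Python 3.12) and a hypothetical host name,
and shows a client that volunteers "VT100" when asked and declines every
other option. The systems described above simply never asked.

```python
import telnetlib  # Python standard library through 3.12

HOST = "catalog.example.edu"   # hypothetical remote catalog host
TERMINAL = b"VT100"
SEND, IS = b"\x01", b"\x00"    # TTYPE subnegotiation codes (RFC 1091)

tn = telnetlib.Telnet()

def negotiate(sock, cmd, opt):
    """Answer the host's TELNET option negotiation."""
    if cmd == telnetlib.DO and opt == telnetlib.TTYPE:
        # The host asked whether we will report a terminal type: agree.
        sock.sendall(telnetlib.IAC + telnetlib.WILL + telnetlib.TTYPE)
    elif cmd == telnetlib.SE:
        # End of a subnegotiation; if it was TTYPE SEND, reply "IS VT100".
        if tn.read_sb_data().startswith(telnetlib.TTYPE + SEND):
            sock.sendall(telnetlib.IAC + telnetlib.SB + telnetlib.TTYPE +
                         IS + TERMINAL + telnetlib.IAC + telnetlib.SE)
    elif cmd in (telnetlib.DO, telnetlib.DONT):
        sock.sendall(telnetlib.IAC + telnetlib.WONT + opt)   # decline
    elif cmd in (telnetlib.WILL, telnetlib.WONT):
        sock.sendall(telnetlib.IAC + telnetlib.DONT + opt)   # decline

tn.set_option_negotiation_callback(negotiate)
tn.open(HOST, 23, timeout=30)
print(tn.read_until(b":", timeout=10).decode("ascii", "replace"))
tn.close()
```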
        Even assuming that these technical problems can be worked out, it
is clear that remote login is not a long-term solution. Users cannot be
expected to learn a new interface for each information resource they want
to access. Furthermore, the ability to consolidate and manipulate results
collected from multiple information resources is essential. We must build
information servers (dedicated storage machines accessible through a
network) that will allow users to employ familiar local interfaces to
access remote resources. The technical basis for these information servers
will be the Z39.50 protocol for computer-to-computer information retrieval,
although this will have to be supplemented and extended to provide a
complete real-world solution. The notion of building information servers is
relatively new and has not been explored as part of the overall networking
research and development of the last two decades that has produced
electronic mail, file transfer, remote login, and, more recently, such
technologies as the X Window System, remote procedure calls, and network file
systems.
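        The essential idea behind an information server can be sketched
without any of the real protocol machinery. In the toy model below, which is
loosely in the spirit of Z39.50's abstract access points and not its actual
ASN.1-encoded messages, a client expresses a query in terms of abstract
access points such as "author," and each server maps that query onto its own
local organization; the user's familiar local interface never has to change.
All class and server names here are invented.

```python
from dataclasses import dataclass

@dataclass
class SearchRequest:
    """An abstract query: access point + term, independent of any
    server's internal file structure or query language."""
    access_point: str   # e.g. "author", "subject", "title"
    term: str

@dataclass
class SearchResponse:
    hit_count: int
    records: list       # records in a common exchange format

class InformationServer:
    """One autonomous information resource. Only the server knows how
    its database is organized; clients see SearchRequest alone."""
    def __init__(self, name, records):
        self.name = name
        self._records = records   # list of dicts, standing in for a real DBMS

    def search(self, request: SearchRequest) -> SearchResponse:
        hits = [r for r in self._records
                if request.term.lower()
                in r.get(request.access_point, "").lower()]
        return SearchResponse(hit_count=len(hits), records=hits)

# Two servers with different collections but one abstract interface,
# so a single local user interface can consolidate both result sets.
campus = InformationServer("campus-catalog", [
    {"author": "Vieira, Antonio", "title": "Sermoes", "subject": "sermons"},
])
remote = InformationServer("remote-catalog", [
    {"author": "Vieira, Antonio", "title": "Cartas", "subject": "letters"},
])

query = SearchRequest(access_point="author", term="vieira")
for server in (campus, remote):
    response = server.search(query)
    print(server.name, response.hit_count,
          [r["title"] for r in response.records])
```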
        We do not fully understand the potentials and limitations of trying
to separate a user interface from an information retrieval system. It is
important to realize that this problem is applications-oriented and is
qualitatively different from the types of problems that arise in
distributed database systems. Simply being able to execute queries in a
language such as SQL (Structured Query Language) will not be enough, since
this would demand that the user interface on any one system understand all
of the minutiae of implementation at each autonomous remote information
resource it wished to access on behalf of the user. Prototyping projects in
the use of Z39.50 to support public access are just beginning; these
include both workstation implementations of Z39.50 clients and
mainframe-based client and server implementations for major information
resources. A number of educational institutions and other organizations
nationwide are involved in various aspects of these efforts.
        Information servers will be important in many contexts other than
online catalogs and related resources. For example, database publishing in
media such as CD-ROM is problematic for institutions that have made major
commitments to networking technology and resource sharing. In this
environment, a CD-ROM database sits isolated, attached to a PC, with an
idiosyncratic user interface, without network access, and without any
facility to permit the integration of data extracted from the database on
the CD into the user's overall computing environment. But if CD-ROM
databases are coupled with information server software, they can be
effectively integrated into the computing environment at an institution.
        Information servers will also be critically important because they
will provide a uniform method for computer programs to extract information
from databases. Today, these databases are used almost solely by human end
users; but we can imagine programs (such as the knowbots proposed by Kahn
and Cerf) that, once created by human end users, seek out, refine, and
manipulate data from these databases on a continuing basis.
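        A hypothetical sketch of such a program: an agent that holds a
standing query, re-runs it against an information server on a schedule, and
reports only what is new since its last visit. The server here is a stub;
the point is that a uniform, program-callable search interface is what makes
agents of this kind possible at all.

```python
import time

def search_server(query):
    """Stub for a program-callable information server; a real agent
    would issue the query over the network via a retrieval protocol."""
    return {("Smith 1989", "New results on X"), ("Jones 1989", "More on X")}

def knowbot_like_agent(query, interval_seconds, rounds):
    """Re-run a standing query and yield only records not seen before."""
    seen = set()
    for _ in range(rounds):
        new = search_server(query) - seen
        if new:
            yield sorted(new)
        seen |= new
        time.sleep(interval_seconds)

for batch in knowbot_like_agent("subject: X", interval_seconds=1, rounds=2):
    for record in batch:
        print("new:", record)
```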
        Information servers will allow the proliferation of databases as
network resources. This growth will lead to further challenges, perhaps the
greatest of which will be locating and identifying relevant databases to
meet specific information needs. Directories of databases that go far
beyond the current efforts to compile simple lists of resources available
on the national research network will be required. As part of the
development of such directories, it will be necessary to define precisely
what constitutes a generally available database. At one end of the spectrum
are major
institutionally sponsored databases, such as those of a library; at the
other extreme is a database an individual faculty member mounts on a
workstation and shares with some community members on an informal basis.
Between these two extremes are databases mounted by departments or research
projects that may be of vital interest to specific communities outside an
institution. Universities will need to develop database acquisition,
access, and support policies to ensure that best use is made of the growing
range of database resources.
        As information servers multiply and mature, we will see attempts to
automate even the selection of databases to search. One can envision a user
issuing a query to a search system that first scans local databases at the
user's institution and then, if the user wants more information, employs a
rule base that considers search content, time of day, and
interinstitutional arrangements or billing schedules to select a series of
additional databases; all the user knows is that he or she has told the
system to keep looking.
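        A minimal sketch of such a rule base follows, with all rules,
tariffs, and database names invented for illustration. Each rule inspects
the query's subject, the clock, and a per-database charge, and the function
returns the databases worth trying next, cheapest first.

```python
from datetime import datetime

# Hypothetical database descriptions: subject strengths and tariffs.
DATABASES = [
    {"name": "local-catalog", "subjects": {"*"},
     "cost": 0.00, "offpeak_only": False},
    {"name": "medline-like", "subjects": {"medicine"},
     "cost": 0.02, "offpeak_only": False},
    {"name": "consortium-db", "subjects": {"history"},
     "cost": 0.05, "offpeak_only": True},
]

def select_databases(subject, now=None, budget_per_search=0.10):
    """Order the databases a 'keep looking' request should try next."""
    now = now or datetime.now()
    offpeak = now.hour < 8 or now.hour >= 18   # assumed off-peak window
    candidates = []
    for db in DATABASES:
        if db["offpeak_only"] and not offpeak:
            continue                        # tariff rule: wait for off-peak
        if "*" not in db["subjects"] and subject not in db["subjects"]:
            continue                        # content rule: subject must match
        if db["cost"] > budget_per_search:
            continue                        # billing rule: stay within budget
        candidates.append(db)
    return [db["name"] for db in sorted(candidates, key=lambda d: d["cost"])]

print(select_databases("medicine"))   # local catalog first, then MEDLINE-like
```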
        Information networks will also provoke new and fascinating policy
questions. Today, many online catalogs, such as MELVYL and CARL, are truly
public access: anyone can sign on across the Internet without an ID. There
was great concern that these systems would be swamped with outside users,
but extra-institutional use seems to be about 1 or 2 percent of total load
at present and is not a problem. To put this in perspective, however, both
MELVYL and CARL are rather large systems, and a few percent represents many
thousands of searches per week. As smaller library systems come on the
Internet, there may be a need to protect them from being overloaded by
extra-institutional use. The key question is whether outside use of a
resource is likely to be disproportionate to use by the supporting
institution.
        To answer this question, one must look at the resources that are
being offered and the reasons why outside users would want access to them.
In the case of the MELVYL catalog, users have access to a list of books
held by UC.  This is primarily a catalog of what one can get from the UC
libraries and is useful largely in relation to how easily the person
searching on the MELVYL system can actually obtain the material. To a
lesser extent, simply because of the huge size of the UC collections, the
MELVYL catalog can serve as a bibliography, a comprehensive list of material
that exists about a given subject. Smaller monographic catalogs, unless
they are particularly comprehensive in some specific area that the host
institution collects, would serve purely as catalogs and be of interest
chiefly to members of the host institution and perhaps a few other nearby
institutions with well-established resource sharing and interlibrary loan
programs. At a national level, the same or equivalent material could be
obtained more easily closer to home. Thus, for catalogs, one can imagine a
situation evolving in which the great majority of users will be satisfied
with their local institutional catalog, perhaps a few other local
institutional catalogs, and possibly (if they are at a small institution)
one or two regional catalogs offered by large research institutions.
        With journal-article abstracting and indexing databases, the
situation is very different. These databases are bibliographies, and thus of
equal interest to people everywhere.  To date, there is very little
experience with public access to databases covering the journal literature,
since they are licensed (the MEDLINE license agreement, for example,
prohibits UC from offering MELVYL MEDLINE outside the UC community). It
seems likely that if journal-article abstracting and indexing databases are
to become network resources, it will be in the context of
interinstitutional consortia, whose members jointly license the database
for use by the consortium, select one or more members to physically mount
the database, and then reimburse the institution hosting the database for
use by the other consortium members (either directly, or through barter in
situations where different consortium members mount different databases).
        There are difficult economic and policy problems to be addressed in
this environment, both among the participating institutions and between the
consortium and the database provider. For example, institutions may want
flat-rate licenses to databases, so that rational budgeting for database
use is possible and so that their users will not be deterred from
exploiting the database as fully as possible. On the other hand, some
institutions will not use some databases very heavily, and they will want
their costs to reflect their actual or expected level of use and the size of
the institution. Balance-of-trade questions will have to be negotiated
within consortia.
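        The flat-rate versus usage-based tension reduces to simple
arithmetic. All figures below are invented for illustration: above some
break-even search volume a flat license is the cheaper arrangement, and
institutions on opposite sides of that volume will want different terms.

```python
# Hypothetical license terms, for illustration only.
FLAT_RATE_FEE = 40_000      # dollars per year, unlimited searching
PER_SEARCH_CHARGE = 0.05    # dollars per search, usage-based license

break_even = FLAT_RATE_FEE / PER_SEARCH_CHARGE
print(f"Break-even volume: {break_even:,.0f} searches/year")

for annual_searches in (100_000, 800_000, 2_000_000):
    usage_cost = annual_searches * PER_SEARCH_CHARGE
    better = "flat rate" if usage_cost > FLAT_RATE_FEE else "per-search"
    print(f"{annual_searches:>9,} searches: "
          f"usage-based ${usage_cost:>9,.0f} -> prefer {better}")
```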
        Finally, it seems clear that the consortium model described here
will make it difficult for an
individual scholar at an institution to have access to the complete
spectrum of information resources, since many resources will be available
only to closed communities. The possibility exists that the forthcoming
database-access environment will tend to force institutions to increasingly
specialize the information resources they provide to their user
communities. It also seems likely that there will be a continuing need for
commercial services to provide access to databases that do not fit within
the mass-use consortium model. Provision of such access (particularly to
databases that are of great interest to scholars and that do not enjoy a
nonacademic user base) may be an increasingly important role for the
bibliographic utilities such as OCLC and RLIN.
        As we look to actual delivery of information, the future is harder
to predict, largely because it is not clear what the economic and legal
models will be. The possibilities range from publisher-controlled servers
providing demand delivery to the educational community, through
university-controlled electronic publishing databases. It is difficult to
know how this will affect the ability of an individual scholar to obtain
electronic access to information. It does seem likely that, as image
collections are converted to electronic form and indexed, they might be
offered through the national network on a public-access basis, much as
online catalogs are being made available today.

Conclusions
        We are at the beginning of the development and deployment of a set
of networked information technologies that will evolve and mature in the
1990s. In the past two decades, computing and communications technologies
have made enormous strides, but we have been less successful in using these
technologies to distribute and provide access to information, particularly
in ways that exploit the intrinsically different characteristics of the new
environments. Online catalogs, for example, began as the automation of an
existing manual function, and we still do not fully understand how they
should differ from their paper-based predecessors. Information technologies
offer much room for innovation.
        In 1970, it would have been difficult to predict how the then
infant computer networking technologies would develop into the national
research network that has changed the world of higher education and
research. Networked information technologies will change the world again in
the 1990s, not only in predictable ways but also in ways that we cannot
foresee because these technologies are so intimately tied to the basic
organizational and social structures of instruction and research.

***************************************************************************
CCNEWS Copyright Notice

If you use this article, in whole or in part, in printed or electronic
form, you are legally and morally obligated to credit the author and the
original publication name, date, and page(s). We suggest that you also
inform the author of your intention to use this article, in case there are
updates or corrections that he or she might wish to suggest.

If space and format permit, we would appreciate your crediting the "Articles
database of CCNEWS, the Electronic Forum for Campus Computing Newsletter
Editors, a BITNET-based service of EDUCOM." We would also appreciate your
informing us (via e-mail to CCNEWS@EDUCOM) when you use an article, so we
will know which articles have proven most useful.

***************************************************************************