       
                                        Ref:    DXS/SPEC/L1
                                        Date:   06-Dec-91
                                        Author: Peter Ciuffetti
                                
                                
               CD-ROM Data Exchange Standard (DXS)
                      Overview and Glossary
                           Version 1.0
       
       
            Paul Sanders
            SilverPlatter Information Ltd.
            10 Barley Mow Passage
            Chiswick
            London W4 4PH
            England
       
            Tel: +44 (0) 81-995 8242
            Fax: +44 (0) 81-995 5159
       
            This document is available in Microsoft Word 5.5
            format from the above address.


































                                   1

                        Table of Contents
       

1.     Introduction                                        3
  1.1    Change History                                    3

2.     The Need for a Data Exchange Standard               3

3.     Client-Server Architecture                          4

4.     Interoperability                                    5

5.     Organization of the DXS Document Set                6

6.     DXS Scope                                           7
  6.1    Data Model                                        8
  6.2    Target Platforms                                  8
  6.3    Functional Model                                  9

7.     DXS Features                                       10
  7.1    Protocol Language                                10
  7.2    Installation Support                             10
  7.3    Server Location                                  10
  7.4    Database Selection                               10
  7.5    Database Information                             11
  7.6    Index Access                                     11
  7.7    Search Expressions                               11
  7.8    Data Retrieval                                   11
  7.9    Full-text Features                               11
  7.10   Extensibility                                    11
  7.11   Transfer Syntax                                  12
  7.12   Implementation Guidelines                        12























                                   2



1.     Introduction

       The Data Exchange Standard (DXS) document set (of which
       this document forms a part) defines a general purpose
       mechanism for standardizing information access for a wide
       variety of information sources and delivery platforms.
       The DXS document set identifies the architecture that
       allows information retrieval system developers to build
       systems which are user interface independent.  The
       beneficiaries in an environment where information access
       is standardized are ultimately researchers themselves, but
       the information industry and society as a whole will also
       benefit as a result.


1.1    Change History

       This is the first live version of this document.


2.     The Need for a Data Exchange Standard

       The requirement for interface independence has been
       prompted by the success of the CD-ROM industry and has
       been voiced by the consumers of those products emerging
       from that industry.  The success of the CD-ROM industry
       results from the physical standardization of the CD-ROM
       disc (by Philips and Sony) and international acceptance of
       the directory structure on the disc (ISO 9660).  These
       standards have simplified the creation of distributable
       information products and comprehensive catalogs of data
       sources now exist in every major discipline.
       
       Many organizations large and small have built a collection
       of these products and have become overwhelmed by the
       variety of computer systems which each of their
       constituents must learn in order to extract the embedded
       knowledge on each disc.  The diversity of these systems
       has created a barrier to the continued growth and
       acceptance of distributable information products.  This
       variety, although emphasized by the success of the CD-ROM
       industry, encompasses all information products, whether
       distributable or not, since researchers desire access to
       each electronic resource at their disposal.
       
       All researchers, from casual to full-time, require that
       access to information be universally simplified,
       regardless of its source.  Were universal simplification
       readily achievable, that would be the subject of this
       document set.  Given that it is not, due to the diversity
       of researchers themselves, the architecture of DXS strives
       for a compromise which is readily achievable.  DXS



                                   3

       separates the window into the information from the
       information itself and leaves the selection of the window
       to the researcher in a way that achieves interoperability
       and interface independence.


3.     Client-Server Architecture

       The architecture used by DXS to achieve interoperability
       is called "client-server."  Many computer systems
       successfully use a client-server strategy to simplify or
       standardize their functionality.  To describe this
       approach, let's identify the three major elements in any
       simple information retrieval system.  They are: the
       database, the software program used to query that
       database, and the computer that the software program runs
       on.  In today's predominantly non-client-server products,
       the database and the software come from the same vendor
       and run only on a few types of computer.  Due to lack of
       standards, it is typically not possible for a researcher
       to select one of these three elements and replace it with
       another that he prefers.  In most commercial products,
       these elements are so tightly bound that the researcher
       has to accept them as a single package.
       
       To create some degree of freedom, the client-server
       architecture splits the software program in half.  One
       half encompasses all of the functions necessary for query
       formulation and display.  Typically this is called the
       "client" or "user interface."  The other half encompasses
       all of the functions necessary for query evaluation and
       data access.  Usually this is called the "server" or
       "retrieval engine."  As a result of the software split,
       there are now a total of four basic elements: the
       database, the client program, the server program and the
       computer.
       
       The glue that holds the client and server together is a
       messaging system, analogous to electronic mail, that the
       client and server use to pass queries and results back and
       forth.  It is this messaging system which is the target of
       DXS standardization.  By defining the syntax and semantics
       of the messages passed between the client and the server,
       interoperability is achieved between clients and servers
       supplied by different vendors; this in turn allows the
       researcher to select the user interface that he prefers.
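       
       The split can be pictured with a minimal sketch.  The
       message names, fields and classes below are purely
       illustrative assumptions; the real DXS messages are
       defined in the Database Server Access Protocol document.
       The point is only that the client sees messages and never
       the server's private storage.

```python
from dataclasses import dataclass, field

# Hypothetical message shapes, invented for this sketch only.
@dataclass
class Request:
    name: str
    params: dict = field(default_factory=dict)

@dataclass
class Response:
    name: str
    status: str
    data: dict = field(default_factory=dict)

class ToyServer:
    """Stands in for one vendor's retrieval engine."""
    def __init__(self, records):
        self._records = records  # private storage, never exposed to clients

    def handle(self, request):
        if request.name == "search":
            term = request.params["term"].lower()
            hits = [i for i, text in enumerate(self._records)
                    if term in text.lower().split()]
            return Response("search", "ok", {"hits": hits})
        return Response(request.name, "unsupported")

class ToyClient:
    """Stands in for any vendor's user interface."""
    def __init__(self, send):
        self._send = send  # any callable that delivers a Request

    def search(self, term):
        response = self._send(Request("search", {"term": term}))
        return response.data.get("hits", [])

server = ToyServer(["CD-ROM standards overview", "Network server notes"])
client = ToyClient(server.handle)
print(client.search("standards"))  # prints [0]
```

       Because the client depends only on the message shapes, a
       different server honouring the same messages could be
       substituted without changing the client.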
       
       Other approaches besides client-server could be used to
       achieve interoperability.  These include standardizing the
       database file structures or standardizing user interfaces.
       These are perhaps more appropriate for niche communities
       which may be prepared to accept limitations in adopting
       future file structure and user interface technologies in
       favour of standards today.  Client-server architecture



                                   4

       serves as a more flexible solution which allows
       researchers to choose any available DXS-conformant user
       interface while preserving the developer's freedom to
       change technologies.


4.     Interoperability

       Under DXS, interoperability is defined as "the ability for
       any conforming DXS client to query any conforming DXS
       server with which it has the ability to communicate."
       
       More specifically, a researcher can select any DXS
       compliant client program, from any creator of such a
       program, and use it to identify some meaningful set of the
       information contained in any DXS compliant database and
       display that information in some form when those programs
       support the same operating system or network.  In
       addition, the researcher is able to switch from searching
       one conforming database from one vendor to another
       conforming database from a different vendor without having
       to change, or even leave, the client program that they
       were previously using.
       
       Note that underlying this definition is an acknowledgment
       that there is a wide variety of information types and
       data-specific functions used for their access and display.
       As a result of this variety and in the interest of
       practicality, some data elements unique to a given
       information product may not be retrievable or displayable
       in an optimized form by all clients.  Efficient access
       to and display of these unique data elements may require
       a specific client which understands this data.  What
       is guaranteed by DXS compliance is not universal access to
       all types of data so much as usable access to all types of
       data.  As long as the heart of an information product can
       be expressed using the DXS model, it will be practical to
       access that database with any DXS client.
       
       Therefore, when confronted with the need to access a
       database containing one or more unique data elements,
       researchers will have the opportunity to either access the
       bulk of the database with any DXS client or access the
       whole database with the database vendor's specific (DXS)
       client.  Which choice they make will depend on their
       desire to access the database in its full glory weighed
       against their reluctance to learn to use a new client (and
       hence a new user interface).









                                   5

5.     Organization of the DXS Document Set

       The Data Exchange Standard consists of a collection of
       four related documents.
       
            DXS Documents
       
            CD-ROM Data Exchange Standard
            Overview and Glossary
            DXS/SPEC/L1
            December, 1991
       
            CD-ROM Data Exchange Standard
            Database Server Access Protocol
            DXS/DSAP/L1
            December, 1991
       
            CD-ROM Data Exchange Standard
            Client-Server Transfer Syntax
            DXS/CSTS/L1
            December, 1991
       
            CD-ROM Data Exchange Standard
            Platform Dependent Implementation Details
            DXS/PLAT/L1
            December, 1991
       
       
       The first document, this one, introduces DXS concepts and
       describes the scope and functionality of DXS.  The target
       audience is any individual interested in information
       retrieval issues.
       
       The next three documents are targeted for retrieval system
       designers and implementers.  Familiarity with system
       design and programming issues will benefit readers of
       these documents.
       
       The Database Server Access Protocol is the heart of the
       DXS standard.  The DSAP specifies the set of messages that
       clients and servers can pass to each other.  The variety
       and functionality of these messages represent the richness
       of the DXS standard.  The messages included specify how
       clients discover information about servers and databases,
       how queries are expressed and how results are returned.
       Included in the DSAP are specifications which accommodate
       extensibility to the message set.
       
       The Client-Server Transfer Syntax is a general purpose
       specification which describes the format of the messages
       passed between the client and the server.  The
       specification includes features which will allow a variety
       of machine-to-machine communication protocols and program-
       to-program interprocess messaging strategies to be used as



                                   6

       the conduit through which requests and responses are
       passed.
       
       The Platform Dependent Implementation Details documents how
       clients and servers are loaded, how they find each other,
       and how requests and responses are exchanged under various
       operating system environments.  This version of DXS
       identifies solutions for both networked and non-networked
       environments including MS-DOS, Windows, Apple Macintosh,
       POSIX, TCP/IP, Novell IPX and NetBIOS.
       


6.     DXS Scope

       The scope of any interface independent retrieval protocol
       helps define what type of information sources can be
       easily accommodated and what types of hardware platforms
       are possible as hosts for the resulting retrieval system.
       There is a dynamic between the richness of the protocol
       and the minimum hardware requirements that drives the
       hardware costs up as the number of included functions and
       data types grows.  To support a larger variety of
       information types the number of functions must be
       expanded.  A balance must be crafted that promises
       interoperability among the most popular variety of
       information sources on the most popular variety of
       platforms.
       
       Confounding this selection are three facts.  Firstly,
       commercial products will not become available for some
       months or years after the original decisions are made.
       Secondly, popularity will continue to evolve as products
       emerge.  Thirdly, the overheads of conforming to a
       standard and having two programs where there was once one
       suggest that non-standard products that exist today
       cannot duplicate their functionality and conform to a
       standard without raising the minimum hardware
       requirements.  Any strategy which targets today's platforms
       but does not achieve a critical mass of products within
       three years is doomed to a short life.
       
       The DXS specification therefore carefully expresses its
       scope in the following paragraphs.  The targeted scope
       balances a feature set that will accommodate a large
       variety of database functions and will fit well on
       computers which support a multi-tasking operating system.
       Such a computer may appear to be a high-end system today
       but will become de rigueur as products emerge.
       
       The scope is expressed as a categorization of the
       information sources which would be accommodated, the
       platforms which could host the standard and the functional
       strategy selected.



                                   7

6.1    Data Model



6.1.1  Read-Only vs. Read-Write Databases

       This version of DXS supports read-only information
       products.  The server is asked to query what the client
       considers a static information source.  Updates to the
       data are outside the scope of DXS and may happen in any
       way the information vendor chooses.  Information products
       which require client-originated updates may support them
       using vendor extensions.  Note that this does not mean
       that DXS is limited to CD-ROM.  The media type is not
       specified by DXS; any direct access device would be
       sufficient.


6.1.2  Textual vs. Non-textual

       All databases which are mostly textual or numeric in
       nature can be queried by DXS clients.  Vendor extensions
       may be employed to return graphic data or spatially
       oriented data.


6.1.3  Data Organization

       The organization of the database controlled by the server
       uses the model of a set of records of arbitrary size.
       Optionally, within each record are one or more fields of
       arbitrary size.  Within each field are one or more
       sentences.  Within each sentence are one or more words.
       Words consist of one or more characters using editing
       rules at the server's discretion.
       
       There is support for a hierarchical organization of
       records which can be accessed through a table of contents.
       
       There is no formal schema.  Servers assist clients by
       providing field information, index information and query
       evaluation strategies upon request by the client.
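       
       The record/field/sentence/word nesting described above can
       be sketched as follows.  The class names, the field name
       "TI" and the splitting rules are assumptions made for
       illustration; DXS leaves word-editing rules to the server
       and does not prescribe any in-memory representation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sentence:
    words: List[str]

@dataclass
class DataField:
    name: str
    sentences: List[Sentence]

@dataclass
class Record:
    fields: List[DataField] = field(default_factory=list)
    # Optional hierarchy, as reached through a table of contents.
    children: List["Record"] = field(default_factory=list)

def split_words(text):
    # Whitespace splitting is only one possible editing rule.
    return text.split()

def make_field(name, text):
    # Treat "." as the sentence boundary; a real server may differ.
    sentences = [Sentence(split_words(s)) for s in text.split(".") if s.strip()]
    return DataField(name, sentences)

record = Record(fields=[make_field("TI", "Data exchange. An overview")])
word_count = sum(len(s.words) for f in record.fields for s in f.sentences)
print(word_count)  # prints 4
```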


6.2    Target Platforms



6.2.1  Standalone vs. Network Access

       Both standalone and networked computers are supported by
       DXS.  A distinction is made between a non-networked
       server, called a local server, and a networked server.
       This is mainly because, in the former case, the database
       may be removable and the researcher may change it when he



                                   8

       chooses, whereas in the latter case this is a supervisory
       function.  There are also other technical considerations
       which merit a distinction.
       
       A multi-tasking operating system is not required but
       asynchronous operation of the client and server is assumed
       by DXS.  This means that the client and server are, or
       appear to be, two independently functioning programs,
       whether they are running in the same machine or not.
       Therefore, implementation of a client and local server
       under a single-tasking operating system, such as MS-DOS,
       will be complicated by the need to emulate the
       asynchronous aspect of the protocol.  No such problem
       applies to network clients or clients in a multi-tasking
       operating system.  The asynchronous operation of the
       client and server is an important element of a flexible
       architecture.
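       
       One possible way to emulate two independently running
       programs inside a single task is cooperative scheduling,
       sketched below.  This is an illustration only and not the
       mechanism specified in the DXS implementation details;
       the queue layout and task names are assumptions.

```python
from collections import deque

def server_task(inbox, outbox):
    """Local server: answers at most one request per time slice."""
    while True:
        if inbox:
            request = inbox.popleft()
            outbox.append(("result", request.upper()))
        yield  # hand control back to the scheduler

def client_task(to_server, from_server, queries, results):
    """Client: queues requests without waiting for each reply."""
    pending = 0
    for query in queries:
        to_server.append(query)
        pending += 1
        yield
    while pending:
        if from_server:
            results.append(from_server.popleft())
            pending -= 1
        yield

def run(queries):
    """Alternate between the two tasks until the client finishes."""
    to_server, from_server, results = deque(), deque(), []
    server = server_task(to_server, from_server)
    client = client_task(to_server, from_server, queries, results)
    while True:
        try:
            next(client)
        except StopIteration:
            break
        next(server)
    return results

print(run(["cat", "dog"]))
```

       Each task yields whenever it would otherwise wait, so the
       pair appears to run independently even though only one
       is ever executing.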


6.3    Functional Model



6.3.1  Client-Server

       DXS compliant products can take one of two forms.  A
       client program can be created for a given operating system
       on a given computer.  Or a server program, coupled with
       one or more databases, can be supplied for a given
       operating system or network on a given computer.
       
       Commercial products may typically provide several of both
       clients and servers, a matching set for each supported
       operating environment.
       
       Server programs are only responsible for understanding the
       proprietary file structures, data elements and index
       strategies used in the databases published by the server
       vendor.  It is not required that a server program
       understand the file structures of databases from any other
       vendor.
       
       Client programs are responsible (amongst other things) for
       mapping end-user search requests into DXS-compliant
       queries.  The client transmits these queries to the server
       using the mechanism specified for the operating system or
       network on which it is running.  Since the transfer syntax
       lets servers choose the optimum format for returned data
       elements, clients must be prepared to accept any legal
       return type for the requested data.  Clients can
       optionally support either networked servers, local servers
       or both.





                                   9

6.3.2  Client-Controlled

       Since DXS benefits are intended to be end-user oriented,
       the client controls the session.  In order to query a DXS
       database, the end user first starts his client program.
       It is then the client's responsibility to locate the
       available servers.  For fully functioning clients which
       understand local and network servers, some servers will be
       available via the network, whilst local servers will be
       available via a table of installed servers.  As the client
       discovers each available server, a list of database titles
       will be constructed, resulting in a menu of information
       sources at the user's disposal.  The client must be able
       to establish connections to network servers and load and
       unload the appropriate local servers as the user switches
       from one database to another.
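       
       The menu-building and server-switching duties described
       above might look like the sketch below.  The dictionary
       layout of the installed-server table, the server names
       and the load/unload callbacks are all hypothetical;
       establishing network connections is left out.

```python
def build_menu(local_table, network_servers):
    """Merge database titles from local and network servers into one menu."""
    menu = []
    for kind, servers in (("local", local_table), ("network", network_servers)):
        for server in servers:
            for title in server["databases"]:
                menu.append({"title": title, "server": server["name"], "kind": kind})
    return sorted(menu, key=lambda entry: entry["title"].lower())

def switch_database(current, chosen, load, unload):
    """Unload the current local server if it changes; load a new local one."""
    same_server = current is not None and current["server"] == chosen["server"]
    if not same_server:
        if current is not None and current["kind"] == "local":
            unload(current["server"])
        if chosen["kind"] == "local":
            load(chosen["server"])
    return chosen

local_table = [{"name": "srv_a", "databases": ["Beta Abstracts"]}]
network_servers = [{"name": "net_b", "databases": ["Alpha Index", "Gamma Texts"]}]
menu = build_menu(local_table, network_servers)

events = []
load = lambda name: events.append(("load", name))
unload = lambda name: events.append(("unload", name))
current = switch_database(None, menu[1], load, unload)     # a local database
current = switch_database(current, menu[0], load, unload)  # a network database
print([entry["title"] for entry in menu])
print(events)
```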


7.     DXS Features



7.1    Protocol Language

       Given that DXS is an agreement between two computer
       programs, DXS specifies the messages and their contents in
       a machine-friendly fashion.  To simplify the creation and
       evolution of clients and servers, the protocol language
       does not require a grammar.  To maintain the highest
       possible degree of portability and interoperability
       between different operating systems and networks, DXS does
       not use remote procedure calls or any other form of direct
       interprocess binding.


7.2    Installation Support

       DXS specifies a standard information file created during
       local server installation which simplifies the creation of
       menus and switching between local servers by clients.


7.3    Server Location

       Servers accessed over networks can be easily located and
       functions are included that allow the client to discover
       the attributes of each server.


7.4    Database Selection

       The implementation details specify how local servers are
       loaded and unloaded as the user moves from one database to
       another.  Included is support for removable media for



                                   10

       local information sources.  Login security is supported if
       required for a specific information source.


7.5    Database Information

       Clients can discover the field names, capabilities and
       limitations of each database they query.  Included among
       the available returned information are provisions for
       database specific end-user help text.  This allows a
       client to display a semantic description of a database and
       its fields supplied by the server.


7.6    Index Access

       Where supported by the server, indexes can be accessed
       conveniently and efficiently for display and browsing
       purposes.


7.7    Search Expressions

       A full complement of boolean operators is specified,
       complete with a variety of pattern matching expressions.
       Mechanisms exist to permit servers to support only a
       subset of these.  Search progress and search interruption
       are also supported.
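       
       As a rough illustration of boolean evaluation with pattern
       matching, the sketch below combines sets of record numbers
       taken from a word index.  The tuple form of the expression
       and the wildcard syntax are assumptions; the actual query
       representation is defined by the protocol documents.  An
       operator outside the supported subset is rejected.

```python
from fnmatch import fnmatchcase

def build_index(records):
    """Map each lower-cased word to the set of record numbers containing it."""
    index = {}
    for rec_id, text in enumerate(records):
        for word in text.lower().split():
            index.setdefault(word, set()).add(rec_id)
    return index

def lookup(index, pattern):
    """Records containing any indexed word that matches a wildcard pattern."""
    hits = set()
    for word, ids in index.items():
        if fnmatchcase(word, pattern.lower()):
            hits |= ids
    return hits

def evaluate(index, expr, universe):
    """expr is a term, or a tuple ("and"|"or", a, b), or ("not", a)."""
    if isinstance(expr, str):
        return lookup(index, expr)
    op = expr[0]
    if op == "not":
        return universe - evaluate(index, expr[1], universe)
    left = evaluate(index, expr[1], universe)
    right = evaluate(index, expr[2], universe)
    if op == "and":
        return left & right
    if op == "or":
        return left | right
    raise ValueError(f"unsupported operator: {op}")

records = ["optical disc storage", "magnetic disc drives", "optical fibre networks"]
index = build_index(records)
universe = set(range(len(records)))
print(evaluate(index, ("and", "optical", ("not", "disc")), universe))  # prints {2}
```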


7.8    Data Retrieval

       Returned data elements may have flexible markup codes
       embedded by servers to allow enhanced display or printing
       by clients.  Servers may sort retrieved records or rank
       them by relevance when so requested by a client.  "Set
       hits" and "get hits" requests allow clients to build up
       sets of records selected by the user for future reference.


7.9    Full-text Features

       Hierarchical information sources are supported through
       table of contents access features.  Unfielded data is also
       supported for search, display and browse.


7.10   Extensibility

       Developers may freely add new vendor-unique messages to
       the protocol or new data elements to existing messages
       with no impact on other vendors' clients or servers.  They
       do not forfeit conformance by doing so, but they limit the




                                   11

       capabilities of their products when used in conjunction
       with other vendors' clients or servers.
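       
       The practical consequence for a client can be sketched as
       follows: elements it does not recognise are set aside
       rather than treated as errors.  The element names here,
       including the vendor-specific one, are invented for the
       example.

```python
# Hypothetical element names this particular client understands.
KNOWN_ELEMENTS = {"title", "hits"}

def read_response(elements):
    """Split a response into understood elements and ignored ones."""
    understood = {k: v for k, v in elements.items() if k in KNOWN_ELEMENTS}
    ignored = sorted(set(elements) - KNOWN_ELEMENTS)
    return understood, ignored

understood, ignored = read_response(
    {"title": "Beta Abstracts", "hits": 3, "x-vendor-map": b"\x00\x01"}
)
print(understood)
print(ignored)
```

       The client still presents the title and hit count; only
       the vendor-unique element is lost, which is the limitation
       the text describes.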


7.11   Transfer Syntax

       The transfer syntax specified optimizes DXS for use in
       both networked and non-networked environments.  Although
       DXS is a message-oriented protocol, the transfer syntax
       allows byte-stream-oriented communication strategies.
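       
       A common way to carry discrete messages over a byte stream
       is to prefix each message with its length.  The sketch
       below assumes a 4-byte big-endian length prefix purely for
       illustration; the actual framing is defined in the
       Client-Server Transfer Syntax document and may differ.

```python
import io
import struct

def write_message(stream, payload):
    """Write one message: an assumed 4-byte big-endian length, then the body."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_exact(stream, count):
    """Read exactly count bytes, since a stream may deliver them in pieces."""
    chunks = []
    remaining = count
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_message(stream):
    """Recover one whole message from the byte stream."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    return read_exact(stream, length)

stream = io.BytesIO()
write_message(stream, b"first request")
write_message(stream, b"second")
stream.seek(0)
print(read_message(stream))
print(read_message(stream))
```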


7.12   Implementation Guidelines

       Details are given that allow developers from different
       organizations to build compliant systems on specific
       platforms without private agreements.  Developers that
       follow these guidelines have a much greater chance of
       achieving true interoperability and interface independence
       than if these elements were left unspecified.
       
       These are only some of the features in the DXS protocol,
       transfer syntax and implementation details.  For more
       detailed descriptions of these features and others, please
       refer to the respective documents.
































                                   12

                                
                            Glossary



       ADMINISTRATOR - The individual responsible for installing
               the client/server software and databases on the
               retrieval workstations or network.
       
       ASCII - A standard byte representation for character data
               understood by a large variety of computers.
       
       API -   Application Program Interface.  A set of services
               which can be called by an application program,
               usually with a publicly specified interface,
               intended to simplify the creation of applications
               for a given platform.
       
       ARP -   Address Resolution Protocol
       
       BYTE -  An 8-bit unsigned integer in 2's complement form
               with values ranging from 0 to 255.
       
       CACHE - To preserve a local copy of information optimized
               for frequent and fast reference, such as in random
               access memory.
       
       CLIENT - The program responsible for interpreting the end
               user's queries and displaying the results.  Also
               sometimes referred to as the "user interface" or
               "client application".  Client programs are
               platform specific.  The interface between the
               client and the end-user is not specified by DXS.
               DXS compatible clients may be character based,
               graphical or of any other kind.  They may
               use any input or display device at their disposal
               including keyboards, pointing devices, touch
               screens and printers.  The client's responsibility
               is to map queries in the native context to
               standard requests as specified by the DXS
               protocol.  When responses are received, the
               client's responsibility is to map the returned
               results, which are returned in a standard format,
               to the native user interface's display format.
       
       CONVENTIONAL MEMORY - In MS-DOS, random access memory in
               the range 0 to 640 kilobytes.
       
       CURSOR - A working reference to a search set established
               by a client and managed by the server.
       
       DATAGRAM - In a connectionless protocol, a
               message from one source machine to one or more
               destination machines complete with addressing



                                   13

               information.  The analogy is the postal service in
               which each letter is a datagram, since each
               contains complete address information.
       
       DNS -   Domain Name Service
       
       DWORD - Double word, a 32-bit unsigned integer in 2's
               complement form with Motorola (big endian) byte
               ordering and with values ranging from 0 to
               4,294,967,295.
       
       END USER - In this set of documents, the individual who is
               interacting with the client, presenting queries to
               the client and viewing results.
       
       EXPANDED MEMORY - In MS-DOS machines, additional random
               access memory that is paged into a window below
               1 MB, referenced in conformance with the
               Lotus-Intel-Microsoft specification LIM-EMS.
       
       EXTENDED MEMORY - In MS-DOS machines, random access memory
               above 1 MB addressable via device drivers.
       
       HOTLINK - A reference from one place in one record of a
               database to another location in the same or
               different database which establishes a
               relationship of significance between the two
               records.
       
       INFO-LIST - In a request, the specification by the client
               of the ordered set of return data elements the
               client is interested in receiving in the response
               from the server.
       
       INTEROPERABILITY - The ability for any conforming DXS
               client to query any conforming DXS server with
               which it has the ability to communicate.
       
       IPC -   Interprocess communication.  One of potentially
               several messaging algorithms for two processes
               running in the same machine.
       
       IPX -   Internetwork Packet Exchange.  Novell NetWare's
               network layer connectionless service.
       
       LOCAL DATABASE - A database under the control of a local
               server.
       
       LOCAL SERVER - A server program sharing the same machine
               as a client program.  In contrast to a
               network server, clients interact with local
               servers via interprocess communication.  See
               Network Server.
       




                                   14

       MAC ADDRESS - Media Access Control Address.  An OSI layer
               2 specification for the hardware address that can
               be used to uniquely identify each machine on a
               network.
       
       MARKUPCODES - Information inserted into appropriate
               locations of the returned text by the server for
               the benefit of the client to use in enhancing or
               formatting the returned information on display.
       
       NETBIOS - Network Basic Input/Output System.  A de facto
               standard API specified by IBM to allow
               peer-to-peer connectionless communication between
               machines on a network.
       
       NETWORK SERVER - A server program running in a different
               machine from the client accessed over a network.
               Clients access network servers via a transport
               layer connection.  See Local Server.
       
       PACKET - An atomic unit of data sent from one machine to
               another.
       
       POSIX - Portable Operating System Interface.
       
       REFERBACK - In a search strategy, the shorthand reference
               to the results of a previous search in the current
               search, usually intended to broaden or narrow the
               results.
       
       REQUEST - Typically, a message from the client to the
               server, conforming to the DXS transfer syntax,
               asking the server to perform a task.  For example,
               to open a database, or evaluate a query.  Some
               parts of the DXS protocol allow requests to be
               originated by the server and sent to the client.
       
       RESPONSE - Typically, a message from the server to the
               client, conforming to the DXS transfer syntax,
               containing the results of a client's request.
               Some parts of the DXS protocol allow responses to
               be sent by the client to the server (when the
               server has initiated a request).
       
       SENTENCE POSTING - An indexing technique which places
               sequential words from a sentence as a single
               unique entry in an index, with the intention of
               allowing the sentence to be efficiently searched
               as a unit.
       
       SERVER - The program responsible for accepting requests
               from the client and returning responses either via
               interprocess communication (for local servers) or
               via a transport connection (for network servers).



                                   15

               Servers control one or more databases which are in
               a format, perhaps proprietary, which is not
               addressed by the DXS specification.  Servers are
               therefore tightly bound to the databases they
               understand.  Their architecture is intended to
               insulate clients from file format, compression and
               indexing differences among a collection of
               databases from several sources.  Servers are
               platform specific and database specific;
               therefore, each DXS conforming database will be
               delivered with a collection of servers, one for
               each of the supported platforms.  DXS clients know
               how to find, load and unload, if necessary, the
               appropriate server program as an end-user switches
               from searching one database to another.
       
       
       SPAWN - The action of starting another process.
       
       SPX -   Sequenced Packet Exchange.  Novell NetWare's
               connection oriented transport layer service.
       
       STOPWORD - A noise word excluded in an index for a field
               or database.
       
       STREAM-ORIENTED PROTOCOL - A protocol in which the bytes
               of the messages arrive one character at a time at
               their destination.  These messages are read by the
               receiving programs much in the same way as files
               are read in a character-oriented file system.
       
       TCP/IP - Transmission Control Protocol/Internet Protocol.
               A de facto standard communications protocol
               supported on a wide variety of machines.
       
       TRANSPORT CONNECTION - An OSI layer four connection which
               adds reliability, message division and
               reconstruction to the layer three protocol.  The
               intention of DXS messages is to be compatible with
               a variety of international standard and de facto
               standard transport services.  The functional
               interaction from the client and the server running
               on a network to their respective transport
               connections is outlined in the implementation
               details.
       
       UDP -   User Datagram Protocol.  An efficient
               connectionless de facto standard protocol.
       
       WORD -  A 16-bit unsigned integer in 2's complement form
               with Motorola byte ordering (big endian) and with
               values ranging from 0 to 65,535.
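       
       The BYTE, WORD and DWORD entries in this glossary can be
       checked with a short sketch.  The helper names are
       illustrative only.  An unsigned integer of n bits has a
       maximum value of 2^n - 1, and big endian ordering places
       the most significant byte first.

```python
import struct

# ">" selects big-endian (Motorola) byte ordering in the struct module.
FORMATS = {"BYTE": ">B", "WORD": ">H", "DWORD": ">I"}

def encode(kind, value):
    """Pack an unsigned integer of the given kind into bytes."""
    return struct.pack(FORMATS[kind], value)

def decode(kind, data):
    """Unpack bytes of the given kind back into an integer."""
    return struct.unpack(FORMATS[kind], data)[0]

def max_value(kind):
    """Largest value representable: 2**bits - 1."""
    return 2 ** (8 * struct.calcsize(FORMATS[kind])) - 1

print(encode("WORD", 258))   # prints b'\x01\x02'
print(max_value("BYTE"), max_value("WORD"), max_value("DWORD"))
```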
       
       



                                   16

       
                            oooOOOooo























































                                   17
