The Architectural Logic of Database Systems
E. J. Yannakoudakis
With 69 Figures
Springer-Verlag
London Berlin Heidelberg New York Paris Tokyo
E. J. Yannakoudakis, BSc, PhD, CEng, FBCS
Postgraduate School of Computer Sciences, University of Bradford,
Bradford, West Yorkshire BD7 1DP, UK
ISBN-13: 978-3-540-19513-9 e-ISBN-13: 978-1-4471-1616-5
DOI: 10.1007/978-1-4471-1616-5
British Library Cataloguing in Publication Data
Yannakoudakis, E. J., 1950-
Architectural logic of database systems
1. Machine-readable files. Design
I. Title
005.74
Library of Congress Cataloging-in-Publication Data
Yannakoudakis, E. J., 1950-
The architectural logic of database systems.
Includes bibliographies and index.
1. Data base management. 2. Computer architecture.
I. Title.
QA76.9.D3Y36 1988 005.74 88-3248
This work is subject to copyright. All rights are reserved, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and
storage in data banks. Duplication of this publication or parts thereof is only permitted
under the provisions of the German Copyright Law of September 9, 1965, in its version
of June 24, 1985, and a copyright fee must always be paid. Violations fall under the
prosecution act of the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1988
The use of registered names, trademarks etc. in this publication does not imply, even in
the absence of a specific statement, that such names are exempt from the relevant laws
and regulations and therefore free for general use.
Filmset by Saxon Printing Limited, Saxon House, Derby
Printed by Page Bros (Norwich) Limited, Mile Cross Lane, Norwich.
2128/3916--543210
To Eve, John, Irene and Helen
who involuntarily allowed me to finish this book.
Preface
If we look back to pre-database systems and the data units which were
in use, we will establish a hierarchy starting with the concept of 'field'
used to build 'records' which were in turn used to build higher data
units such as 'files'. The file was considered to be the ultimate data unit
of information processing and data binding 'monolith'. Moreover,
pre-database systems were designed with one or more programming
languages in mind and this in effect restricted independent
development and modelling of the applications and associated storage
structures.
Database systems came along not to turn the above three units into
outmoded concepts, but rather to extend them further by establishing
a higher logical unit for data description and thereby offer high level
data manipulation functions. It also becomes possible for computer
professionals and other users to view all information processing needs
of an organisation through an integrated, disciplined and methodical
approach.
So, database systems employ the concepts field, record and file
without necessarily making them transparent to the user who is in
effect offered a high level language to define data units and
relationships, and another language to manipulate these. A major objective of
database systems is to allow logical manipulations to be carried out
independent of storage manipulations and vice versa.
A rather accurate parallel between database systems and high level
languages such as FORTRAN, COBOL and Pascal can be drawn here
by stating that database systems form a natural progressive step from
file systems in the way that high level languages form a natural
progressive step from assembly or low level languages. The Data Base
Management System (DBMS) is the software necessary to set up,
manipulate and maintain the object database, that is, the data of the
organisation including appropriate control information.
Since the establishment of this higher concept and its acceptance by
the computer community as the next step towards an even more
advanced information processing environment, the market has been
stocked with a plethora of books on the subject, its implications and
application environments. However, few books are available for the
person who has elementary knowledge of programming and who
wishes to have a general introduction to database principles and at the
same time acquire knowledge of current database management system
software and the various levels at which it is utilised, independent of
any vendor-related software. The present book tackles this and also
discusses the database environment under the following major areas:
(a) The logic behind database systems
(b) The architecture of database systems and related software
(c) How an entire organisation can be viewed with the aid of
appropriate database software
(d) How data can be defined and manipulated using database
languages as well as natural language
(e) Models which can describe an organisation accurately
(f) Database design methodologies and techniques to bind record
types together
(g) Potential administrative and technical tasks to be performed.
A recent development has been the automation of database design
following analyses of the different 'views' (applications) users may
have of the same centralised data. The technique used is termed
'canonical database synthesis', where all user views are merged into a
single unit which reflects the inherent structure of organisational data.
The ultimate objective is to aid the design of the 'logical database
structure' which:
(a) Is free from duplication
(b) Is optimised to reflect the organisation accurately
(c) Does not depend on specific vendor-software
(d) Can satisfy new applications without major restructuring
Canonical synthesis is presented here in a step-wise fashion with
examples that illustrate the merging, analysis and grouping of user
views to form closely related clusters of data elements. An algorithm
for canonical synthesis has been implemented in Pascal and this is used
to analyse a complete hospital database environment for a Regional
Health Authority. Although the software we developed is not included
in the book, the Appendix contains example reports it produces for the
hospital database, starting with the input of user views followed by
their processing and finally the design of the complete logical schema.
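The merging step can be illustrated with a heavily simplified sketch. It is
written in Python rather than Pascal and is not the algorithm or software
described above; it assumes that each user view is given as a list of
(key, attribute) pairs, and the view names, attribute names and the function
merge_views are all hypothetical.

```python
from collections import defaultdict


def merge_views(views):
    """Merge user views into one structure free from duplication.

    Each view is assumed to be a list of (key, attribute) pairs meaning
    "attribute is identified by key". The result groups attributes
    under their key, so a pair seen in several views is stored once.
    """
    merged = defaultdict(set)
    for view in views:
        for key, attribute in view:
            merged[key].add(attribute)
    # Sort the attributes so that the output is reproducible.
    return {key: sorted(attributes) for key, attributes in merged.items()}


# Two hypothetical user views of the same hospital data.
admissions_view = [("patient_no", "patient_name"), ("patient_no", "ward_no")]
ward_view = [("ward_no", "ward_name"), ("patient_no", "ward_no")]

schema = merge_views([admissions_view, ward_view])
for key, attributes in schema.items():
    print(key, "->", attributes)
```

Each key with its attributes forms one cluster of closely related data
elements; the pair shared by both views appears only once in the merged
result.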
The American National Standards Institute (ANSI) and the
International Standards Organisation (ISO) have adopted a new standard
Relational Database Language (RDL) and a Network Database
Language (NDL). Both RDL and NDL are presented here in their
current form of development with our own extensions where
appropriate, particularly for the definition of the storage structures. The book
contains the syntax of the most important RDL and NDL commands
for data definition and data manipulation functions. They are
illustrated with simple examples that show the input, the statements the
user actually types, and the end result.
The ultimate objective of this book is to demystify database
concepts and methodologies and at the same time explain, in as simple a
manner as possible, three important approaches to defining
relationships among attributes: the 'hierarchical', 'network' and 'relational'.
The emphasis is on the relational and network approaches to database
management because they appear to be suitable for most data
processing applications. Besides, we have not seen nor are we likely to
see an international standard for the hierarchical approach to database
design.
The material includes ample examples and realistic attributes and
relationships among these. It is presented in a rather laconic and
synoptic fashion by avoiding unnecessarily long introductions to the
various concepts and by making direct, factual and precise statements
on 'what it is', 'how it works' and 'how it can be applied'. The material
can in fact be split into four parts:
Part I The database environment and underlying data models
(Chapters 1, 2 and 3).
Part II The architecture of database software and man-machine
communication (Chapters 4 and 5).
Part III Database design methodology (Chapters 6 and 7).
Part IV The relational and network database architectures
(Chapters 8 and 9).
The concepts and method of presentation are completely
independent of hardware and commercial software packages. The person who
masters the material presented here will be in a strong position to
judge and evaluate any type of database software, regardless of
whether this is offered on a micro, mini or large mainframe. Moreover,
the ISO database language discussed here will provide a yardstick for
comparative assessment for some years to come.
The book will be useful to all people who wish to acquire a working,
sound and up to date knowledge on the subject, its terminology and
method of application. Since it does not require any a priori knowledge
of database systems but only simple programming principles, it is
recommended to students taking A-level courses in computer studies,
University undergraduates, postgraduates who may wish to use the
DBMS as a tool for data manipulation, computer programmers who
are about to commence programming under a DBMS, systems
analysts who may wish to assess the feasibility of introducing a DBMS
into their organisation, and database administrators who wish to
acquire sufficient and integrated knowledge on logical database
architecture, associated structures and software modules, and
technical tasks behind setting up and maintaining a database. Finally, data
processing managers will find the material useful, particularly the
terminological dictionary; most importantly, though, they will be able
to identify and establish appropriate administrative posts for the effective
maintenance of a database.
Acknowledgements
I would like to thank Chris Stoker (University of Bradford) for the
assistance he has given me in the production of the canonical synthesis
reports for the hospital database presented in the Appendix. I would
like particularly to thank Chi Pui Cheng (Hong Kong Polytechnic) for
his invaluable comments in shaping up the chapter on RDL and NDL.
January 1988 E.J.Y.
Contents
1 Foundations of Databases
1.1 Data and Information
1.2 Program and File Communication
1.3 Program and Meta-file Communication
1.4 Towards a Database System
1.5 High Level Database Software
1.6 Summary
1.7 References
2 The Logic of the Database Environment
2.1 The Principle of Data Independence
2.1.1 Physical Independence
2.1.2 Logical Independence
2.2 Standard Software and the Database
2.3 Three Architectural Levels
2.3.1 Logical Schema
2.3.2 Logical Subschema
2.3.3 Internal Schema
2.4 Types of Users
2.4.1 Database Administrator (DBA)
2.4.2 System Software Engineer (SSE)
2.4.3 Applications Analyst
2.4.4 Applications Programmer
2.4.5 General User
2.5 Summary
2.6 References
3 Data Structures and Data Models
3.1 Introduction
3.2 Data Structures and Relationships
3.2.1 Data Structures on Keys
3.2.2 Tree Structures
3.3 Hierarchic Data Models
3.4 Network Data Models
3.5 Relational Data Models
3.5.1 Relational Terminology
3.5.2 Basic Characteristics of Relational Models
3.6 An Example Schema Model
3.7 An Example Subschema Model
3.8 Summary
3.9 References
4 The Architecture of Database Software
4.1 Introduction
4.1.1 Data Types and Qualifiers
4.2 Data Description Language (DDL)
4.2.1 RDL Commands
4.3 Data Manipulation Language (DML)
4.3.1 RDL Commands
4.4 Data Storage Description Language (DSDL)
4.4.1 RDL Commands
4.5 Query Language
4.5.1 RDL Commands
4.6 Query By Example (QBE)
4.6.1 Example Forms of QBE
4.7 Data Dictionary
4.7.1 Aims and Objectives
4.7.2 The Data Dictionary and the Database
4.8 An Overview of Software Integration
4.9 Summary
4.10 References
5 Communicating with Databases in Natural Language
5.1 Programming Languages
5.2 The PROLOG Programming Language
5.3 Natural Language System Architecture
5.3.1 The Language PROLOG and the Database
5.3.2 Conclusions and Further Research
5.4 Communicating with Databases by Voice
5.5 Speech Synthesis
5.6 Speech Recognition
5.7 An Integrated View of Man-Machine Interfaces
5.8 Summary
5.9 References