Table Of ContentDatabase Modelling in UML
By Geoffrey Sparks, [email protected] : http://www.sparxsystems.com.au
Originally published in Methods & Tools e-newsletter : http://www.martinig.ch/mt/index.html
Introduction Data Profile (as proposed by Rational
Software).
When it comes to providing reliable, Some familiarity with object-oriented
flexible and efficient object persistence design, UML and relational database
for software systems, today's designers modelling is assumed.
and architects are faced with many
choices. From the technological The Class Model
perspective, the choice is usually
between pure Object-Oriented, Object- The Class Model in the UML is the
Relational hybrids, pure Relational and main artefact produced to represent the
custom solutions based on open or logical structure of a software system.
proprietary file formats (eg. XML, It captures the both the data
OLE structured storage). From the requirements and the behaviour of
vendor aspect Oracle, IBM, Microsoft, objects within the model domain. The
POET and others offer similar but techniques for discovering and
often-incompatible solutions. elaborating that model are outside the
scope of this article, so we will assume
This article is about only one of those the existence of a well designed class
choices, that is the layering of an model that requires mapping onto a
object-oriented class model on top of a relational database.
purely relational database. This is not
to imply this is the only, best or The class is the basic logical entity in
simplest solution, but pragmatically it the UML. It defines both the data and
is one of the most common, and one the behaviour of a structural unit. A
that has the potential for the most class is a template or model from
misuse. which instances or objects are created
at run time. When we develop a logical
We will begin with a quick tour of the model such as a structural hierarchy in
two design domains we are trying to UML we explicitly deal with classes.
bridge: firstly the object-oriented class
model as represented in the UML, and When we work with dynamic
secondly the relational database model. diagrams, such as sequence diagrams
and collaborations, we work with
For each domain we look only at the objects or instances of classes and their
main features that will affect our task. inter-actions at run-time.
We will then look at the techniques
and issues involved in mapping from The principal of data hiding or
the class model to the database model, encapsulation is based on localisation
including object persistence, object of effect. A class has internal data
behaviour, relationships between elements that it is responsible for.
objects and object identity. We will Access to these data elements should
conclude with a review of the UML be through the class's exposed
behaviour or interface. Adherence to
Person Person Class attributes :
the encapsulated
- Address: CAddress
data
# Name:
# ASgtei: double
+ getAge() : int
+ setAge(n)
+ getName() :
+ sSettiName(s)
A simple person class
with no state or Class operations:
behaviour shown the behaviour
Attributes and operations define the state of an object
rutn-time and the capabilities or behaviour of the
bj t
Figure 1 - Classes, attributes and operations
this principal results in more relationship that is most interesting: for
maintainable code. example an Address class may be
associated with a Person class. The
Behaviour mapping of this relationship into the
relational data space requires some
Behaviour is captured in the class care.
model using the operations that are
defined for the class. Operations may Aggregation is a form of association
be externally visible (public), visible to that implies the collection of one class
children (protected) or hidden of objects within another. Composition
(private). By combining hidden data is a stronger form of aggregation that
with a publicly accessible interface and implies one object is actually
hidden or protected data manipulation, composed of others. Like the
a class designer can create highly association relationship, this implies a
maintainable structural units that complex class attribute that requires
support rather than hinder change. careful consideration in the process of
mapping to the relational domain.
Relationships and Identity
While a class represents the template
Association is a relationship between 2 or model from which many object
classes indicating that at least one side instances may be created, an object at
of the relationship knows about and run time requires some means of
somehow uses or manipulates the other identifying itself such that associated
side. This relationship may by objects may act upon the correct object
functional (do something for me) or instance. In a programming language
structural (be something for me). For like C++, object pointers may be
this article it is the structural passed around and held to allow
objects access to a unique object with which to locate or re-create the
instance. required object.
Often though, an object will be
destroyed and require that it be re- The Relational Model
created as it was during its last active
instance. These objects require a The relational data model has been
storage mechanism to save their around for many years and has a
internal state and associations into and proven track record of providing
to retrieve that state as required. performance and flexibility. It is
essentially set based and has as its
Inheritance provides the class model fundamental unit the 'table', which is
with a means of factoring out common composed of a set of one or more
behaviour into generalised classes that 'columns', each of which contains a
then act as the ancestors of many data element.
variations on a common theme.
Inheritance is a means of managing Tables and Columns
both re-use and complexity. As we will
see, the relational model has no direct A relational table is collection of one
counterpart of inheritance, which or more columns each of which has a
creates a dilemma for the data unique name within the table construct.
modeller mapping an object model Each column is defined to be of a
onto a relational framework. certain basic data type, such as a
number, text or binary data. A table
Navigation from one object at run time definition is a template from which
to another is based on absolute table rows are created, each row being
references. One object has some form an instance of a possible table instance.
of link (a pointer or unique object ID)
A class hierarchy Person Aggregation captures the concept Family
showing a generalised ocofllection or composition between classes
person class from
which other classes
are derived
Parent Child The main relationships we are
interested in are Association,
Aggregation and Inheritance.
2
1..n Tdehsecsreibe the ways classes interact
Association captures
or relate to each
a having or using
other
relationship between
classes
Figure 2 - UML Class model notation
Share
A Person may
own a set of
Shares
A Person is composed of
a strict set of ID
A Person may 0..n documents (having n
reside at zero elements)
or more
addresses 0..1
Address Person ID Document
0..n 0..n
1 n
An Address may Three forms of the Aggregation relationship. The weak form is
have zero or depicted with an unfilled diamond head, the strong form
more Persons in (composition) with a filled head.
residence
Figure 3- Aggregation Relationships
Public Data Access Database stored procedures provide a
means of extending database
The relational model only offers a functionality through proprietary
public data access model. All data is language extensions used to construct
equally exposed and open to any functional units (scripts). These
process to update, query or manipulate functional procedures do not map
it. Information hiding is unknown. directly to entities, nor have a logical
relationship to them.
Behaviour
Navigation through relational data sets
The behaviour associated with a table is based on row traversal and table
is usually based on the business or joins. SQL is the primary language
logical rules applied to that entity. used to select rows and locate
instances from a table set.
Constraints may be applied to columns
in the form of uniqueness Relationships and Identity
requirements, relational integrity
constraints to other tables/rows, The primary key of a table provides the
allowable values and data types. unique identifying value for a
particular row. There are two kinds of
Triggers provide some additional primary key that we are interested in:
behaviour that can be associated with firstly the meaningful key, made up of
an entity. Typically this is used to data columns which have a meaning
enforce data integrity before or after within the business domain, and
updates, inserts and deletes. second the abstract unique identifier,
such as a counter value, which have no
business meaning but uniquely identify Having looked at the two domains of
a row. We will discuss this and the interest and compared some of the
implications of meaningful keys later. important features of each, we will
digress briefly to look at the notation
A table may contain columns that map proposed to represent relational data
to the primary key of another table. models in the UML.
This relationship between tables
defines a foreign key and implies a The UML Data Model Profile
structural relationship or association
between the two tables. The Data Model Profile is a proposed
UML extension (and currently under
Summary review - Jan 2001) to support the
modelling of relational databases in
From the above overview we can see UML. It includes custom extensions
that the object model is based on for such things as tables, data base
discrete entities having both state schema, table keys, triggers and
(attributes/data) and behaviour, with constraints. While this is not a ratified
access to the encapsulated data extension, it still illustrates one
generally through the class public possible technique for modelling a
interface only. The relational model relational database in the UML.
exposes all data equally, with limited
support for associating behaviour with Tables
data elements through triggers, indexes
and constraints.
Customer
You navigate to distinct information in
the object model by moving from
object to object using unique object
identifiers and established object
relationships (similar to a network
A table in the UML Data Profile is a
data model). In the relational model
class with the «Table» stereotype,
you find rows by joining and filtering
displayed as above with a table icon in
result sets using SQL using generalised
the top right corner.
search criteria.
Columns
Identity in the object model is either a
run-time reference or persistent unique
ID (termed an OID). In the relational
Customer
world, primary keys define the PK OID: int
uniqueness of a data set in the overall Name: VARCHAR2
Address: VARCHAR2
data space.
In the object model we have a rich set
of relationships: inheritance,
Database columns are modelled as
aggregation, association, composition,
attributes of the «Table» class. For
dependency and others. In the
example, the figure above shows some
relational model we can really only
attributes associated with the Customer
specify a relationship using foreign
table. In the example, an object id has
keys.
been defined as the primary key, as
well as two other columns, Name and
Customer
Address. Note that the example above
PK OID: int
defines the column type in terms of the
Name: VARCHAR2
native DBMS data types. Address: VARCHAR2
+ «PK» idx_customer00()
Behaviour + «FK» idx_customer02()
+ «Index» idx_customer01()
+ «Trigger» trg_customer00()
So far we have only defines the logical + «Unique» unq_customer00()
+ «Proc» spUpdateCustomer()
(static) structure of the table; in + «Check» chk_customer00()
addition we should describe the
behaviour associated with columns,
including indexes, keys, triggers, The example illustrates the following
procedures & etc. Behaviour is possible behaviour:
represented as stereotyped operations.
1. A primary key constraint (PK);
The figure below shows our table 2. A Foreign key constraint (FK);
above with a primary key constraint 3. An index constraint (Index);
and index, both defined as stereotyped 4. A trigger (Trigger);
operations: 5. A uniqueness constraint (Unique);
6. A stored procedure (Proc) - not
formally part of the data profile,
Customer but an example of a possible
PK OID: int modelling technique; and a
Name: VARCHAR2
Address: VARCHAR2 7. Validity check (Check).
+ «PK» idx_customer00()
+ «index» idx_customer01() Using the notation provided above, it is
possible to model complex data
structures and behaviour at the DBMS
level. In addition to this, the UML
Note that the PK flag on the column
provides the notation to express
'OID' defines the logical primary key,
relationships between logical entities.
while the stereotyped operation "«PK»
idx_customer00" defines the
Relationships
constraints and behaviour associated
with the primary key implementation
The UML data modelling profile
(that is, the behaviour of the primary
defines a relationship as a dependency
key).
of any kind between two tables. It is
represented as a stereotyped
Adding to our example, we may now
association and includes a set of
define additional behaviour such as
primary and foreign keys.
triggers, constraints and stored
procedures as in the example below:
The data profile goes on to require that
a relationship always involves a parent
and child, the parent defining a
primary key and the child
implementing a foreign key based on
all or part of the parent primary key.
The relationship is termed 'identifying'
if the child foreign key includes all the
elements of the parent primary key and hardware (a 'node' in UML).
'non-identifying' if only some elements
of the primary key are included. To represent schema within the
database, use the «schema» stereotype
The relationship may include on a package. A table may be placed in
cardinality constraints and be modelled a «schema» to establish its scope and
with the relevant PK - FK pair named location within a database.
as association roles. Figure 4 illustrates
this kind of relationship modelling
«schema»
using UML.
User
Child
Grandchild
The Physical Model Grandparent
Parent
UML also provides some mechanisms Person
for representing the overall physical
structure of the database, its contents
and deployed location. To represent a
physical database in UML, use a
stereotyped component as in the figure Mapping from the Class Model to
below: the Relational Model
Having described the two domains of
«Database»
MainOraDB interest and the notation to be used, we
can now turn our attention as to how to
map or translate from one domain to
the other. The strategy and sequence
presented below is meant to be
A component represents a discrete and
suggestive rather than proscriptive -
deployable entity within the model. In
adapt the steps and procedures to your
the physical model, a component may
personal requirements and
be mapped on to a physical piece of
Parent Child
PK_PersonID FK_PersonID
2
«identifying» 0..n
An identifying relationship between child and parent, with role names
based on primary to foreign key relationship.
Figure 4 - UML relationship
environment. construct from the object-oriented
model that requires translating into the
1. Model Classes relational model. The relational space
is essentially flat, every entity being
Firstly we will assume we are complete in its self, while the object
engineering a new relational database model is often quite deep with a well-
schema from a class model we have developed class hierarchy.
created. This is obviously the easiest
direction as the models remain under The deep class model may have many
our control and we can optimise the layers of inherited attributes and
relational data model to the class behaviour, resulting in a final, fully
model. In the real world it may be that featured object at run-time. There are
you need to layer a class model on top three basic ways to handle the
of a legacy data model - a more translation of inheritance to a relational
difficult situation and one that presents model:
its own challenges. For the current
discussion will focus on the first 1. Each class hierarchy has a single
situation. At a minimum, your class corresponding table that contains
model should capture associations, all the inherited attributes for all
inheritance and aggregation between elements - this table is therefore the
elements. union of every class in the
hierarchy. For example, Person,
2. Identify persistent objects Parent, Child and Grandchild may
all form a single class hierarchy,
Having built our class model we need and elements from each will appear
to separate it into those elements that in the same relational table;
require persistence and those that do 2. Each class in the hierarchy has a
not. For example, if we have designed corresponding table of only the
our application using the Model-View- attributes accessible by that class
Controller design pattern, then only (including inherited attributes). For
classes in the model section would example, if Child is inherited from
require persistent state. Person only, then the table will
contain elements of Person and
3. Assume each persistent class maps Child only;
to one relational table 3. Each generation in the class
hierarchy has a table containing
A fairly big assumption, but one that only that generation's actual
works in most cases (leaving the attributes. For example, Child will
inheritance issue aside for the map to a single table with Child
moment). In the simplest model a class attributes only
from the logical model maps to a
relational table, either in whole or in There are cases to be made for each
part. The logical extension of this is approach, but I would suggest the
that a single object (or instance of a simplest, easiest to maintain and less
class) maps to a single table row. error prone is the third option. The first
option provides the best performance
4. Select an inheritance strategy at run-time and the second is a
compromise between the first and last.
Inheritance is perhaps the most
problematic relationship and logical
The first option flattens the hierarchy be replicated in many child tables.
and locates all attributes in one table - Even worse, the parental data in two or
convenient for updates and retrievals more child classes may be redundantly
of any class in the hierarchy, but stored in many tables; if a parent's
difficult to authenticate and maintain. attributes are modified, there is
Business rules associated with a row considerable effort in locating
are hard to implement, as each row dependent children and updating the
may be instantiated as any object in the affected rows.
hierarchy. The dependencies between
columns can become quite The third option more accurately
complicated. In addition, an update to reflects the object model, with each
any class in the hierarchy will class in the hierarchy mapped to its
potentially impact every other class in own independent table. Updates to
the hierarchy, as columns are added, parents or children are localised in the
deleted or modified from the table. correct space. Maintenance is also
relatively easier, as any modification
The second option is a compromise of an entity is restricted to a single
that provides better encapsulation and relational table also. The down side is
eliminates empty columns. However, a the need to re-construct the hierarchy
change to a parent class may need to at run-time to accurately re-create a
Parent tbl_Parent
A Parent class with unique ID (OID)
- OID: GUID and Name and Sex attributes maps to AddressOID: VARCHAR
# Name: String a relational table. Name: VARCHAR
# Sex: Gender PK OID: VARCHAR
Sex: VARCHAR
+ setName(String)
+ getName() : String
+ setSex(String) <<realises>>
+ getSex() : String
m_Address 0..n
The Address association from the logical model becomes
a foreign key relationship in the data model
1
Address
- OID: GUID
# City: String The Address class in the logical tbl_Address
# Phone: String model becomes a table in the City: VARCHAR
# State: String data model PK OID: VARCHAR
# Street: String Phone: VARCHAR
State: VARCHAR
+ getCity() : String
Street: VARCHAR
+ getStreet() : String
+ setCity(String) <<realises>>
+ setStreet(String)
Figure 5 - Class to Table mapping
child class's state. A Child object may A complete run-time scenario can then
require a Person member variable to be loaded from storage reasonably
represent their model parentage. As efficiently.
both require loading, two database
calls are required to initialise one An important point about the OID
object. As the hierarchy deepens, with values above is that they have no
more generations, the number of inherent meaning beyond simple
database calls required to initialise or identity. They are only logical pointers
update a single object increases. and nothing more. In the relational
model, the situation is often quite
It is important to understand the issues different.
that arise when you map inheritance
onto a relational model, so you can Identity in the relational model is
decide which solution is right for you. normally implemented with a primary
key. A primary key is a set of columns
in a table that together uniquely
5. For each class add a unique identify a row. For example, name and
object identifier address may uniquely identify a
'Customer'. Where other entities, such
In both the relational and the object as a 'Salesperson', reference the
world, there is the need to uniquely 'Customer', they implement a foreign
identify an object or entity. key based on the 'Customer' primary
key.
In the object model, non-persistent
objects at run-time are typically The problem with this approach for our
identified by direct reference or by a purposes is the impact of having
pointer to the object. Once an object is business information (such as customer
created, we can refer to it by its run- name and address) embedded in the
time identity. However, if we write out identifier. Imagine three or four tables
an object to storage, the problem is all have foreign keys based on the
how to retrieve the exact same instance customer primary key, and a system
on demand. change requires the customer primary
key to change (for example to include
The most convenient method is to 'customer type'). The work required to
define an OID (object identifier) that is modify both the 'customer' table and
guaranteed to be unique in the the entities related by foreign key is
namespace of interest. This may be at quite large.
the class, package or system level,
depending on actual requirements. On the other hand, if an OID was
implemented as the primary key and
An example of a system level OID formed the foreign key for other tables,
might be a GUID (globally unique the scope of the change is limited to
identifier) created with Microsoft's the primary table and the impact of the
'guidgen' tool; eg. {A1A68E8E-CD92- change is therefore much less.
420b-BDA7-118F847B71EB}. A class
level OID might be implemented using Also, in practice, a primary key based
a simple numeric (eg. 32 bit counter). on business data may be subject to
change. For example a customer may
If an object holds references to other change address or name. In this case
objects, it may do so using their OID. the changes must be propagated