Linking Enterprise Data
David Wood
Editor
3 Round Stones LLC
22408 Fredericksburg Virginia
USA
[email protected]
ISBN 978-1-4419-7664-2 e-ISBN 978-1-4419-7665-9
DOI 10.1007/978-1-4419-7665-9
Springer New York Dordrecht Heidelberg London
© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street,
New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis.
Use in connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
“All problems in computer science can be
solved by another layer of indirection, but
that will usually create another problem.”
David John Wheeler (1927-2004)
Preface
Linking Enterprise Data is a new concept, based on an idea more than twenty years
old. Tim Berners-Lee’s original proposal for the World Wide Web in March 1989
was based on a system of linked information systems. The early Web was intended
to interlink information from various systems to solve organizational problems, such
as the high turnover of people and the restriction of information to data silos. The
hope was to create a distributed information system that would allow “a pool of
information to develop which could grow and evolve with the organisation and the
projects it describes.”
The Web has grown into the world’s largest information system. By 2000, Web
architecture had been dissected and described by Roy Fielding. Representational
State Transfer (REST) was Roy’s answer to why the Web worked so well. In a
world plagued by software problems, machine crashes, and network outages, the
Web never fails. The Web is robust and resilient to change. The Web survives changing
machinery, operating system updates, and changes in the way we structure, index, and
find information. No other software system provides the features and functions of
the World Wide Web.
Linked Data techniques have become interesting to organizations of every shape
and size. The Linked Open Data (LOD) project began as a community effort of
the World Wide Web Consortium’s Semantic Web Education and Outreach Group.
The project has begun to turn the document-oriented Web into a database of global
proportions. The ability of the modern Web to deal with both documents and data
has shaped a general solution for information dissemination and integration. The
time for linking enterprise data has come.
This book records some of the earliest production applications of linking enterprise
data. Parts of it serve as a roadmap for those seeking to replicate their
successes. Part I of this book attempts to answer the question of why enterprise data
should be linked. The chapters in Part I provide valuable guidance to those writing
business cases, those needing to justify internal development efforts, and
those writing requests for proposals to external vendors. Dean Allemang discusses
why enterprises must adopt Web techniques for data integration and how such
techniques fit into enterprise systems. Dean makes a strong case that enterprises
must change the way they approach information technology systems. Indeed, since
information systems have such a profound impact on the operational aspects of a
business, he makes the case that enterprises need to change the way they approach
their operations.
Edward Curry, Andre Freitas, and Sean O’Riain discuss the role of community-based
data curation. Enterprises have become more distributed, less centrally managed
and less integrated in their systems. The lessons Ed and his colleagues have
captured from real-world attempts to curate distributed data for the purposes of
ensuring data quality will apply to many enterprises. They provide some important
best practices extracted from early adopters of Linked Data techniques.
Part II is short, but critically important. Part II provides material assistance for
business managers seeking to propose Linked Data projects. Bernadette Hyland
discusses the characteristics of enterprises ready to take on Linked Data projects
and provides useful fodder for business cases. Her simple guidelines for getting a
Linked Data project started have generally been lacking in the public discussion to
date. Kristen Harris’ real-world experiences creating and managing the swoRDFish
project at Sun Microsystems demonstrated the potential of linked enterprise data to
integrate disparate systems in large enterprises. She provides guidance on persuading
corporate management to approve and support projects with far-reaching
infrastructural ramifications.
The techniques of Linked Data can be subtle and technical, although not out of
reach for those with traditional enterprise skills. Part III provides three explanatory
chapters that address different technical aspects of linking data. Alexandre Passant,
Philippe Laublet, John G. Breslin and Stefan Decker present ways to integrate
enterprise social networking solutions such as wikis and blogs. Initial enterprise
adoption of new technologies can sometimes create new problems. Alexandre and his
co-authors offer both insight and solutions to the integration of Web 2.0 and Web
3.0 techniques.
Roberto Garcia and Rosa Gil demonstrate how existing data sources may be
translated and brought to the Web of data. Reza B’Far and I offer technical
approaches to enterprise problems of scale. Reza addresses logical reasoning
techniques for enterprise-scale data and I present ways to ensure the long-term viability
of Linked Data identifiers.
Part IV provides five success stories from the front line of enterprise adoption.
Each story highlights a different aspect of Linked Data in an enterprise context.
Thomas Baker and Johannes Keizer address standards for highly distributed
operations developed for the Food and Agriculture Organization of the United Nations.
Steve Harris, Tom Ilube and Mischa Tuffield show how Linked Data techniques
are used as the basis for their Web-scale company, Garlik. Constantine Hondros
of publisher Wolters Kluwer illustrates and presents several approaches for solving
integration problems of textual content. Chimezie Ogbuji develops a new enterprise
system using a Linked Data approach, and Yves Raimond, Tom Scott, Silver
Oliver, Patrick Sinclair and Michael Smethurst of the British Broadcasting Corporation
present their innovative corporate treatment of the Web itself as their content
management system.
We have been able to draw some tentative conclusions regarding success criteria
for Linked Data projects in an enterprise. First and foremost may be from Jeff
Pollock of Oracle Corporation when he said, “If information systems are to keep
up with business, we need to change more than technology - we need to change
how people deal with technology.” Linked Data techniques offer us a means to do
just that; we can radically change the interfaces to our existing systems while we
build upon them. We can wrap and expose our silos in order to layer a Web-like
distributed system over them.
Secondly, the lessons of the Web clearly apply to enterprises. The Web works
for some very good, and very explainable, reasons. Those reasons transcend
Representational State Transfer (REST), the architectural principles that underlie the
idealized Web, and add the techniques of the Semantic Web, especially that subset
being used by the Linked Data community. Individual technologies, though, are
clearly less important than techniques that have proven their worth. Technologies
continue to evolve; good techniques are more resilient and worth building upon.
Note how different the organizations in the success stories are from one another!
A broadcasting company, a publishing company, a healthcare provider, a data security
firm, an international policy organization. Other chapters referenced other types
of organizations, including a utility company and library organizations. If Linked
Data techniques work for all of them, those techniques are very likely to apply to
others.
All of our success stories have some interesting commonalities: At least one expert
in Semantic Web techniques was used by each organization. Each attacked a
significant business problem instead of relying on the technologies to “sell
themselves”. Each leveraged significant existing investments, especially those with
captured or implied semantics. Every success relied upon universal addressing of
resources via the Web’s Uniform Resource Identifier (URI) scheme.
There were also some major differences between the success stories. Those
differences define tools and techniques that are more situationally dependent. The most
noteworthy is that very different degrees of data modeling were employed.
Complete, top-down data modeling is expensive, difficult and should be undertaken only
where it provides value. Specific technologies to describe data (OWL, SKOS, RDF
serialization formats) varied widely, as did the use of the SPARQL query language.
Trust may be a larger issue in intra-business data than it is on the general Web
if business decisions are being made based on the information. Issues of trust in
large organizations may be addressed through social considerations, e.g. via signing of
work, taking credit for additions or edits, tying comments to logins. Many of today’s
enterprises are large and distributed enough to make use of Web techniques for
building and maintaining trust socially over a technical framework.
The four parts of this book are presented hierarchically, like most books in the
last 2300 years of Western tradition. The material in this book should not be thought
of as a hierarchy, but rather like a graph, like the Web itself. All of the chapters in
this book contain nuggets of information useful to enterprise professionals looking
to apply Linked Data techniques. The opening chapters do address technology and
success stories as well as laying a conceptual foundation. The technique chapters