Advanced Digital Preservation
David Giaretta
David Giaretta
STFC and Alliance for Permanent Access
Yetminster, Dorset
United Kingdom
[email protected]
Further Project Information and Open Source Software under:
http://www.casparpreserves.eu
http://developers.casparpreserves.eu
http://www.alliancepermanentaccess.org
ISBN 978-3-642-16808-6 e-ISBN 978-3-642-16809-3
DOI 10.1007/978-3-642-16809-3
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2011921005
ACM Codes H.3, K.4, K.6
© Springer-Verlag Berlin Heidelberg 2011
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Cover design: deblik, Berlin
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
“How to preserve all kinds of digital objects”
and
“OAIS: what it means and how to use it”
and
“The CASPAR book”
and
“Everything you wanted to know about digital
preservation but were afraid to ask”
Preface
There has been a growing recognition of the need to address the fragility of the
digital information that is deluging all aspects of our lives, whether in business,
scientific, administrative, imaginative or cultural activities.
Society’s growing dependence on the digital for its smooth operation as it
becomes an information society provides the real urgency for addressing this issue.
This case has been made very well in the large number of books and articles already
published on the topic of digital preservation and therefore this case will not be
expanded upon in this book.
Since there are many books about digital preservation why is there a need for yet
one more?
At the time of writing the books and articles on digital preservation, for the most
part, focus on consideration of documents, images and web pages; things which are
normally just displayed by software for a human to view or listen to (or perhaps
smell, taste or touch). We will refer to these as things which are rendered.
Yet there are clearly many more types of digital objects on which our lives depend
and which may need to be preserved, such as databases, scientific data and software
itself. These are things which are not simply rendered – they are processed and used
in many different ways.
It should become clear to the reader that the tools and techniques used for
preserving rendered objects are inadequate for all these other types of digital objects
and we need to set our sights higher and wider. This book provides the concepts,
techniques and tools which are needed.
Of course it is easy to make claims about digital preservation techniques – and
there are many such claims! Therefore it is important that evidence is provided to
support any such claims, which we do for our claims by using accelerated lifetime
scenarios about the important changes which will challenge us. We use as examples
a variety of digital objects from many sources and show tools and techniques by
which they may be preserved.
1 Who Should Read This Book and Why?
This book is aimed at those who have problems in preserving digitally encoded
information that they need to solve, especially where it goes beyond simply
preserving rendered objects. The PARSE.Insight survey [1] suggests that while all
researchers have documents and images, about half have non-rendered digital
holdings such as raw data, scientific/statistical data, databases and software; therefore
this book should be of wide interest.
It should also be essential reading for those who wish to audit their own archives,
perhaps in advance of an independent audit, about how well they are doing in the
preservation of the digitally encoded information which has been entrusted to them.
Researchers in digital preservation theory and developers of tools and techniques
should also find valuable information here. Developers in the area of e-Science (also
known as Cyberinfrastructure) may also gain a number of useful insights.
Some of the material in this book may be found to be too technical by some
readers. For those readers we suggest that they skim over such material in order
to at least be aware of the issues. This will allow them to advise more technical
implementers who will certainly need such details.
To further help readers, the book is supported by other resources, including many
hours of videos and presentations from the CASPAR project [2], which provides
❍ an elevator pitch for digital preservation,
❍ examples of digital preservation from several repositories,
❍ detailed lectures by the contributors to this book on many of the issues described here and
❍ lectures about, and video captures of, many of the software components.
The open source software and further documentation are also available.
2 Structure of This Book
Part I of the book provides the concepts and theoretical basis that are needed,
introducing, as examples along the way, digital objects from many sources. Since
much of this book is based on the work of the CASPAR project, the examples
will be derived from many disciplines including science, cultural heritage and
contemporary performing arts.
The approach we take throughout is one of asking the questions which we
believe a reasonably intelligent person may ask, and then providing answers to them.
Sometimes, when there are some subtle but important points, we guide the reader
towards the appropriate questions. As noted above, this will lead us into a number
of technical issues which will not be to the taste of all readers but all topics are
necessary for at least some readers.
Part II of the book shows practical examples of preserving a variety of specific
objects and gives details of a range of tools and techniques. One obvious question,
which an intelligent (but sceptical) reader may ask is “these tools and techniques
may do something but why should I believe that they help to preserve things?”
After all, the only real way would be to live a long time and check the supposedly
preserved objects in the future. However that is not very practical, and perhaps more
importantly it does not help one to decide now whether to follow the ways proposed
in this book. Choosing the wrong way could have a disastrous effect on what one
intends to leave for future generations!
We provide what we believe is strong evidence that what is proposed does
actually work for a wide variety of digital objects from many disciplines, through a
number of accelerated lifetime scenarios, validated by members of the appropriate
communities.
Part III provides answers to the questions about how to ensure that resources
devoted to preserving digital objects are not wasted, showing a number of ways
in which effort can be shared. In addition this part provides guidance on how to
evaluate whether a particular repository (perhaps your own) is doing a good job,
and where it might be improved. This part also describes the thinking behind the
work carried out to produce the ISO standards on which the international audit and
certification process can be based.
Throughout the book we indicate points where experience shows
there is a danger of misunderstanding by the symbol
3 Preservation and Curation
This book is about digital preservation but there is another term which is being
used, namely digital curation. The UK Digital Curation Centre [3] used to define
this in the following way: “Digital curation is maintaining and adding value to a
trusted body of digital information for current and future use; specifically, we mean
the active management and appraisal of data over the life-cycle of scholarly and
scientific materials”. This definition has been changed more recently to “Digital
curation involves maintaining, preserving and adding value to digital research data
throughout its lifecycle”. Sometimes the phrase “digital curation and preservation”
is also used.
We prefer the term preservation in this book since we do not wish to restrict our
consideration to “scholarly and scientific materials” nor “research data”, because we
wish to ensure we can apply our techniques to all kinds of digital objects including,
for example, commercial and legal material. Nor do we wish to restrict ourselves to
only a “trusted body of digital information” – since one might wish to preserve
falsified data for example as evidence for legal proceedings. Moreover as we will see,
our definition of preservation requires that if we are to preserve digitally encoded
information we must ensure it remains understandable and usable. In other words
preservation is the sine qua non of curation. For example it is possible to manage
and publish digitally encoded information without regard to future use; on the other
hand if one wishes to ensure future as well as current use, one must understand the
requirements for preservation.
4 OAIS Definitions
OAIS [4] plays a central role in this book. Many definitions, and some descriptive
text, are taken from the updated OAIS; these are shown as bold italics.
5 Acknowledgements
This book would not have been written without the work carried out by the many
members of the CASPAR [2], DCC [3] and PARSE.Insight [1] projects, as well as
the members of CCSDS [5] and others who have worked on developing OAIS [4]
and the standards for certification of digital repositories [6], all of whom must be
thanked for their efforts.
A fuller list of contributors may be found in “Contributors” at the end of the
book.
Finally the editor and main author of this book would like to thank his family, in
particular his wife Krystina and daughter Zoe, for their support and help in preparing
this book for publication.
Contents
1 Introduction
1.1 What’s So Special About Digital Things?
1.2 Terminology
1.3 Summary
2 The Really Foolproof Solution for Digital Preservation
Part I Theory – The Concepts and Techniques Which Are Essential for Preserving Digitally Encoded Information
3 Introduction to OAIS Concepts and Terminology
3.1 Preserve What, for How Long and for Whom?
3.2 What “Metadata”, How Much “Metadata”?
3.3 Recursion – A Pervasive Concept
3.4 Disincentives Against Digital Preservation
3.5 Summary
4 Types of Digital Objects
4.1 Simple vs. Composite
4.2 Rendered vs. Non-rendered
4.3 Static vs. Dynamic
4.4 Active vs. Passive
4.5 Multiple-Classifications
4.6 Summary
5 Threats to Digital Preservation and Possible Solutions
5.1 What Can Be Relied on in the Long-Term?
5.2 What Others Think About Major Threats to Digital Preservation
5.3 Summary
6 OAIS in More Depth
6.1 OAIS Conformance
6.2 OAIS Mandatory Responsibilities
6.3 OAIS Information Model
6.4 OAIS Functional Model