Bertram Ludäscher
Beth Plale (Eds.)
Provenance and Annotation of Data and Processes
5th International Provenance and Annotation Workshop, IPAW 2014
Cologne, Germany, June 9–13, 2014, Revised Selected Papers
Lecture Notes in Computer Science 8628
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zürich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Editors
Bertram Ludäscher
University of Illinois, Urbana-Champaign, IL, USA
Beth Plale
Indiana University, Bloomington, IN, USA
ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-16461-8
ISBN 978-3-319-16462-5 (eBook)
DOI 10.1007/978-3-319-16462-5
Library of Congress Control Number: 2015933500
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media
(www.springer.com)
Preface
This volume contains the proceedings of the 5th International Provenance and
Annotation Workshop (IPAW), held during June 10–11, 2014 at the German Aerospace
Center (DLR) in Cologne, Germany. For the first time, IPAW was colocated with the
Workshop on the Theory and Practice of Provenance (TaPP). Together the two leading
provenance workshops anchored ProvenanceWeek 2014, a full week of
provenance-related activities that included a shared poster session, a panel on
reproducibility in science, and tutorials on the W3C PROV standard, on provenance
analytics, and on the uses of provenance in cell biology. The week was rounded out
with afternoon-long birds-of-a-feather activities around constructing a provenance
record from data when provenance was not collected in the first place, and
benchmarking of provenance systems. This collection constitutes the peer-reviewed
papers of IPAW 2014. These include 14 long papers, which report in depth the results
of research around provenance, and four extended abstracts that discuss tools and
services that were presented in the form of a system demonstration. Finally, we have
included 20 short abstracts of the joint IPAW/TaPP poster session. The final papers,
demos, and poster abstracts were selected from a total of 53 submissions. All
full-length research papers and demo papers received a minimum of three reviews.
The papers of IPAW 2014 provided a glimpse into state-of-the-art research and
practice around the capture, representation, and use of provenance. Since provenance
often results in graphs, and large ones at that, several of the papers in this collection
proposed abstract graph models and methods with well-defined properties, properties
that can hold even when sanitized for potentially sensitive information. Tools are the
focus of a number of papers in this collection; these are innovative software
applications that solve a particular problem and are evaluated experimentally. They are
often converging on the W3C PROV model for provenance interchange. Some papers
discussed tools that enable provenance capture from software compilers, from web
publications, and from scripts, using existing audit logs, and employing both static and
dynamic instrumentation. New methodologies for provenance aggregation and use
appeared in the collection as well. We see the evaluation of a linked data approach to
provenance publishing, the generation of documentation from provenance, and
application of provenance to protect attribution in scientific discovery.
In closing, we would like to thank the members of the Program Committee for their
thoughtful reviews, Dr. Andreas Schreiber (Local Chair) and Carina Haupt for their
excellent organization of IPAW and ProvenanceWeek 2014 at DLR, and, last but not
least, the authors and participants for making IPAW the stimulating and successful
event that it was.
December 2014 Bertram Ludäscher
Beth Plale
Organization
Program Committee
Ilkay Altintas University of California, San Diego, USA
Khalid Belhajjame PSL, Université Paris-Dauphine, LAMSADE, France
Shawn Bowers Gonzaga University, USA
Adriane Chapman The MITRE Corporation, USA
James Cheney University of Edinburgh, UK
Susan Davidson University of Pennsylvania, USA
Tom De Nies Ghent University - iMinds - Multimedia Lab, Belgium
Kai Eckert University of Mannheim, Germany
Juliana Freire NYU Polytechnic School of Engineering, USA
James Frew Bren School / UCSB, USA
Daniel Garijo Universidad Politécnica de Madrid, Spain
Yolanda Gil USC/ISI, USA
Paul Groth VU University Amsterdam, The Netherlands
Trung Dong Huynh University of Southampton, UK
H. V. Jagadish University of Michigan, USA
David Koop NYU Polytechnic School of Engineering, USA
Carl Lagoze University of Michigan School of Information, USA
Timothy Lebo Rensselaer Polytechnic Institute, USA
Qing Liu CSIRO, Australia
Shiyong Lu Wayne State University, USA
Bertram Ludäscher University of California, Davis, USA
Tanu Malik University of Chicago, USA
Marta Mattoso COPPE - Federal Univ. Rio de Janeiro, Brazil
Deborah McGuinness Rensselaer Polytechnic Institute, USA
Simon Miles King’s College London, UK
Paolo Missier Newcastle University, UK
Luc Moreau University of Southampton, UK
Beth Plale Indiana University, USA
Yogesh Simmhan Indian Institute of Science, India
Curt Tilmes NASA GSFC, USA
Jan Van Den Bussche Hasselt University and University of Limburg
Contents
Standardization of Provenance Models, Services, Representations
ProvAbs: Model, Policy, and Tooling for Abstracting PROV Graphs
Paolo Missier, Jeremy Bryans, Carl Gamble, Vasa Curcin, and Roxana Danger
ProvGen: Generating Synthetic PROV Graphs with Predictable Structure
Hugo Firth and Paolo Missier
Applications of Provenance
Walking into the Future with PROV Pingback: An Application to OPeNDAP Using Prizms
Timothy Lebo, Patrick West, and Deborah L. McGuinness
Provenance for Online Decision Making
Amir Sezavar Keshavarz, Trung Dong Huynh, and Luc Moreau
Regenerating and Quantifying Quality of Benchmarking Data Using Static and Dynamic Provenance
Devarshi Ghoshal, Arun Chauhan, and Beth Plale
Provenance Management Architectures and Techniques
noWorkflow: Capturing and Analyzing Provenance of Scripts
Leonardo Murta, Vanessa Braganholo, Fernando Chirigati, David Koop, and Juliana Freire
LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance
Pinar Alper, Khalid Belhajjame, Carole A. Goble, and Pinar Karagoz
Auditing and Maintaining Provenance in Software Packages
Quan Pham, Tanu Malik, and Ian Foster
Security and Privacy Implications of Provenance
An Analytical Survey of Provenance Sanitization
James Cheney and Roly Perera
A Provenance-Based Policy Control Framework for Cloud Services
Mufajjul Ali and Luc Moreau
Applying Provenance to Protect Attribution in Distributed Computational Scientific Experiments
Luiz M.R. Gadelha Jr. and Marta Mattoso
Provenance Discovery and Data Reproducibility
Looking Inside the Black-Box: Capturing Data Provenance Using Dynamic Instrumentation
Manolis Stamatogiannakis, Paul Groth, and Herbert Bos
Generating Scientific Documentation for Computational Experiments Using Provenance
Adianto Wibisono, Peter Bloem, Gerben K.D. de Vries, Paul Groth, Adam Belloum, and Marian Bubak
Computing Location-Based Lineage from Workflow Specifications to Optimize Provenance Queries
Saumen Dey, Sven Köhler, Shawn Bowers, and Bertram Ludäscher
System Demonstrations
Interrogating Capabilities of IoT Devices
Stanislav Beran, Edoardo Pignotti, and Peter Edwards
A Lightweight Provenance Pingback and Query Service for Web Publications
Tom De Nies, Robert Meusel, Dominique Ritze, Kai Eckert, Anastasia Dimou, Laurens De Vocht, Ruben Verborgh, Erik Mannens, and Rik Van de Walle
Provenance-Based Searching and Ranking for Scientific Workflows
Víctor Cuevas-Vicenttín, Bertram Ludäscher, and Paolo Missier
PROV-O-Viz - Understanding the Role of Activities in Provenance
Rinke Hoekstra and Paul Groth
Joint IPAW/TaPP Poster Session
The Aspect-Oriented Architecture of the CAPS Framework for Capturing, Analyzing and Archiving Provenance Data
Peer C. Brauer, Florian Fittkau, and Wilhelm Hasselbring
Improving Workflow Design Using Abstract Provenance Graphs
Tianhong Song, Saumen Dey, Shawn Bowers, and Bertram Ludäscher
Early Discovery of Tomato Foliage Diseases Based on Data Provenance and Pattern Recognition
Diogo Nunes, Carlos Werly, Gizelle Kupac Vianna, and Sérgio Manuel Serra da Cruz
Provenance in Open Data Entity-Centric Aggregation
Fausto Giunghiglia and Moaz Reyad
Enhancing Provenance Representation with Knowledge Based on NFR Conceptual Modeling: A Softgoal Catalog Approach
Sérgio Manuel Serra da Cruz and André Luiz de Castro Leal
Provenance Storage, Querying, and Visualization in PBase
Víctor Cuevas-Vicenttín, Parisa Kianmajd, Bertram Ludäscher, Paolo Missier, Fernando Chirigati, Yaxing Wei, David Koop, and Saumen Dey
Engineering Choices for Open World Provenance
M. David Allen, Adriane Chapman, and Barbara Blaustein
Towards Supporting Provenance Gathering and Querying in Different Database Approaches
Flavio Costa, Vítor Silva, Daniel de Oliveira, Kary A.C.S. Ocaña, and Marta Mattoso
Provenance for Explaining Taxonomy Alignments
Mingmin Chen, Shizhuo Yu, Parisa Kianmajd, Nico Franz, Shawn Bowers, and Bertram Ludäscher
Challenges for Provenance Analytics Over Geospatial Data
Daniel Garijo, Yolanda Gil, and Andreas Harth
Adaptive RDF Query Processing Based on Provenance
Marcin Wylot, Philippe Cudré-Mauroux, and Paul Groth
Using Well-Founded Provenance Ontologies to Query Meteorological Data
Thiago Silva Barbosa, Ednaldo O. Santos, Gustavo B. Lyra, and Sérgio Manuel Serra da Cruz
Applying W3C PROV to Express Geospatial Provenance at Feature and Attribute Level
Joan Masó, Guillem Closa, and Yolanda Gil
ProvStore: A Public Provenance Repository
Trung Dong Huynh and Luc Moreau
Description: This book constitutes the revised selected papers of the 5th International Provenance and Annotation Workshop, IPAW 2014, held in Cologne, Germany in June 2014. The 14 long papers, 20 short papers and 4 extended abstracts presented were carefully reviewed and selected from 53 submissions. The papers