Back to list of 2005/06 seminars

Abstract for Seminar

Understanding the process by which a result was generated in a computation is fundamental to science, engineering and business. Without such information, for example, other scientists cannot reproduce, analyse or validate experiments. Likewise, businesses must demonstrate that their systems' results were produced in a regulatory-compliant manner. Provenance is therefore important because it enables users to trace how a particular result was arrived at.

Building on the common-sense definition of provenance, we propose a new definition suited to the computational model underpinning service-oriented architectures: the provenance of a piece of data is the process that led to that data. Since our aim is to devise a computer-based representation of provenance that supports useful reasoning about the origin of results, we examine the nature of such a representation, which is built around the documentation of execution.
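
The definition above can be made concrete with a small data structure. The following Python sketch is illustrative only: the `Invocation` record and the `provenance` function are hypothetical names, not the PASOA or EU Provenance data model. It assumes execution is documented as a list of service invocations, each naming the data items it consumed and produced, and it recovers the process that led to a data item by tracing producers backwards.

```python
from dataclasses import dataclass


# Hypothetical record documenting one step of an execution: a service
# invocation that consumed some data items and produced others.
@dataclass(frozen=True)
class Invocation:
    service: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def provenance(data_id: str, documentation: list[Invocation]) -> list[Invocation]:
    """Return the invocations that led to data_id, in documented order.

    Walks backwards from the invocation that produced data_id through the
    producers of its inputs. Assumes each data item is produced at most once.
    """
    producer = {out: inv for inv in documentation for out in inv.outputs}
    visited: set[Invocation] = set()
    stack = [data_id]
    while stack:
        item = stack.pop()
        inv = producer.get(item)
        if inv is None or inv in visited:
            continue  # raw input with no producer, or step already collected
        visited.add(inv)
        stack.extend(inv.inputs)
    # Keep the steps in the order in which they were documented.
    return [inv for inv in documentation if inv in visited]
```

For example, with hypothetical documentation in which an `align` step turns `seq1` and `seq2` into `alignment` and a `tree` step turns `alignment` into `phylogeny`, `provenance("phylogeny", docs)` returns the `align` and `tree` invocations, while an unrelated step that also reads `seq1` is excluded.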

We then examine the architecture of a provenance system, centred on a provenance store designed to support the provenance lifecycle: during a recording phase, documentation of execution is archived in the provenance store, while a reasoning phase operates over the archived documentation. We discuss in turn a protocol for recording execution documentation, a query facility giving access to the contents of the store, and a reasoning system for making inferences. Realising such an architecture is particularly challenging for e-Science experiments, because the system must be scalable.
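
The recording, query and reasoning roles described above might be arranged as in the following in-memory Python sketch. It is a hypothetical illustration, not the PASOA recording protocol or store: `Assertion`, `ProvenanceStore`, `record`, `query` and `derived_from` are invented names, and the example inference (whether one data item contributed to another) is only one kind of reasoning such a store could support. Persistence and scalability are deliberately omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Hypothetical unit of execution documentation submitted by a service."""
    actor: str               # service that made the assertion
    output: str              # data item produced
    inputs: tuple[str, ...]  # data items it was derived from


class ProvenanceStore:
    """In-memory stand-in for a provenance store (no persistence, no scaling)."""

    def __init__(self) -> None:
        self._by_output: dict[str, list[Assertion]] = defaultdict(list)

    # Recording phase: services archive documentation as they execute.
    def record(self, assertion: Assertion) -> None:
        self._by_output[assertion.output].append(assertion)

    # Query facility: retrieve the archived documentation about one data item.
    def query(self, data_id: str) -> list[Assertion]:
        return list(self._by_output.get(data_id, []))

    # Reasoning phase: an example inference built only on `query`, deciding
    # whether `source` contributed (directly or transitively) to `result`.
    def derived_from(self, result: str, source: str) -> bool:
        frontier = [i for a in self.query(result) for i in a.inputs]
        seen: set[str] = set()
        while frontier:
            item = frontier.pop()
            if item == source:
                return True
            if item in seen:
                continue  # guards against cycles in the documentation
            seen.add(item)
            frontier.extend(i for a in self.query(item) for i in a.inputs)
        return False
```

Indexing documentation by the data item it describes keeps `query` a dictionary lookup, and building the reasoning step only on `query` mirrors the separation between the query facility and the reasoning system.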

The presentation will draw on our experience in the PASOA (www.pasoa.org) and EU Provenance (www.gridprovenance.org) projects and will rely on explicit use cases derived from e-Science applications in bioinformatics, high energy physics, organ transplant management and aerospace engineering.