3rd International Workshop on

Scientific Workflows and Business Workflow Standards in e-Science (SWBES)

in conjunction with

the IEEE International Conference on e-Science 2008

(e-Science 2008)

December 10, 2008, Indianapolis, USA

(Submission deadline extended to August 17, 2008)

Paper upload facility

Previous editions: 2006 | 2007


Aims and scope

Scientific experiments today often involve cooperation between large-scale computing and data resources. Workflow management systems are emerging as a key element in helping scientists prototype and execute experimental processes and accelerate scientific discovery. Concerted research is being carried out in several projects along the complete e-Science technology chain, ranging from applications to networking, with a focus on new methodologies and reusable components. Research groups worldwide are developing increasingly advanced workflow features to support scientists in building complex computing- and data-intensive application workflows over geographically distributed data and computing resources. Despite many successes, the gap between workflow application developers and workflow system developers remains wide, and many potential application developers still find it difficult to fully exploit the features offered by workflow systems. SWBES08 focuses on practical aspects of using workflow techniques to bridge the gap between e-Science applications on the one hand and the (Grid) middleware and low-level infrastructure on the other. The workshop aims to provide a forum for researchers and developers in the field of e-Science to exchange the latest experiences and research ideas on scientific workflow management. Live demos of workflow systems and workflow applications are highly encouraged.



Authors are invited to submit original manuscripts that demonstrate current research in all areas of scientific workflow management in e-Science. The workshop solicits novel papers on using business workflow standards to tackle scientific workflow issues, including but not limited to:

  • Workflow infrastructure and e-Science middleware
  • Workflow APIs and graphical user interfaces
  • Workflow modelling techniques
  • Workflow specification languages
  • Workflow execution engines
  • Dynamic workflow control
  • Workflow verification and validation
  • Workflow system performance analysis
  • Support tools for managing workflows
  • AI techniques in workflow management, e.g., planning, runtime control, and user support
  • Security control in workflow management
  • Real-world applications of scientific workflows
  • Interoperability among workflow systems
  • Automatic composition of scientific workflows
  • Scientific workflow management systems in e-Science frameworks
  • e-Science applications from different domains
  • Knowledge infrastructure for e-Science workflow management

Paper submission and publication

Authors should electronically submit a full (6-page) paper via the workshop's paper upload facility. Papers will be carefully evaluated on originality, significance, technical soundness, and clarity of expression. Accepted papers must be presented at the workshop. All accepted papers will be published by the IEEE Computer Society Press, USA, and made available online through the IEEE Digital Library.

Important Dates

  • Papers Due: August 17, 2008 (extended from July 31)
  • Notification of Acceptance: August 30, 2008
  • Camera Ready Papers Due: September 10, 2008

Programme committee

  • Pieter Adriaans (University of Amsterdam, the Netherlands)
  • Ilkay Altintas (University of California, San Diego, USA)
  • Henri Bal (Vrije Universiteit, The Netherlands)
  • Marian Bubak (AGH University of Science and Technology, Krakow, Poland)
  • Artem Chebotko (University of Texas-Pan American, USA)
  • Gargi B Dasgupta (IBM India Research Lab)
  • Cees de Laat (University of Amsterdam, the Netherlands)
  • David De Roure (University of Southampton, UK)
  • Ewa Deelman (University of Southern California, USA)
  • Bob Hertzberger (University of Amsterdam, the Netherlands)
  • Minglu Li (Shanghai Jiaotong University, China)
  • Bertram Ludäscher (University of California, Davis, USA)
  • Shiyong Lu (Wayne State University, USA)
  • Syed Naqvi (CETIC, Belgium)
  • Ian J. Taylor (Cardiff University, UK)
  • Liqiang Wang (University of Wyoming, USA)


Session 1 (10:00-12:00), Chair: Adam Belloum


  1. “A Tale of Two Workflows” (45min), Roger Barga

Abstract: Scientific workflows have become an archetype for scientists to model and run in silico experiments. These workflows primarily carry out computation and data-transformation tasks to perform scientific analysis. There is, however, a whole class of workflows that are used to manage scientific data as they arrive from external sensors and are prepared to become science-ready and available for use. While not directly part of the scientific analysis, these workflows, operating behind the scenes on behalf of the “data valets”, play an important role in the end-to-end management of scientific data products. They share several traits with traditional scientific workflows: both are data-intensive and use web resources. However, they also differ in significant respects, for example in the degree of reliability required and the type of provenance collected. In this talk I will compare and contrast these two classes of workflows, Science Application workflows and Data Preparation workflows, and use them to derive observations about the shared and unique requirements they place on workflow systems for eScience in the Cloud.


  2. “Resource Provisioning Options for Large-Scale Scientific Workflows” (25min), Gideon Juve, Ewa Deelman

Abstract: Scientists in many fields are developing large-scale workflow applications consisting of hundreds of thousands of tasks and requiring thousands of hours of aggregate computation time. Acquiring the computational resources to execute these workflows poses many challenges for application developers. Although the grid provides ready access to large pools of computational resources, the traditional approach to accessing these resources suffers from many overheads that lead to poor performance when used for workflow execution. We describe how resource provisioning techniques such as advance reservations, multi-level scheduling, and cloud computing can be used to reduce scheduling overheads and improve application performance. We explain the advantages and disadvantages of these techniques in terms of cost, performance and usability.


  3. “Build Grid Enabled Scientific Workflows using gRAVI and Taverna” (25min), Kyle Chard, Cem Onyuksel, Wei Tan, Dinanath Sulakhe, Ravi Madduri

Abstract: Scientific communities are increasingly exposing information and tools as online services in an effort to abstract complex scientific processes and large data sets. Clients are able to access services without knowledge of their internal workings, simplifying the process of replicating scientific research. Taking a service-oriented approach to science (SOS) facilitates reuse, extension, and scalability of components, whilst also making information and tools available to a wider audience. Scientific workflows play a key role in realizing SOS by orchestrating services into a well-formed logical pipeline that fulfills the requirements of complex scientific experiments. The task of developing such service-oriented infrastructures is not trivial, as developers must create and deploy Web services and then coordinate multiple services into complex workflows. This paper presents an end-to-end approach for developing SOS-based workflows, with the aim of simplifying the development, deployment, and execution of complex workflows. In particular, we use gRAVI to wrap applications as WSRF Web services and Taverna to compose and execute workflows. The process is validated through the creation of a real-world bioinformatics workflow involving multiple services and complex execution paths.


  4. “Kairos: An Architecture for Securing Authorship and Temporal Information of Provenance Data in Grid-Enabled Workflow Management Systems” (25min), Luiz Gadelha, Marta Mattoso

Abstract: Secure provenance techniques are essential for generating trustworthy provenance records, where one is interested in protecting their integrity, confidentiality, and availability. In this work, we propose an architecture for protecting authorship and temporal information in grid-enabled provenance systems. It can be used in the resolution of conflicting intellectual property claims and in the reliable chronological reconstitution of scientific experiments. We observe that some techniques from public key infrastructures can be readily applied for this purpose. We discuss the issues involved in implementing such an architecture and describe some experiments performed with the proposed techniques.


Session 2 (13:00-15:00), Chair: Zhiming Zhao

  1. “Where Experimental Work Flows” (45min), David De Roure

Abstract: The myExperiment social web site and virtual research environment currently supports a community of some 1200 registered users, many sharing the in silico scientific workflows of Taverna. The last year has seen significant growth in both the user and developer communities, with new interfaces being developed over myExperiment's RESTful API including Facebook and Android. Meanwhile myExperiment itself has been extended to support other workflow systems, new forms of shared research object including experimental plans and scripts, and to integrate with service monitoring and cataloguing capabilities. This talk will report on experiences, lessons learnt and future directions as myExperiment evolves into a platform for the e-laboratory and open science.

  2. “Lattice QCD Workflows: A Case Study” (25min), Luciano Piccoli

Abstract: This paper discusses the application of existing workflow management systems to a real-world science application, Lattice Quantum Chromodynamics (LQCD). Typical workflows and the execution environment used in production are described. Requirements for the LQCD production system are discussed. The workflow management systems Askalon and Swift were tested by implementing the LQCD workflows and evaluated against the requirements. We report our findings and future work.


  3. “StrainInfo.net web services: Enabling microbiologic workflows such as phylogenetic tree building and biomarker comparison” (25min), Bert Verslyppe, Bram Slabbinck, Wim De Smet, Paul De Vos, Bernard De Baets, Peter Dawyndt

Abstract: In this paper, we present novel web services offered by the StrainInfo.net bioportal. This portal integrates information in the domain of microbiology and offers a uniform web interface to the different data providers. By providing web services, the integration results of StrainInfo.net become available for automated processing. Several classes of web services are implemented, and some examples are discussed in more detail. Combined with third-party services, the StrainInfo.net services can be integrated into workflows. We describe two workflows: a basic workflow for constructing a phylogenetic tree based on 16S rRNA gene sequences retrieved from the species of a given genus, and a more advanced workflow that collects data on several biomarkers, calculates the corresponding distance matrices, and visualizes the intra- and inter-species variation among the different biomarkers using the TaxonGap tool. Thereby, the tedious manual work of collecting and analyzing data, and of visualizing the analysis results, becomes fully automated.


  4. “MRGIS: A MapReduce-Enabled High Performance Workflow System for GIS” (20min), Liqiang Wang, Qichang Chen, Zongbo Shang

Abstract: The growth of the data used by data-intensive computations, e.g., Geographical Information Systems (GIS), has far outpaced the growth of the power of a single processor. The increasing demands of data-intensive applications call for distributed computing. In this paper, we propose MRGIS, a high-performance workflow system built as a parallel and distributed computing platform on MapReduce clusters, to execute GIS applications efficiently. MRGIS consists of a design interface, a task scheduler, and a runtime support system. The design interface offers two options: a GUI-based workflow designer and an API-based library for programming in Python. Given a GIS workflow, the scheduler analyzes data dependencies among tasks and then dispatches them to MapReduce clusters based on the current status of the system. Our experiments demonstrate that MRGIS can significantly improve the performance of GIS workflow execution.


Session 3 (15:20-17:00), Chair: Carole Goble

  1. “A High-Level Distributed Execution Framework for Scientific Workflows” (25min), Jianwu Wang, Ilkay Altintas, Chad Berkley, Lucas Gilbert, Matthew Jones

Abstract: Domain scientists synthesize different data and computing resources to solve their scientific problems. Making use of distributed execution within scientific workflows is a growing and promising way to achieve better execution performance and efficiency. This paper presents a high-level distributed execution framework, designed based on the distributed execution requirements identified within the Kepler community. It also discusses mechanisms for making the presented distributed execution framework easy to use, comprehensive, adaptable, extensible, and efficient.


  2. “Capturing Workflow Event Data for Monitoring, Performance Analysis and Management of Scientific Workflows” (25min), Matthew Valerio, Satya Sahoo, Roger Barga, Jared Jackson

Abstract: To effectively support real-time monitoring and performance analysis of scientific workflow execution, varying levels of event data must be captured and made available to interested parties. This paper discusses the creation of an ontology-aware workflow monitoring system, for use in the Trident system, which utilizes a distributed publish/subscribe event model. The implementation of the publish/subscribe system is discussed, and performance results are presented.


  3. “On the Use of Cloud Computing for Scientific Workflows” (20min), Christina Hoffa, Gaurang Mehta, Ewa Deelman, Timothy Freeman, Kate Keahey, Bruce Berriman, John Good

Abstract: This paper explores the use of cloud computing for scientific workflows, focusing on Montage, a widely used astronomy application. The approach is to evaluate, from the point of view of a scientific workflow, the tradeoffs between running in a local environment, if such is available, and running in a virtual environment via remote, wide-area network resource access. Our results show that for Montage, a workflow with short job runtimes, the virtual environment can provide good compute-time performance, but it can suffer from resource scheduling delays and wide-area communication costs.


  4. Discussion (30min), Chair: Carole Goble



Dr. Adam Belloum
email: adam@science.uva.nl


Informatics Institute, University of Amsterdam
Amsterdam, the Netherlands


Prof. Carole Goble

email: carole.goble@manchester.ac.uk


School of Computer Science, University of Manchester, UK

Tel: +44 161 275 6195
Fax: +44 161 275 6236


Dr. Zhiming Zhao
email: zhiming@science.uva.nl

Tel: +31 20 5257599

Fax: +31 20 5257490

www: staff.science.uva.nl/~zhiming

Informatics Institute, University of Amsterdam
Amsterdam, the Netherlands