Organizers: Adam Belloum and Zhiming Zhao
Science Faculty, Virtual Laboratory for e-Science (VL-e)
Time: 10/Dec/2007, in the context of the IEEE Int’l Conf. e-Science 2007,
http://staff.science.uva.nl/a.s.z.belloum/workshops/e-science2007/cfp-swbes-2007.htm
In many e-Science projects, workflows play an important role not only in developing increasingly complex applications in various scientific domains but also in improving the sharing and re-usability of knowledge, software components, and computational resources. Taking the opportunity of the IEEE International Conferences on Grid Computing and e-Science in 2006 and 2007, we organized a series of workshops on Scientific Workflow and Business workflow standards in e-Science (SWBES). The workshop series aims at discussing challenging issues in developing e-Science workflow management systems in general.
SWBES06 brought together prominent, internationally known workflow systems, namely Pegasus, Kepler, Triana, and Taverna, and discussed the latest achievements in workflow management systems for e-Science. A panel discussion was organized at the end of the workshop; important issues were discussed, including the impact of industrial standards, e.g., BPEL and WS, on the development of e-Science workflow management systems, interoperability between scientific workflow systems, semantics and data provenance, and interactivity and human-in-the-loop workflow control [Highlights of the discussion].
More information about SWBES06 can be found at the following address:
http://staff.science.uva.nl/a.s.z.belloum/workshops/e-science2006/e-Science-Workshop-report.htm
SWBES07 attracted 10 submissions; after peer review, three submissions were selected. In addition, the workshop also included invited papers. The paper presentation session included two invited talks and four paper presentations.
Dr. Gargi B Dasgupta from IBM India gave the first invited talk: “Enabling enterprise grid workflow with BPEL”
Summary — The execution of workflow applications is a reality today in enterprise and scientific domains. The core middleware technologies for grids (e.g., meta-schedulers) contain sophisticated resource-matching logic but lack control-flow orchestration capability. Workflow orchestrators, on the other hand, suitably control business logic but are unaware of the execution requirements of tasks. Marrying scheduling technology with workflow management is thereby essential in the design of middleware for geographically distributed grids spanning organizational domains. However, existing endeavours concentrate only on intra-domain workflow execution and use ad hoc, non-layered, non-standard solutions that adversely affect cross-organizational collaboration. In addition, they lack support for efficient data modelling and handling, which is especially crucial for the performance of data-intensive applications in distributed data scenarios. In this talk, I will present some efforts underway at IBM, in conjunction with its partners, to overcome some of the challenges in designing standardized middleware for grid job flow systems.
The slides: [PPT]
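The core idea of the talk, pairing control-flow orchestration with resource-matching logic, can be illustrated with a toy sketch. The resource and task names and the simple first-fit matcher below are invented for illustration; this is not IBM's actual middleware:

```python
# Toy "marriage" of a workflow engine and a meta-scheduler.
# Resource and task names are invented for illustration.

resources = [
    {"name": "clusterB", "cpus": 8},    # small cluster
    {"name": "clusterA", "cpus": 64},   # large cluster
]

def match(task):
    """Meta-scheduler role: first-fit resource matching for one task."""
    for r in resources:
        if r["cpus"] >= task["cpus"]:
            return r["name"]
    raise RuntimeError(f"no resource satisfies task {task['name']!r}")

def orchestrate(tasks):
    """Workflow-engine role: walk tasks in control-flow order, placing each."""
    return {t["name"]: match(t) for t in tasks}

flow = [{"name": "stage-in", "cpus": 1}, {"name": "simulate", "cpus": 32}]
print(orchestrate(flow))  # -> {'stage-in': 'clusterB', 'simulate': 'clusterA'}
```

The point of the separation is exactly the one the talk makes: the orchestrator knows the order of tasks, the matcher knows the resources, and neither needs to know the other's internals.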
Speaker Bio: Gargi B Dasgupta received her Ph.D. in Computer Science in 2003 from
Dr. David De Roure, Carole Goble & Robert Stevens gave the second invited talk: “Designing the myExperiment Virtual Research Environment for the Social Sharing of Workflows”
Summary: — Many scientific workflow systems have been developed and are serving to benefit science. In this paper we look outside the workflow to consider the use of workflows within scientific practice, and we argue that the tremendous scientific potential of workflows will be achieved through mechanisms for sharing and collaboration – empowering the scientist to spread their experimental protocols and to benefit from the protocols of others. We discuss issues in workflow sharing, propose a set of design principles for collaborative e-Science software, and illustrate these principles in action through the design of the myExperiment Virtual Research Environment for collaboration and sharing of experiments.
The slides: [PPT]
Speaker Bio: Prof. David De Roure leads the e-Science activities in the
Summary: — Provenance management has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. This paper proposes an approach to provenance management that seamlessly integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technologies with the storage and querying power of an RDBMS. Specifically, we propose: i) two schema mapping algorithms to map an arbitrary OWL provenance ontology to a relational database schema that is optimized for common provenance queries; ii) two efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema, and iii) a schema-independent SPARQL-to-SQL translation algorithm that is optimized on-the-fly by using the type information of an instance available from the input provenance ontology and the statistics of the sizes of the tables in the database. Experimental results are presented to show that our algorithms are efficient and scalable.
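A rough illustration of the data-mapping step described above (not the paper's actual algorithms): RDF-style provenance triples are routed into per-class and per-property relational tables, so that a common provenance query becomes plain SQL. The schema, sample triples, and predicate names are invented for this sketch:

```python
import sqlite3

# Provenance triples in (subject, predicate, object) form; invented sample data.
triples = [
    ("run1",  "rdf:type", "Process"),
    ("fileA", "rdf:type", "Artifact"),
    ("run1",  "used",     "fileA"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Process(id TEXT PRIMARY KEY)")       # one table per class
con.execute("CREATE TABLE Artifact(id TEXT PRIMARY KEY)")
con.execute("CREATE TABLE used(process TEXT, artifact TEXT)")  # one per property

for s, p, o in triples:
    if p == "rdf:type":
        # class-membership triples populate the class tables
        con.execute(f"INSERT INTO {o}(id) VALUES (?)", (s,))
    elif p == "used":
        # property triples populate the relation table
        con.execute("INSERT INTO used VALUES (?, ?)", (s, o))

# A common provenance query ("which artifacts did run1 use?") is now plain SQL.
rows = con.execute("SELECT artifact FROM used WHERE process = ?", ("run1",)).fetchall()
print(rows)  # -> [('fileA',)]
```

The paper's contribution is generating such a schema automatically from an arbitrary OWL provenance ontology and translating SPARQL to SQL over it; the sketch only shows why the relational target makes querying cheap.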
Summary — Recently, scientific workflows have gained tremendous momentum due to their critical role in e-Science. Although several scientific workflow management systems have been developed over the past few years, there is a lack of support for ensuring the reliability of scientific workflows and for controlling the release and propagation of information. In this paper, we propose to model a scientific workflow using a hierarchical state machine and present techniques for verifying and controlling information propagation in scientific workflow environments based on hierarchical state machines.
The slides: [PPT].
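The paper's idea can be caricatured in a few lines: model the workflow as a state machine whose states may themselves contain sub-machines, and check information propagation against per-state clearances. This is a minimal sketch with invented names and a deliberately simplified security model, not the paper's technique in full:

```python
# Minimal hierarchical state machine with a toy propagation check.
# Clearance levels and state names are invented for illustration.

class StateMachine:
    def __init__(self, name, clearance=0):
        self.name = name
        self.clearance = clearance   # highest data sensitivity this state may see
        self.children = {}           # nested sub-machines give the hierarchy
        self.transitions = {}        # state name -> successor state names

    def add_state(self, child):
        self.children[child.name] = child

    def add_transition(self, src, dst):
        self.transitions.setdefault(src, []).append(dst)

    def propagation_violations(self, source, sensitivity):
        """Successors of `source` whose clearance is below the data's sensitivity."""
        return [dst for dst in self.transitions.get(source, [])
                if self.children[dst].clearance < sensitivity]

wf = StateMachine("workflow")
wf.add_state(StateMachine("align", clearance=2))     # may handle sensitive data
wf.add_state(StateMachine("publish", clearance=0))   # public-facing step
wf.add_transition("align", "publish")

# Data of sensitivity 1 produced in "align" must not reach "publish".
print(wf.propagation_violations("align", 1))  # -> ['publish']
```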
Summary — Scientific workflows often require dynamic selection of workflow routines, Web Services, or workflow engines: multiple copies of a Web service, or multiple workflow engines with different performance, are chosen at run time to optimise the workflow. However, a simple performance formula for selecting Web services or workflow engines is difficult to find. In some cases, services with the same function but different algorithms (for example, data clustering services implemented using neural networks or SVMs) can only be chosen just before execution, according to the intermediate results of the workflow execution. We use a production-rule strategy to solve this problem. In this paper, we present a framework for production-rule-based dynamic workflow and give a preliminary implementation of this framework.
The slides: [PPT].
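The production-rule idea can be sketched as rules that inspect intermediate results at the pre-execution point and name a concrete service. The rule conditions, thresholds, and service names below are invented for illustration and are not from the paper:

```python
# Production rules fire on intermediate results of the running workflow
# and name the concrete service to invoke next. All names are invented.

def rule_small_data(ctx):
    return "svm_cluster" if ctx["n_samples"] < 10000 else None

def rule_large_data(ctx):
    return "neural_net_cluster" if ctx["n_samples"] >= 10000 else None

RULES = [rule_small_data, rule_large_data]

def select_service(context, rules=RULES):
    """Fire rules in order; the first whose condition holds names the service."""
    for rule in rules:
        choice = rule(context)
        if choice is not None:
            return choice
    raise LookupError("no rule matched the current workflow context")

print(select_service({"n_samples": 500}))    # -> svm_cluster
print(select_service({"n_samples": 50000}))  # -> neural_net_cluster
```

The design point is that the selection logic lives in the rules, not in the workflow graph, so routines can be swapped without editing the workflow itself.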
Summary — Zhiming Zhao presented some observations collected from previously organized workshops on workflows (WES06, WES07, WEBS). Interest seems to be shifting slowly towards semantics and workflow interoperability: while the first workshops showed a large number of submissions on workflow architecture and modelling, more and more submissions are addressing semantics and interoperability.
Discussion: Zhiming Zhao & Adam Belloum
Generic statements
- Web 2.0 promotes collaboration, sharing, and reuse, and can be used to share and reuse workflows
Information for application developers trying to use scientific workflows
- A web site for workflow patterns: www.workflowpatterns.com
Statements from application developers
- There is a major gap between application users and developers on one side and the workflow world on the other: many systems are available, and it is difficult for the end-user to choose the appropriate workflow system
- Users do not care as much about the architecture of the system they are using as they care about its usability
Statements from workflow & middleware developers
- Developing a generic workflow system that glues horizontally (across 6 application domains) might be too ambitious: the basic principles are the same, but the processes on top vary a lot
- Service and Grid-service technology, and its application in e-Science frameworks, is not yet completely mature for science
- The maturity of the Grid is not yet at the point where it is usable by workflow systems
- The issues discussed here have already been discussed in the industrial context for more than a decade. It is likely that those solutions cannot be applied directly in e-Science, because scientific workflows are ad hoc and not well defined. Separating data from program at design time remains an issue
- Interoperability at the resource level can be solved by SOA, but SOA cannot play a role in interoperability at higher levels
- To use workflow technology, one needs to understand the difference between business and scientific workflows. In business workflows the final user is not an expert and just uses the workflow as it is, whereas in the scientific world the end-user is an application domain expert who can modify, extend, or remove parts of the workflow. How can we satisfy such users? IBM has an old programming model known as “two-level programming” or “programming in the large”
- The user level of workflow abstraction should be at the component level, providing a library of (generic) components that can be entire workflows, as in the myExperiment talk, or atomic components, as is the case in many workflow systems (Triana, Kepler, …)
- The success of workflow technology depends on working with people in different disciplines and helping them understand workflow concepts, so that they can think in terms of workflows; adopting workflows requires a certain level of understanding of these concepts
- Taking users by the hand could be a good approach to teach workflow end-users to understand and develop workflows that take advantage of this work
- Paradoxical statement: we automate work to improve performance, but sometimes automation hampers the communication between participating components
- Sharing of workflows: why are we interested in workflow sharing? Reusability and social websites
Technical statements:
- Support for interactivity:
o Tivoli Workload Scheduler (TWS) from IBM has some support for interactivity, but it is not rich or expressive enough to be used for scientific workflows
o Taverna 2 will be released very soon; it will have more features allowing more monitoring and interactivity
- Scalability:
o BPEL can scale quite well: experiments have shown that workflows with 1000 tasks can be managed without big problems. Scaling is not a problem for BPEL; rather, it is the ad hoc definition and design of scientific workflows that BPEL cannot handle correctly. Current work on BPEL is trying to tackle information
- Others
o Commodity toolkits like the CoG Kit (Globus project) may be very useful to abstract workflows
o Quick comment on scripting: are mashups workflows? (J. Fox). Mashups are not well defined, while workflows are. Question: if you define mashups, does that make them workflows? Mashups can be the first step before a workflow matures!
What did we learn from this workshop?
- Despite the effort of organizing large workshops and conferences on workflow technology, end users are still confused and do not know how to take full advantage of this technology. However, it was clear that a number of potential workflow users came to learn how workflows can help them solve their problems.
- We also learned that interoperability is likely a must and cannot be avoided, even though everybody agrees it is challenging and requires standardisation that is not there yet.
This workshop was not the only place at this conference where workflows were addressed:
- On Tuesday, session 1B was also dedicated to workflows and their usage on Grids and P2P, as were the keynotes given by Thomas Fahringer and David De Roure, from which we can learn a lot:
o Thomas Fahringer: simplicity has a direct impact on performance; maybe a solution is a very simple interface with the possibility to optionally go into some level of detail to improve performance when needed.
o David De Roure: the thing to remember from his talk about workflows: they “make e-science easier, and Web 2 makes workflow easier”
SWBES07 continued the discussion we had at SWBES06: workflow system usability, interoperability, and collaboration between different developers. More application users joined the discussion, and it is clear that the gap between what domain scientists need and what workflow systems offer is still big. Potential application workflow developers are not sufficiently supported or guided to enter the realm of workflows, simply because there are too many choices and it is not easy for them to make one. It is indeed an important decision for end-users, because it will affect their daily working environment. It is clear that workflow developers have to do some PR work and, as suggested in the discussion, maybe take application developers by the hand for their first steps in this new world. The workshop highlighted several research lines along the direction of bridging this gap:
1. Collaborative working environments, e.g., based on Web 2.0, will play an increasingly important role in sharing knowledge between scientists. Via a social environment for specific domains, scientists can use resources such as workflows, workflow processes, and data more efficiently in their specific research.
2. The usability of workflow systems is a key issue for scientists. How to bring domain scientists, application developers, and workflow developers together and let them understand each other will be a challenging issue.
3. From the invited talks, we can see that industrial standardised workflow languages, e.g., BPEL, have been used in scientific workflows not only for coupling services but also for managing massive computing tasks.