Workshop “Scientific workflow and business workflow standards in e-Science”

Organizers: Adam Belloum and Zhiming Zhao

Science Faculty, University of Amsterdam

Virtual Laboratory for e-Science (VL-e)

Time: 5/Dec/2006

in conjunction with the IEEE Int’l Conf. e-Science 2006

(CFP)

1. Motivation

As in many e-Science projects, workflows play an important role in the VL-e project. Taking the opportunity of the e-Science 2006 conference being held in Amsterdam, we organized a workshop called “Scientific workflow and business workflow standards in e-Science”, where the VL-e community could meet the e-Science community and discuss the role of workflow management systems for e-Science in general and for VL-e in particular. We believe the workshop provides an ideal platform for discussing issues in applying scientific workflows, including industrial workflow standards, in e-Science applications.

2. The program of the workshop:

The workshop consisted of two oral sessions and one panel session. Four of the invited speakers have, among their other achievements, been very active in the design and/or development of four well-known Workflow Management Systems (WMS) currently used in a number of research projects around the world: Pegasus, Kepler, Triana, and Taverna.

Three of these systems have been recommended to the VL-e community as part of what is known in the VL-e project as the short-term solution, as became clear in the talk of Prof. Adriaans, member of the VL-e directorate board and research program leader. The VL-e end users cover a number of scientific domains: data-intensive, food-informatics, medical, bio-diversity, bio-informatics, and tele-science. It was not possible to find a single WMS that could handle all the requirements we collected from the different VL-e users in the first phase of the project. We therefore recommended three of these systems, which should, in principle, allow the users to start doing interesting research work right away. The longer-term view of the workflow group within the VL-e project is that, during the lifetime of the project, we should provide these users with a more elegant and generic solution that increases re-usability and knowledge transfer across the six scientific domains.

The discussion about WMS would not be complete without speakers representing the industry point of view, which is why we also invited two talks from industry. Unfortunately, one of these invited speakers was not able to attend the workshop. Only Mr. König, a senior technical staff member at IBM Germany, could join; he delivered a very interesting talk about the Web Services Business Process Execution Language (WS-BPEL) 2.0.

http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/WF-Workshop-program.ppt

2.1 Session 1, (chaired by Adam Belloum)

“Meeting the Challenges of Managing Large-Scale Scientific Workflows in Distributed Environments” by Ewa Deelman

Summary: In this talk Ewa Deelman discussed several challenges associated with scientific workflow design and management in distributed, heterogeneous environments. Based on prior work with a number of scientific applications, she described the workflow lifecycle and the concept of a workflow template, from which a number of instances can be created and executed. She also discussed past experiences and the challenges ahead as they pertain to the user experience, planning the workflow execution, and managing the execution itself.
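To make the template/instance distinction concrete, here is a minimal sketch. It is not Pegasus's API or data model: the `Step` record, the `instantiate` and `execution_order` helpers, and all step names, commands, and bindings are hypothetical and invented for illustration. A template holds commands with `$placeholders`; an instance is obtained by binding them to concrete values, and a dependency-respecting execution order is derived with Python's standard `graphlib` module (Python 3.9 or later).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from string import Template


@dataclass(frozen=True)
class Step:
    """One workflow step (hypothetical structure, for illustration only)."""
    name: str
    command: str             # may contain $placeholders while in a template
    depends_on: tuple = ()   # names of the steps that must finish first


def instantiate(template, bindings):
    """Create a concrete workflow instance by binding every placeholder."""
    return [
        Step(s.name, Template(s.command).substitute(bindings), s.depends_on)
        for s in template
    ]


def execution_order(instance):
    """Return the step names in an order that respects the dependencies."""
    graph = {s.name: set(s.depends_on) for s in instance}
    return list(TopologicalSorter(graph).static_order())


# An abstract template: the same three-step chain can be reused for many runs.
TEMPLATE = [
    Step("fetch", "fetch --input $dataset"),
    Step("analyze", "analyze --site $site", depends_on=("fetch",)),
    Step("report", "report --out ${dataset}.txt", depends_on=("analyze",)),
]

if __name__ == "__main__":
    instance = instantiate(TEMPLATE, {"dataset": "run42", "site": "siteA"})
    for name in execution_order(instance):
        print(name)
```

Calling `instantiate` again with different bindings yields another instance of the same template, and the template itself is left unchanged. A real planner would additionally map each step onto distributed resources, which this sketch does not attempt.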

The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Deelman-workflow2.ppt

Dr. Ewa Deelman is a Research Team Leader in the Center for Grid Technologies at the USC Information Sciences Institute. She is also a Research Assistant Professor in the Computer Science Department at USC. Her main area of research is scientific workflow management in Grids. As part of this work she is leading the design and development of the Pegasus software, which maps complex application workflows onto distributed resources. Pegasus is being used in a variety of scientific applications.

“Scientific Workflows: More e-Science Mileage from Cyberinfrastructure” by Bertram Ludäscher

Summary: Bertram Ludäscher presented his view of scientific workflows as the domain scientist's way to harness cyberinfrastructure for e-Science. He discussed workflows from different angles: the scientific domain view, the engineering view, and the computer scientist's view. He presented the Actor-Oriented Modelling used in the Kepler project. He also listed a number of challenges under the heading “Scientific Workflow Design: Challenges” and presented some ways of addressing them, such as semantic annotation and Collection-Oriented Modelling & Design.

The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Ludasher-even-more-wf-mileage.ppt

Dr. Bertram Ludäscher is an Associate Professor at the Department of Computer Science and the Genome Center at the University of California, Davis. He is also a fellow of the San Diego Supercomputer Center at UC San Diego. Dr. Ludäscher's primary research interests are in scientific data management, in particular scientific data integration, scientific workflow management, and knowledge-based (semantic) extensions. He is actively contributing to several large-scale research collaborations dealing with scientific data management, including the Geosciences Network (GEON), the Science Environment for Ecological Knowledge, the Biomedical Informatics Research Network, and the DOE Scientific Data Management Center.

“Triana Generations” by Ian Taylor

Summary: Ian Taylor presented the Triana workflow system within the context of the workflow community at large. He provided a brief background on Triana and discussed the ways in which it has been used in the past for serial as well as distributed tasks. He also presented the Triana distributed architecture and its key features: its user interface and its ability to work simultaneously in heterogeneous distributed environments.

The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Taylor-TrianaGenerations.ppt

Dr. Ian Taylor is the coordinator of the Triana project. His research has included implementing artificial-neural-network types for the determination of musical pitch. He is the head of the Triana developer team; he supported the initial C++ implementation of Triana and later rewrote it in Java. He also holds contracts with NRL in Washington DC, working on simulating sensor nets within MANET networks using Java within NS2. His research interests include PSEs, SOAs, distributed simulation environments, Grid and P2P computing.

2.2 Session 2, (chaired by Zhiming Zhao)

“EMBRACE: Bioinformatics data and analysis tool services for e-Science” by Peter Rice

Summary: Peter Rice presented the EMBRACE project, a network of European partners providing services that integrate the major data resources and analysis software tools using web services and emerging grid technologies. He described the preferred client for these services, Taverna from the myGrid project. He also discussed “What could possibly go wrong?” when the data resources and analysis software start being used.

The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Rice-EmbraceDec06.ppt

Prof. Peter Rice is investigating and advising on the e-Science and Grid technology requirements of the EMBL-EBI, through application development and participation in standards development. He is actively contributing to several large-scale research collaborations: the myGrid project, the UK e-Science initiative, and the Candy

“Web Services - Business Process Execution Language (WS-BPEL) 2.0” by D. König

Summary: Dieter König gave an overview of the WS-BPEL language and showed how it can be used to compose Web services. He provided highlights of WS-BPEL, including structured activities, correlation, compensation, and fault handling. Finally, he presented the work of the OASIS WS-BPEL Technical Committee, the current status of the standard, and an outlook on follow-on activities.

The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/ Koenig-WS-BPEL 2.0.pdf

Dr. Dieter König is a software architect for workflow systems at the IBM Germany Development Laboratory. He joined the laboratory in 1988 and has worked on Resource Measurement Facility for z/OS, MQSeries Workflow, and WebSphere Process Choreographer.

“Workflow design and implementation issues in the VL-e project” by P. Adriaans

Summary: P. Adriaans gave an overview of the structure and the mission of the VL-e project: boosting e-Science by creating an e-Science environment and carrying out research on methodologies. He also presented some of the current research activities within the VL-e project that revolve around workflows and the development of e-Science services: the AIDA tools for data mining, the idea of the workflow bus, and a formal approach to describing workflow components in e-Science.

The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Adriaans-VL-e-science06.ppt

Prof. Pieter Adriaans is professor of machine learning/artificial intelligence at the UvA. He founded Syllogic Systems (www.perotsystems.com). He is also an advisor to Robosail Systems, a company that manufactures and sells self-learning autopilots, senior research advisor for Perot Systems Corporation, and a member of the VL-e directorate board. Adriaans is a member of the ICGI (International Conference on Grammar Induction) steering committee.

2.3 Session 3, Panel discussion: (Moderator: P. Adriaans (PA))

Panelists: E. Deelman (ED), D. König (DK), B. Ludäscher (BL), P. Rice (PR), I. Taylor (IT)

Participants from the audience: Carole Goble (CG), Jeroen Snel (JS), Silvia Olabarriaga (SO), Marian Bubak (MB) …

The panel discussion started with two short presentations by Zhiming Zhao and Marian Bubak, which aimed at raising a number of challenging topics (including provocative statements) for the panel. Zhiming described the challenging issues from the VL-e point of view, and Marian described the challenges as seen by the e-Science community; he compiled his list of challenging issues from the talks presented on the first day of the e-Science conference.

The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Dicussion-issues.ppt

NB: The following summary reflects only what we understood from the discussion; it does not reproduce the panelists' statements word for word. We apologize to the panelists and to the audience if we have misinterpreted any of their statements. We also invite everyone who participated in the discussion to send us comments on the following minutes.

Zhiming Zhao:

Questions:

How to move applications to e-Science? How to describe an application?

How to choose a workflow model?
Which workflow language?
How to decompose an application?

How to run an e-Science application?

How to include humans in the computing loop?
How to provide provenance support?

How to reuse state-of-the-art results from the international community?

How to choose a proper workflow system?
How to deal with missing functionality?

How to integrate existing generic e-Science services?

How to choose an integration standard?

Marian Bubak:

Questions:

Are workflows the only paradigm for collaborative environments?
Are workflows the same as Problem Solving Environments?
To what extent should workflows behave as universal operating systems? Are we going to repeat what people from the OS community have been doing for 40 years or so?
Many workflow systems were born before 1997; should we really bring grid functionality into workflows? Should we leave grid functionality out of workflow research?
How much computer science vs. pure software engineering?

■ Low level paradigms?

■ Exploitation of knowledge?

■ Interactivity?

Interoperability of Workflow Systems

■ Can we find something that enables interoperability of all workflows? Are we going to develop a PL/1, some superset of all programming languages?

■ Should we find a generic workflow that interfaces to:

● domain ontologies

● computing resources

● data

● provenance system?

Is it time for standard libraries of scientific components / Workflow Systems? Are we already at the point where we need standards for applications?
It is not clear what we could learn from the commercial world. If we think about standards too early, it will be difficult to escape this restriction later on.

PA: What will be the future of e-science workflow management systems in a couple of years? What will be the top three issues to be addressed?

PR: What will be the future of e-science workflow management systems in a couple of years? We will get things working across domains; we already work in the bioinformatics domain, and we have managed to make some things work across domains with some tweaking. What will be the top three issues to be addressed?

BL:

Answers to the questions from Marian, in order:

Scripts are another alternative, if you want to use them yourself.
Yes; it does not have to be the case, but sometimes it happens.
No.
No, we want to hide the grid from the user.
Pure software engineering is part of CS.
What is the equivalent of relational algebra and database systems? We could perhaps learn from this analogy.

DK: A general comment from the industrial perspective: in all domains and product areas, we encounter workflows. It is a commonly occurring theme that keeps growing, across various areas. We try to drive the BPEL standard and all related standards. BPEL can be considered the analogue of SQL in the database world.

ED: Why don't people use PSEs? We notice that scientists still use scripts, and workflow systems promise to relieve them from the pain of scripts. We need to actually deliver on the promise of reliable, easy-to-use workflows.

IT: Q5 is answered by Q1, Q2, Q3, and Q4; the whole field still needs to be defined, and we are still discussing it.

Q1: Scripting is not the only paradigm; portals should be taken into consideration.

Where are we going to be in the future: a convergence of technologies, as the various systems focus on different things but are doing the same thing.

CG: Perhaps we are asking the wrong questions. Scientists care more about the workflow than about the workflow system; whatever system they use is fine as long as it does what they want. In the future there will be a pool of workflows, and we should expect that. If we are successful, we will have a lot of them.

DK: What we create is a library of workflows; the user does not care about the underlying system.

BL: How can we motivate scientists to share workflows? Sharing means giving away their intellectual property before they have managed to write their paper or get a Nobel Prize. The promise of workflows is to show exactly how you performed your experiments; scientists may be willing to share, but only after their results are published. We may need some mechanism to recognize who discovered the workflow or idea first.

ED: Sharing of data can be done within a small circle or among large collaborations. Workflows are a good way to share results.

PA: In the bioinformatics domain, when you publish a sequence in a journal, the sequence should be publicly available; this has been the tradition since the 1980s. You also have to explain what you did and how you obtained it. If we had a similar mechanism for publishing workflows, that would be good.

JS: If we view a workflow as a sequence of web service calls, how do you share the logic of the web services? If you share a workflow, you should also think about sharing the logic and all the information behind it.

BL: The notion of nesting might help to solve these problems. The overall underlying model, which we don't have now, needs to support that. We need to be able to look inside and see what kinds of services are used, and to distinguish between black-box and white-box components.

ED: The world is not that simple; you don't have control over all the components that you are using. You just have to keep as much information as you can. With non-service application components it is easier; with services it is more complicated.

CG: Take Biomart as an example: there is no information about the inputs and outputs. The logic of the services is not exposed by the EBI people. How do you persuade service providers to expose enough business logic, but not too much, only up to the point they want other people to know about?

PA: The issue of workflow is independent of the scientific domain being studied, but it is more important for experimental/empirical science. Mathematicians might not be interested in workflows?

PR: Gives a counterexample: the color proof as a workflow?

DK: What is the right granularity when you use BPEL? You have to decide which pieces you want to publish and which you want to hide.

PA: How is it done in business?

DK: In business, too, there are many different domains using workflows. Information about the business logic of a workflow is exposed, but the company's secret logic is not.

BL: Some workflows will be computationally intensive, others data intensive; nevertheless, there are similar components across different domains. The analogy is that databases are used in many different domains.

SO: I am a user; when I hear “workflow”, I don't know what a workflow is. I have an application that I developed in a programming language. Could we look at the problem differently: not discuss a workflow as a workflow, but as a big virtual computer that you program with some specific programming language?

BL: We can learn from programming languages; there is no reason not to have Taverna scripts, Kepler scripts, etc. What is the underlying computational model within a workflow? It does not always mean a DAG. If you need loops or streaming, what would you do? You should not have to go back to full programming; otherwise you will go back to Python.

IT: Workflows have been around for a long time, but not many people are trained to think in workflow concepts. It will take time for people to be able to think in terms of workflows.

What are the main issues in the field of workflow for e-Science in the coming years?

ED: Today we look at workflows as monolithic systems; we could also see them as high-level, application-specific descriptions that can be compiled down to an executable form, and so forth. In terms of standardization, we could do it at the intermediate level (the middle area). At the high level, scientists can then have more flexibility.

DK: I agree that there must be a layer on top of BPEL for scientists to use. Not every scientist should have to learn BPEL.

BL: Workflow design, workflow design, workflow design. We want to enable scientists to get their ideas into an executable environment that other people can use, and so accelerate science.

PR: These workflows have a strong workflow flavor. Grid -> e-Science -> workflow. It all comes down to working together and sharing ideas.