Organizers: Adam Belloum and
Zhiming Zhao
Science Faculty,
Virtual Laboratory for e-Science
(VL-e)
Time:
in conjunction with the IEEE Int’l Conf. e-Science 2006
As in many e-Science
projects, workflows play an important role in the VL-e projects. Taking the opportunity of having the
e-Science conference 2006 organized in the
The workshop consists
of two oral sessions and one panel session. Four of the invited speakers, among
their other achievements, have been very active in the design and/or
development of four well-known Workflow Management Systems (WMS) currently
used in a number of research projects around the world: Pegasus, Kepler, Triana, and Taverna.
Three of these systems
have been recommended to the VL-e community as part of what the VL-e project
calls the short-term solution, as became clear in the talk of Prof. Adriaans,
member of the VL-e directorate board and research program leader. The VL-e
end-users cover a number of scientific domains: data-intensive science,
food informatics, medical, biodiversity, bioinformatics, and tele-science.
It was not possible to find a single WMS that could handle all the requirements
we collected from the different VL-e users in the first phase of the project.
We have thus recommended three of these systems, which should, in principle,
allow users to start doing interesting research work right away. The longer-term
view of the workflow group within the VL-e project is that, during the lifetime
of the project, we should provide these users with a more elegant and generic
solution that increases re-usability and knowledge transfer
across the six scientific domains.
The discussion about
WMS would not be complete without speakers representing the
industry point of view; this is why we also invited two talks from
industry. Unfortunately, one of our invited speakers was not able to attend the
workshop. Only Dr. König, a senior technical staff member
at IBM Germany, could join; he delivered a very interesting talk about the
Web Services Business Process Execution Language (WS-BPEL) 2.0.
http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/WF-Workshop-program.ppt
“Meeting the Challenges of Managing Large-Scale
Scientific Workflows in Distributed Environments” by Ewa Deelman
Summary: In this talk Ewa Deelman discussed several challenges
associated with scientific workflow design and management in distributed,
heterogeneous environments. Based on prior work with a number of scientific
applications, she described the workflow lifecycle and the concept of a
workflow template, from which a number of instances can be created and executed.
She also discussed the experiences so far and the challenges ahead as they pertain to
the user experience, planning the workflow execution, and managing the execution
itself.
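The template/instance idea mentioned in the summary can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the `Step` class, the `instantiate` function, and the `extract`/`analyze` commands are hypothetical names and do not reflect the Pegasus API or the content of the slides.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Step:
    """One step of a workflow; `command` may contain $placeholders."""
    name: str
    command: str
    depends_on: tuple = ()


# Hypothetical two-step template: the same shape for every experiment,
# with the data set and threshold left open.
TEMPLATE = (
    Step("extract", "extract --input $dataset --out $dataset.raw"),
    Step("analyze", "analyze $dataset.raw --threshold $threshold",
         depends_on=("extract",)),
)


def instantiate(template, **params):
    """Create a concrete workflow instance by filling in the placeholders."""
    return tuple(
        Step(s.name, Template(s.command).substitute(params), s.depends_on)
        for s in template
    )


# Two instances created from one template.
run_a = instantiate(TEMPLATE, dataset="run42", threshold=0.5)
run_b = instantiate(TEMPLATE, dataset="run43", threshold=0.9)
```

The template itself is never modified; each call produces a new instance that a planner could then map onto resources and execute.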
The Slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Deelman-workflow2.ppt
Dr. Ewa Deelman is a Research Team Leader in the Center for Grid Technologies at the USC Information
Sciences Institute. She is also a Research Assistant Professor in the Computer
Science Department at USC. Her main area of research is scientific
workflow management in Grids. As part of this work she is leading the design
and development of the Pegasus software that maps complex application workflows
onto distributed resources. Pegasus is being used in a variety of scientific
applications.
Summary: Bertram Ludäscher presented his
view of scientific workflows as the domain scientist’s way to harness cyberinfrastructure for e-Science. He discussed workflows
from different angles: the domain scientist’s view, the engineering view, and
the computer scientist’s view. He presented the Actor-Oriented
Modelling used in the Kepler project. He
also presented a number of scientific
workflow design challenges and some ways of addressing them,
such as semantic annotation and Collection-Oriented Modelling &
Design.
The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Ludasher-even-more-wf-mileage.ppt
Dr. Bertram Ludaescher is an Associate Professor
at the Department of Computer Science and the Genome Center at the University of California, Davis. He
is also a fellow of the
Summary: Ian Taylor
presented the Triana workflow system within the context of the workflow community at large. He
provided a brief background for Triana and discussed the ways in which
it has been used in the past for serial as well as
distributed tasks. He also presented the Triana
distributed architecture and its key features:
its user interface and its ability to
work simultaneously in heterogeneous distributed
environments.
The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Taylor-TrianaGenerations.ppt
Dr. Ian Taylor is the coordinator of the
Triana project. His research has included implementing
artificial-neural-network types for the determination of musical pitch. He
heads the Triana developer team; he supported the initial C++
implementation of Triana and later rewrote it in
Java. He has also contracts for NRL in
Summary: Peter Rice presented the EMBRACE project, a
network of European partners providing services that integrate the major data
resources and analysis software tools using web services and emerging grid
technologies. He described the
preferred client for these services, Taverna, from
the myGrid project. He also discussed “What could
possibly go wrong?” when the data resources and analysis software start being
used.
The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Rice-EmbraceDec06.ppt
Prof. Peter Rice is investigating and advising
on the e-Science and Grid technology requirements of the EMBL-EBI, through
application development plus participation in standards development. He is actively
contributing to several large-scale research collaborations: the myGrid project, the
Summary: Dieter König gave an
overview of the WS-BPEL language and showed how it can be used to compose Web
services. He provided highlights of WS-BPEL, including structured activities,
correlation, compensation, and fault handling. Finally, he presented the OASIS WS-BPEL
Technical Committee work, the current status of the standard, and an outlook on
follow-on activities.
The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Koenig-WS-BPEL 2.0.pdf
Dr. Dieter König is a software architect for workflow systems
at the IBM Germany Development Laboratory. He joined the laboratory in 1988 and
has worked on Resource Measurement Facility for z/OS, MQSeries Workflow, and WebSphere Process Choreographer.
Summary: P. Adriaans gave an overview of the
structure and the
The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Adriaans-VL-e-science06.ppt
Prof. Pieter Adriaans is professor of machine learning/artificial
intelligence at the UvA. He founded Syllogic
Systems (www.perotsystems.com). He is also an advisor to Robosail Systems,
a company that manufactures and sells self-learning autopilots, senior research
advisor for Perot Systems Corporation, and member of the VL-e directorate
board. Adriaans is a member of the steering committee of the ICGI (International
Colloquium on Grammatical Inference).
Panelists: E. Deelman (ED), D. König (DK), B. Ludaescher (BL), P. Rice (PR), I. Taylor (IT)
Participants from the audience: Carole Goble (CG), Jeroen Snel (JS),
Silvia Olabarriaga (SO), Marian Bubak (MB) …
The panel discussion
started with two short presentations, given by Zhiming Zhao and Marian Bubak, which
aimed at raising a number of challenging
topics (including provocative statements) for the panel discussion. Zhiming
described the challenging issues from the VL-e point of view, and Marian
described the challenges as seen by the e-Science community; he compiled
his list of challenging issues from the talks presented on the first day of
the e-Science conference.
The slides: http://www.fnwi.uva.nl/a.s.z.belloum/workshop/presentations/Dicussion-issues.ppt
NB: The following summary reflects only what we
understood from the discussion; it does not reproduce word for word the statements
made by the panelists. We apologize to the panelists and to the audience if we
have misinterpreted any of their statements. We also invite everyone who
participated in the discussion to send us comments on the following minutes.
Zhiming Zhao:
Questions:
Marian Bubak:
Questions:
■ Low-level paradigms?
■ Exploitation of knowledge?
■ Interactivity?
■ Finding something which will enable interoperability of all workflows?
  Are we going to develop a PL/1, some superset of all programming languages?
■ Should we find a generic workflow which interfaces to:
  ● domain ontologies
  ● computing resources
  ● data
  ● provenance system?
PA: What will be the future of e-science workflow management
systems in a couple of years? What will be the top three issues to be
addressed?
PR: What will be the future of e-science workflow
management systems in a couple of years? We will get things working across
domains. We already work in the bioinformatics domain, and with some tweaking
we have managed to make some things work across domains. What will be the top three issues to
be addressed?
BL:
Q from Marian
DK: A general comment from the industrial perspective: in
all domains and product areas we encounter workflows. It is a commonly
occurring theme that keeps growing in various areas. We try to drive the
BPEL standard and all related standards. BPEL can be considered the analogue
of SQL in the database world.
ED: Why don't people use PSEs? We notice that
scientists still use scripts, while workflow systems promise to relieve them of
the pain of scripting. We need to actually deliver on the promise of reliable,
easy-to-use workflows.
IT: Q5 is answered by Q1, Q2, Q3, and Q4: the whole field
still needs to be defined, and we are still discussing it.
Q1: Scripting is not the only paradigm;
portals should be taken into consideration.
Where are we going to
be in the future? A convergence of technologies: various systems that focus on different
things but are doing the same thing.
CG: Perhaps we are asking the wrong questions.
Scientists care more about the workflow than about the workflow system. They will care
about the workflow, whatever system they use, as long as it does
what they want. In the future there will be a pool of workflows; we should be
expecting that. If we are successful we will have a lot of them.
DK: What we create is a library of workflows; the user
does not care about the underlying system.
BL: How can we motivate scientists to share
workflows? It means giving away their intellectual property before
they have managed to write their paper or get a Nobel Prize. The promise of workflows is to show
exactly how you performed your experiments. Sharing, perhaps yes, but after they get their
results published. Maybe we need some mechanism to recognize who discovered the
workflow or idea first.
ED: Sharing of data can be done within a small circle or among
large collaborations. Workflows are a good way to share results.
PA: In the bioinformatics domain, when you publish a
sequence in a journal, the sequence should be publicly available; this has been the
tradition since the 1980s. You also have to explain what you did and how you
obtained it. If we had a similar mechanism for publishing workflows, that would be good.
JS: If we view a workflow as a sequence of web service
calls, how do you share the logic of the web services? If you share a workflow you
should also think about sharing the logic and all the information behind it.
BL: The notion of nesting might help to solve these
problems. The underlying model, which we don’t have yet, needs to support
that. We need to be able to look inside and see what kinds of services are used, and to distinguish
between black-box and white-box components.
ED: The world is not that simple: you don't have
control over all the components that you are using. You just have to keep as much
information as you can. With non-service application components this
is easier; with services it is more complicated.
CG: Take BioMart as an example:
there is no information about the input and output. The logic of the services is not exposed by the
EBI people. How do you persuade service providers to expose enough business
logic, but not too much, only up to the point where they want other people to
know about it?
PA: The issue of workflow is independent of the
scientific domain being studied; it is more important for
experimental/empirical science. Mathematicians might not be interested in
workflows?
PR: Gave a counterexample: the color proof as a
workflow?
DK: What is the right granularity when you use
BPEL? How do you decide which pieces to publish and which to hide?
PA: How is it
done in business?
DK: In business, too, there are many different
domains using workflows. Information about the business logic of a workflow is
exposed, but company-secret logic is not.
BL: Some workflows will be computationally
intensive, others data intensive; nevertheless there are similar components across
different domains. The analogy is that databases are used in many different domains.
SO: I am a user; when I hear "workflow", I don't
know what a workflow is. I have an application that I developed in a programming
language. Could we look at the problem not as a workflow as
such, but as a big virtual computer that you program with some
specific programming language?
BL: We can learn from programming languages; there is no reason not to
have Taverna scripts, Kepler
scripts, etc. What is the underlying computational model within a workflow? It does not
always have to be a DAG. If you need loops or streaming, what would you do? You should not have
to go back to full programming, otherwise you end up back at Python.
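The DAG point can be made concrete with a short Python sketch. It is our own illustration, not something shown at the panel: the step names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a workflow engine's scheduler. A DAG always has a valid execution order; as soon as a loop is added, a pure DAG model can no longer order the steps.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
dag = {
    "fetch": set(),
    "align": {"fetch"},
    "plot": {"align"},
}


def execution_order(deps):
    """Return one valid order in which the steps can be run."""
    return list(TopologicalSorter(deps).static_order())


def is_dag(deps):
    """True if the dependency graph has no cycles."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dag)

# Feeding the plot back into the fetch step creates a loop,
# which a DAG-only model cannot express.
looped = dict(dag, fetch={"plot"})
```

Here `is_dag(dag)` holds and `is_dag(looped)` does not, which is why loops or streaming call for a richer computational model than a plain DAG.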
IT: Workflows have been around for a long time, but not many
people are trained to think in workflow concepts. It will take time for people
to be able to think in terms of workflows.
What are the
main issues in the field of workflow for e-Science in the coming years?
ED: Today we look at workflow systems as
monolithic systems. We could also see a workflow as a high-level, application-specific
description that can be compiled down to execution, and so forth. In terms of
standardization, we could do it at the intermediate level (the middle layer). At
the high level, scientists can then have more flexibility.
DK: I agree that there must be a layer on top of BPEL
for scientists to use. Not all scientists should have to learn BPEL.
BL: Workflow design, workflow design, workflow
design. We want to enable scientists to get their ideas into an executable
environment that other people can use, and so accelerate science.
PR: These workflows have a strong workflow flavor.
Grid -> e-Science -> workflow. It all comes down to working together and
sharing ideas.