Organizers: Adam Belloum and Zhiming Zhao
Science Faculty, Virtual Laboratory for e-Science (VL-e)
Time: 10/Dec/2007, in the context of the IEEE Int’l Conf. e-Science 2007,
http://staff.science.uva.nl/a.s.z.belloum/workshops/e-science2007/cfp-swbes-2007.htm
In many e-Science projects, workflows play an important role not only in developing increasingly complex applications in various scientific domains but also in improving the sharing and re-usability of knowledge, software components, and computational resources. Taking the opportunity of the IEEE International Conferences on Grid Computing and e-Science in 2006 and 2007, we organized a series of workshops on Scientific Workflow and Business workflow standards in e-Science (SWBES). The workshop series aims at discussing challenging issues in developing e-Science workflow management systems in general.
SWBES06 brought together prominent, internationally known workflow systems, namely Pegasus, Kepler, Triana, and Taverna, and discussed the latest achievements in workflow management systems for e-Science. A panel discussion was organized at the end of the workshop; important issues were discussed, including the impact of industrial standards, e.g., BPEL and WS, on the development of e-Science workflow management systems, interoperability between scientific workflow systems, semantics and data provenance, and interactivity and human-in-the-loop workflow control [Highlights of the discussion].
More information about SWBES06 can be found at the following address:
http://staff.science.uva.nl/a.s.z.belloum/workshops/e-science2006/e-Science-Workshop-report.htm
SWBES07 attracted 10 submissions; after peer review, three submissions were selected. In addition, the workshop also included invited papers. The paper presentation session included two invited talks and four paper presentations.
Dr. Gargi B Dasgupta from IBM India gave the first invited talk: “Enabling enterprise grid workflow with BPEL”
Summary — The execution of workflow applications is a reality today in enterprise and scientific domains. The core middleware technologies for grids (e.g., meta-schedulers) contain sophisticated resource-matching logic but lack control-flow orchestration capability. Workflow orchestrators, on the other hand, suitably control business logic but are unaware of the execution requirements of tasks. Marrying scheduling technology with workflow management is thereby essential in the design of middleware for geographically distributed grids spanning organizational domains. However, existing endeavours concentrate only on intra-domain workflow execution and use ad hoc, non-layered, non-standard solutions that adversely affect cross-organizational collaboration. In addition, they lack support for efficient data modelling and handling, which is especially crucial for the performance of data-intensive applications in distributed data scenarios. In this talk, I will present some efforts underway at IBM, in conjunction with its partners, to overcome some of the challenges in designing standardized middleware for grid job flow systems.
The slides: [PPT]
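The core idea of the talk, pairing control-flow orchestration with resource-matching logic, can be illustrated with a toy sketch. The resource and task names and the simple first-fit matcher below are invented for illustration; this is not IBM's actual middleware:

```python
# Toy "marriage" of a workflow engine and a meta-scheduler.
# Resource and task names are invented for illustration.

resources = [
    {"name": "clusterB", "cpus": 8},    # small cluster
    {"name": "clusterA", "cpus": 64},   # large cluster
]

def match(task):
    """Meta-scheduler role: first-fit resource matching for one task."""
    for r in resources:
        if r["cpus"] >= task["cpus"]:
            return r["name"]
    raise RuntimeError(f"no resource satisfies task {task['name']!r}")

def orchestrate(tasks):
    """Workflow-engine role: walk tasks in control-flow order, placing each."""
    return {t["name"]: match(t) for t in tasks}

flow = [{"name": "stage-in", "cpus": 1}, {"name": "simulate", "cpus": 32}]
print(orchestrate(flow))  # -> {'stage-in': 'clusterB', 'simulate': 'clusterA'}
```

The point of the separation is exactly the one the talk makes: the orchestrator knows the order of tasks, the matcher knows the resources, and neither needs to know the other's internals.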
Speaker Bio: Gargi B Dasgupta received her Ph.D. in Computer Science in 2003 from
Dr. David De Roure, Carole Goble & Robert Stevens gave the second invited talk: “Designing the myExperiment Virtual Research Environment for the Social Sharing of Workflows”
Summary: — Many scientific workflow systems have been developed and are serving to benefit science. In this paper we look outside the workflow to consider the use of workflows within scientific practice, and we argue that the tremendous scientific potential of workflows will be achieved through mechanisms for sharing and collaboration – empowering the scientist to spread their experimental protocols and to benefit from the protocols of others. We discuss issues in workflow sharing, propose a set of design principles for collaborative e-Science software, and illustrate these principles in action through the design of the myExperiment Virtual Research Environment for collaboration and sharing of experiments.
The slides: [PPT]
Speaker Bio: Prof. David De Roure leads the e-Science activities in the
Summary: — Provenance management has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. This paper proposes an approach to provenance management that seamlessly integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technologies with the storage and querying power of an RDBMS. Specifically, we propose: i) two schema mapping algorithms to map an arbitrary OWL provenance ontology to a relational database schema that is optimized for common provenance queries; ii) two efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema, and iii) a schema-independent SPARQL-to-SQL translation algorithm that is optimized on-the-fly by using the type information of an instance available from the input provenance ontology and the statistics of the sizes of the tables in the database. Experimental results are presented to show that our algorithms are efficient and scalable.
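A rough illustration of the data-mapping step described above (not the paper's actual algorithms): RDF-style provenance triples are routed into per-class and per-property relational tables, so that a common provenance query becomes plain SQL. The schema, sample triples, and predicate names are invented for this sketch:

```python
import sqlite3

# Provenance triples in (subject, predicate, object) form; invented sample data.
triples = [
    ("run1",  "rdf:type", "Process"),
    ("fileA", "rdf:type", "Artifact"),
    ("run1",  "used",     "fileA"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Process(id TEXT PRIMARY KEY)")       # one table per class
con.execute("CREATE TABLE Artifact(id TEXT PRIMARY KEY)")
con.execute("CREATE TABLE used(process TEXT, artifact TEXT)")  # one per property

for s, p, o in triples:
    if p == "rdf:type":
        # class-membership triples populate the class tables
        con.execute(f"INSERT INTO {o}(id) VALUES (?)", (s,))
    elif p == "used":
        # property triples populate the relation table
        con.execute("INSERT INTO used VALUES (?, ?)", (s, o))

# A common provenance query ("which artifacts did run1 use?") is now plain SQL.
rows = con.execute("SELECT artifact FROM used WHERE process = ?", ("run1",)).fetchall()
print(rows)  # -> [('fileA',)]
```

The paper's contribution is generating such a schema automatically from an arbitrary OWL provenance ontology and translating SPARQL to SQL over it; the sketch only shows why the relational target makes querying cheap.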
Summary — Recently, scientific workflows have gained tremendous momentum due to their critical role in e-Science. Although several scientific workflow management systems have been developed over the past few years, there is a lack of support for ensuring the reliability of scientific workflows and for controlling the release and propagation of information. In this paper, we propose to model a scientific workflow using a hierarchical state machine and present techniques for verifying and controlling information propagation in scientific workflow environments based on hierarchical state machines.
The slides: [PPT].
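The paper's idea can be caricatured in a few lines: model the workflow as a state machine whose states may themselves contain sub-machines, and check information propagation against per-state clearances. This is a minimal sketch with invented names and a deliberately simplified security model, not the paper's technique in full:

```python
# Minimal hierarchical state machine with a toy propagation check.
# Clearance levels and state names are invented for illustration.

class StateMachine:
    def __init__(self, name, clearance=0):
        self.name = name
        self.clearance = clearance   # highest data sensitivity this state may see
        self.children = {}           # nested sub-machines give the hierarchy
        self.transitions = {}        # state name -> successor state names

    def add_state(self, child):
        self.children[child.name] = child

    def add_transition(self, src, dst):
        self.transitions.setdefault(src, []).append(dst)

    def propagation_violations(self, source, sensitivity):
        """Successors of `source` whose clearance is below the data's sensitivity."""
        return [dst for dst in self.transitions.get(source, [])
                if self.children[dst].clearance < sensitivity]

wf = StateMachine("workflow")
wf.add_state(StateMachine("align", clearance=2))     # may handle sensitive data
wf.add_state(StateMachine("publish", clearance=0))   # public-facing step
wf.add_transition("align", "publish")

# Data of sensitivity 1 produced in "align" must not reach "publish".
print(wf.propagation_violations("align", 1))  # -> ['publish']
```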
Summary — Scientific workflows often require dynamic selection of workflow routines, Web Services, or workflow engines: multiple copies of a Web service, or multiple workflow engines with different performance, are chosen at run time to optimise the workflow. However, a simple performance formula for selecting Web services or workflow engines is difficult to find. In some cases, services with the same function but different algorithms (for example, data clustering services implemented using neural networks or SVMs) can only be chosen just before execution, according to the intermediate results of the workflow execution. We use a production-rule strategy to solve this problem. In this paper, we present a framework for production-rule-based dynamic workflow and give a preliminary implementation of this framework.
The slides: [PPT].
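The production-rule idea can be sketched as rules that inspect intermediate results at the pre-execution point and name a concrete service. The rule conditions, thresholds, and service names below are invented for illustration and are not from the paper:

```python
# Production rules fire on intermediate results of the running workflow
# and name the concrete service to invoke next. All names are invented.

def rule_small_data(ctx):
    return "svm_cluster" if ctx["n_samples"] < 10000 else None

def rule_large_data(ctx):
    return "neural_net_cluster" if ctx["n_samples"] >= 10000 else None

RULES = [rule_small_data, rule_large_data]

def select_service(context, rules=RULES):
    """Fire rules in order; the first whose condition holds names the service."""
    for rule in rules:
        choice = rule(context)
        if choice is not None:
            return choice
    raise LookupError("no rule matched the current workflow context")

print(select_service({"n_samples": 500}))    # -> svm_cluster
print(select_service({"n_samples": 50000}))  # -> neural_net_cluster
```

The design point is that the selection logic lives in the rules, not in the workflow graph, so routines can be swapped without editing the workflow itself.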
Summary — Zhiming Zhao presented some observations collected from previously organized workshops on workflows (WES06, WES07, WEBS). Interest seems to be shifting slowly towards semantics and workflow interoperability: while the first workshops showed a large number of submissions on workflow architecture and modelling, more and more submissions are addressing semantics and interoperability.
Discussion: Zhiming Zhao & Adam Belloum
Generic statements
- Web 2.0 promotes collaboration, sharing, and reuse, and can be used to share and reuse workflows
Information for application developers trying to use scientific workflows
- A web site for workflow patterns: www.workflowpatterns.com
Statements from application developers
- There is a major gap between application users and developers on one side and the workflow world on the other: many systems are available, and it is difficult for the end-user to choose the appropriate workflow system
- Users do not care as much about the architecture of the system they are using as they care about its usability
Statements from workflow & middleware developers
- Developing a generic workflow system that glues horizontally (across 6 application domains) might be too ambitious: the basic principles are the same, but the processes on top vary a lot
- Service and Grid-service technology, and its application in e-Science frameworks, is not yet completely mature for science
- The maturity of the Grid is not yet at the point where it is usable by workflow systems
- The issues discussed here have already been discussed in the industrial context for more than a decade. It is likely that those solutions cannot be applied directly in e-Science, because scientific workflows are ad hoc and not well defined. Separating data from program at design time remains an issue
- Interoperability at the resource level can be solved by SOA, but SOA cannot play a role in interoperability at higher levels
- To use workflow technology, one needs to understand the difference between business and scientific workflows. In business workflows the final user is not an expert and just uses the workflow as it is, whereas in the scientific world the end-user is an application domain expert who can modify, extend, or remove parts of the workflow. How can we satisfy such users? IBM has an old programming model known as “two-level programming” or “programming in the large”
- The user level of workflow abstraction should be at the component level, providing a library of (generic) components that can be entire workflows, as in the myExperiment talk, or atomic components, as is the case in many workflow systems (Triana, Kepler, …)
- The success of workflow technology depends on working with people in different disciplines and helping them understand workflow concepts, so that they can think in terms of workflows; adopting workflows requires a certain level of understanding of these concepts
- Taking users by the hand could be a good approach to teach workflow end-users to understand and develop workflows that take advantage of this work
- Paradoxical statement: we automate work to improve performance, but sometimes automation hampers the communication between participating components
- Sharing of workflows: why are we interested in workflow sharing? Reusability and social websites
Technical statements:
- Support for interactivity:
o Tivoli Workload Scheduler (TWS) from IBM has some support for interactivity, but it is not rich or expressive enough to be used for scientific workflows
o Taverna 2 will be released very soon; it will have more features allowing more monitoring and interactivity
- Scalability:
o BPEL can scale quite well: experiments have shown that workflows with 1000 tasks can be managed without big problems. Scaling is not a problem for BPEL; rather, it is the ad hoc definition and design of scientific workflows that BPEL cannot handle correctly. Current work on BPEL is trying to tackle information
- Others
o Commodity toolkits like the CoG Kit (Globus project) may be very useful to abstract workflows
o Quick comment on scripting: are mashups workflows? (J. Fox). Mashups are not well defined, while workflows are. Question: if you define mashups, does that make them workflows? Mashups can be the first step before a workflow matures!
What did we learn from this workshop?
- Despite the effort of organizing large workshops and conferences on workflow technology, end users are still confused and do not know how to take full advantage of this technology. However, it was clear that a number of potential workflow users came to learn how workflows can help them solve their problems.
- We also learned that interoperability is likely a must and cannot be avoided, even though everybody agrees it is challenging and requires standardisation that is not there yet.
This workshop was not the only place at this conference where workflows were addressed:
- On Tuesday, session 1B was also dedicated to workflows and their usage on Grids and P2P, as were the keynotes given by Thomas Fahringer and David De Roure, from which we can learn a lot:
o Thomas Fahringer: simplicity has a direct impact on performance; maybe a solution is a very simple interface with the possibility to optionally go into some level of detail to improve performance when needed.
o David De Roure: the thing to remember from his talk about workflows: they “make e-science easier, and Web 2 makes workflow easier”
SWBES07 continued the discussion we had at SWBES06: workflow system usability, interoperability, and collaboration between different developers. More application users joined the discussion, and it is clear that the gap between what domain scientists need and what workflow systems offer is still big. Potential application workflow developers are not sufficiently supported or guided to enter the realm of workflows, simply because there are too many choices and it is not easy for them to make one. It is indeed an important decision for end-users, because it will affect their daily working environment. It is clear that workflow developers have to do some PR work and, as suggested in the discussion, maybe take application developers by the hand for their first steps in this new world. The workshop highlighted several research lines along the direction of bridging this gap:
1. Collaborative working environments, e.g., based on Web 2.0, will play an increasingly important role in sharing knowledge between scientists. Via a social environment for specific domains, scientists can use resources such as workflows, workflow processes, and data more efficiently in their specific research.
2. The usability of workflow systems is a key issue for scientists. How to bring domain scientists, application developers, and workflow developers together and let them understand each other will be a challenging issue.
3. From the invited talks, we can see that industrial standardised workflow languages, e.g., BPEL, have been used in scientific workflows not only for coupling services but also for managing massive computing tasks.