Friday, October 31, 2003

Example Workflow to Find and Characterize Novel Genes in Genomic Sequences


Thursday, October 30, 2003

Email correspondence with the Discovery Net Project


From: Moustafa Ghanem [mmg@doc.ic.ac.uk]
Sent: Thursday, October 30, 2003 2:45 PM
To: dcolonn@ncsu.edu
Cc: Patrick Wendel
Subject: Re: Discovery Net in Survey Paper

Dear Daniel.

First of all, I am sorry for the late reply. Second, thank you very
much for your interest in the Discovery Net project.

The main person to talk to about the technical details is Patrick Wendel
(pjw4@doc.ic.ac.uk) who can give you more specific details about the
architecture and how it relates to the COG.

In all cases, here is brief description about the way discovery net
workflows operate:

The target user for workflow definition is an application domain expert
and not a Grid computing expert. So, in a nutshell, the Discovery Net
workflow system is based on a data-flow dependency graph. Each element
(node) of the workflow can describe some constraints it has in terms of
execution whether:
a) its execution must occur on a particular resource known statically, or
b) if it can execute as a pure Java implementation, or
c) through a script execution, or
d) it can potentially run on any resource available at run-time, or
e) finally whether the exact resource to be used can only be known at
runtime and determined through looking up a network weather service
and/or a Grid information resource service for example.

In this way it is possible to design high level workflows as, for
instance, the data transfer is implicit so, for the end-user, the
workflow definition concentrates on functional composition and
meta-information issues.

In addition, Discovery Net workflows can co-ordinate the execution of
OGSA services through a Grid/Web Service interface. Any of these
workflows can then be published as a new grid service for programmatic
access from other systems.

You can find more information from the latest journal paper we published
(available from the discovery-net web site http://www.discovery-on-the.net).

Please have a look and tell us what further information we can provide
you with.
We may also schedule a teleconference some time soon to discuss things
in details.

Best regards

Moustafa

Dr. Moustafa M. Ghanem
Discovery Net Project Manager


dcolonn@ncsu.edu wrote:

>Hi, I'm a grad student and part of the Globus Java CoG project. I'm
>working with some other people doing a survey of grid workflow systems.
>
>I was hoping to talk to someone who could give me some information about
>the current state of the discovery-net project so we can include it in our
>survey paper which will be presented to the next global grid forum.
>
>Could you direct me to someone who might be able to help?
>
>Regards,
>Daniel Colonnese
>
>
>
>
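To make the email's model concrete for myself, here is a minimal sketch of a data-flow dependency graph whose nodes declare one of the five execution constraints Moustafa lists. This is not Discovery Net code: the `Placement` enum, the `Node` class, the host names, and the idea that Java and script nodes run "local" to the engine are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from graphlib import TopologicalSorter
from typing import List, Optional


class Placement(Enum):
    """Execution constraints a node may declare (names are hypothetical)."""
    FIXED_RESOURCE = auto()     # a) a particular resource known statically
    PURE_JAVA = auto()          # b) a pure Java implementation
    SCRIPT = auto()             # c) a script execution
    ANY_RESOURCE = auto()       # d) any resource available at run time
    LOOKUP_AT_RUNTIME = auto()  # e) resolved through an information service


@dataclass
class Node:
    name: str
    placement: Placement
    inputs: List[str] = field(default_factory=list)  # upstream node names
    resource: Optional[str] = None  # only used for FIXED_RESOURCE


def execution_order(nodes):
    """Return node names in an order that respects the data-flow edges."""
    graph = {n.name: set(n.inputs) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


def resolve_resource(node, available, lookup):
    """Pick where a node runs, based on its declared constraint."""
    if node.placement is Placement.FIXED_RESOURCE:
        return node.resource
    if node.placement in (Placement.PURE_JAVA, Placement.SCRIPT):
        return "local"  # assumption: runs wherever the engine runs
    if node.placement is Placement.ANY_RESOURCE:
        return available[0]
    return lookup(node.name)  # stand-in for a weather/information service


# Hypothetical three-step workflow; data transfer between steps is implicit.
nodes = [
    Node("load", Placement.FIXED_RESOURCE, resource="db.example.org"),
    Node("clean", Placement.SCRIPT, inputs=["load"]),
    Node("cluster", Placement.LOOKUP_AT_RUNTIME, inputs=["clean"]),
]
```

The workflow author only states dependencies and constraints; ordering and placement are derived from them, which matches the point that the end user is a domain expert and not a grid expert.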


Tuesday, October 28, 2003

Standard Evaluation: "standard table summarizes the results of the comparison "

Sunday, October 26, 2003

BioPipe
SCIRUN
WASA
myGrid
Triana
Chimera
Ptolemy-II
DiscoveryNet
wftk
SWFL
ICENI
BioOpera
ILab
GridAnt

BioPipe
http://www.biopipe.org/

BioPipe is one of a series of related packages for carrying out bioinformatics analysis from the Open Bioinformatics Foundation. BioPipe is a collection of Perl modules for constructing workflows from BioPerl applications. Much of the code and many of the ideas are borrowed from the Ensembl pipeline project.

SCIRUN
http://software.sci.utah.edu/scirun.html

SCIRUN is an application begun in 1992 by the National Center for Research Resources (NCRR) Center at Utah. SCIRUN is a GUI “scientific workbench” that allows users to construct, manage, and debug simulations in domains such as physics and neurobiology. SCIRUN simulations may be thought of as workflows, since they allow for parallel and conditional execution of tasks. Some SCIRUN applications allow jobs to be run on a grid. SCIRUN also provides extensive scientific visualization libraries.

WASA
http://dbms.uni-muenster.de/menu.php3?item=projects&page='wasa/index.php3?id=1'

WASA2 is an application that supports the creation and execution of workflows from CORBA components. It includes a GUI workflow modeler as well as controls to modify a workflow while it is running. WASA2 has been used in the domains of geoprocessing and molecular biology as well as for modeling business processes. WASA2 uses a strictly object-oriented approach in which workflows are represented as CORBA objects and can be displayed as UML diagrams.

myGrid
http://www.mygrid.org.uk

The myGrid Project is a collection of bioinformatics web services and grid services hosted by the European Bioinformatics Institute. myGrid uses SoapLab and the Apache Axis framework to provide a web service interface to a collection of grid-based data-analysis services. Scientists are able to compose, edit, and save workflows with either a web portal or a GUI workbench. The system currently supports two XML workflow languages, a subset of IBM's WSFL and a more domain-specific language called XScufl. myGrid executes workflows defined in these documents with the IT Innovation Workflow Enactment Engine.

Triana
http://www.triana.co.uk/

Triana is a set of open source Java libraries that provide a GUI for building workflows from a collection of OGSA grid services. Triana has been leveraged by several other projects including myGrid and Chimera. Triana contains an engine for coordinating and invoking a set of grid services. Triana also contains a peer-to-peer component based on JXTA, allowing Triana to run on a variety of devices including PDAs and mobile phones.

Chimera
http://www.griphyn.org/chimera/

Chimera is a system used to find or create a workflow for a series of OGSA grid services to provide a scientist’s requested “data product.” Chimera was begun in 1999 and is enabled by the Pegasus Planner, a GriPhyN project at ISI. Chimera is middleware designed to be invisible to the client who requests the data product. The workflow is represented as an “abstract program execution graph.” This graph is transformed into an executable DAG for the Condor DAGMan scheduler.


Ptolemy-II
http://ptolemy.eecs.berkeley.edu/

Ptolemy-II is a visual modeling tool written in Java. It was begun in 1997 at UC Berkeley. Several recent SDM efforts have extended the Ptolemy-II platform to allow for the drag-and-drop creation of scientific workflows from libraries of actors. The Ptolemy actor is often a wrapper around a call to a web service or grid service. Ptolemy leverages an XML meta-language called MoML to produce a workflow document describing the relationships of the entities, properties, and ports in a workflow. Presently, Ptolemy actor libraries exist for the domains of bioinformatics and ecology at NCSU and SDSC.

DiscoveryNet
http://www.discovery-on-the.net

DiscoveryNet is a collection of software built on top of the UNICORE grid system for arranging database access and knowledge discovery procedures. DiscoveryNet was begun in 2001 at Imperial College of Science. DiscoveryNet provides a means of describing workflow between analysis service providers, data owners, and scientists who arrange and execute these workflows. DiscoveryNet makes use of the OSGSI components and protocols as well as its own protocol for workflows, Discovery Process Markup Language (DPML). This language is used for constructing, running, and managing grid services, as well as recording their history.

wftk
http://www.vivtek.com/wftk/

The open-source workflow toolkit, or wftk, is a generalized workflow system implemented as a series of Java libraries. wftk was begun in 1998 by Michael Roberts. wftk uses its own high-level language to describe workflow, and stores workflow-related documents in a series of XML “datasheets.” The wftk workflow engine contains two models of a workflow: a “task-based” model, essentially a DAG, and a “state-based” model, essentially an FSM. The wftk libraries can also be run as web services.

SWFL
http://www.cs.cf.ac.uk/User/Yan.Huang/GridWF/SWFL.htm

Service Workflow Language (SWFL) is an XML-based meta-language for the construction of scientific workflows from OGSA-compliant services. SWFL was developed at Cardiff University in 2002 and 2003. SWFL extends IBM’s WSFL and supports a new set of conditional operators and loop constructs as well as arrays and objects. SWFL has a workflow engine based on JISGA (Jini-based Service-oriented Grid Architecture). JISGA uses JavaSpaces as shared memory for parallel execution through grid services. SWFL therefore supports integrating parallel programs into the workflow.

ICENI
http://www.lesc.ic.ac.uk/iceni/

ICENI (Imperial College e-Science Networked Infrastructure) is a collection of grid middleware used for providing and coordinating grid services for e-science applications. ICENI includes a GUI workflow construction tool integrated into the NetBeans IDE. This tool can create a textual “execution plan” of the workflow in an XML meta-language derived from YAWL (Yet Another Workflow Language). The ICENI workflow system supports conditionals, loops, and parallel execution.

BioOpera
http://www.inf.ethz.ch/personal/bausch/bioopera/main.html

BioOpera is an application for the composition of various bioinformatics applications built on top of the OPERA architecture. The project was begun in 1999 at the Swiss Federal Institute of Technology. BioOpera provides a GUI tool for the construction of workflows from web services and grid services. BioOpera represents a workflow “process template” as a DAG, and translates this set of activities into an execution script for Condor-G. Additionally, the web service invocation engine within BioOpera uses UDDI, and the grid service component uses service description documents.

ILab
http://www.fhrg.fhg.de/index_en.html

ILab is a set of grid middleware developed at The Fraunhofer ICT Group since 2001. ILab includes a GUI tool for the construction of workflows from grid services. This tool uses an XML-based language called Grid Application Definition Language (GADL) to assemble applications from grid services, and Grid Job Definition Language (GJobDL) to describe the runtime behavior of such applications. ILab uses Petri nets instead of DAGs to model and control the workflow.
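The Petri net point is worth a small illustration, since a Petri net can express a loop that a DAG cannot. The sketch below is a generic place/transition net with the usual firing rule; it is not GADL or GJobDL, and the place and transition names are made up.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds a token."""
    return all(marking.get(place, 0) >= 1 for place in transition["inputs"])


def fire(marking, transition):
    """Consume one token per input place and add one per output place."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place in transition["inputs"]:
        new_marking[place] -= 1
    for place in transition["outputs"]:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


# Hypothetical net: "refine" sends the token back so "run" can fire again,
# which is a cycle and therefore impossible to draw as a DAG.
run = {"inputs": ["ready"], "outputs": ["result"]}
refine = {"inputs": ["result"], "outputs": ["ready"]}
accept = {"inputs": ["result"], "outputs": ["done"]}

marking = {"ready": 1}
for step in (run, refine, run, accept):
    marking = fire(marking, step)
```

After the four firings the single token sits in the "done" place, having passed through the run step twice.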

GridAnt
http://www-unix.globus.org/cog/projects/gridant/

GridAnt is an extension of the Apache Ant build tool that resides in the Globus CoG Kit. GridAnt allows for the construction of client-side workflows for Globus Toolkit 3 applications by allowing the specification of preconditions and parallel tasks in much the same way as the Ant build tool.

Thursday, October 16, 2003

Summary Descriptions

myGrid
http://www.mygrid.org.uk

The myGrid Project is a collection of bioinformatics web services and grid services hosted by the European Bioinformatics Institute. myGrid uses SoapLab and the Apache Axis framework to provide a web service interface to a collection of grid-based data-analysis services. Scientists are able to compose, edit, and save workflows with either a web portal or a GUI workbench. The system currently supports two XML workflow languages, a subset of IBM's WSFL and a more domain-specific language called XScufl. myGrid executes workflows defined in these documents with the IT Innovation Workflow Enactment Engine.


Chimera
http://www.griphyn.org/chimera/

Chimera is a system used to find or create a workflow for a series of OGSA web services to provide a scientist’s requested “data product.” Chimera was begun in 1999 and is enabled by the Pegasus Planner, a GriPhyN project at ISI. Chimera is middleware designed to be invisible to the client who requests the data product. The workflow is represented as an “abstract program execution graph.” This graph is transformed into an executable DAG for the Condor DAGMan scheduler.


Ptolemy-II
http://ptolemy.eecs.berkeley.edu/

Ptolemy-II is a visual modeling tool written in Java. It was begun in 1997 at UC Berkeley. Several efforts have extended the Ptolemy-II platform to allow for the drag-and-drop creation of scientific workflows from libraries of actors. The Ptolemy actor can be a wrapper around a call to a web service or grid service. Ptolemy leverages an XML meta-language called MoML to produce a workflow document describing the relationships of the entities, properties, and ports in a workflow. Presently, Ptolemy actor libraries exist for the domains of bioinformatics and ecology at NCSU and SDSC.

Tuesday, October 14, 2003

A Poem by Amelia A. Lewis - A child's garden of web services

When the WSDL wizards waffle, wond'ring whether to rename,
and the message mavens mutter, meaning MEPs must be retained,
then the service 'scriptions suffer, since the source will be the same,
and the theorists threaten thunder, thinking nothing will be gained!

When the feature fans are flustered, fearing future-free designs,
and the context corresponding to the properties defined
is hiding helpful highlights of heuristics of all kinds,
then the protocol extenders ponder whether they can bind!

When the RPC revanchists reach for reason to remote,
and the messaging evang'lists muster arguments to not,
then the working group grows rancorous and puts it to a vote,
and the minute taker notes the score and elsewise ... not a lot!


Talked to Dr. Singh today. He may be starting some BPEL-related project.

In regards to the survey paper with Gregor:
I've been compiling BibTeX citations. As far as I can tell, we're building a big list of all the projects and their attributes. The plan is to submit the survey paper to the next Global Grid Forum.

Get it here

I have literature for:
BPEL
PTOLEMY
CHIMERA
DAGMan
myGrid
Misc. Visual Workflow Programs
Old Surveys

At the last GGF, there was a meeting to discuss forming a workflow working group, and this survey could influence the shape that the Globus group takes. Some kind of workflow tool, probably GridAnt, will be included in the next Globus Toolkit release.

In regards to a grid plugin for PT-II:

For web services, there is a generic, stub-less invocation framework (WSIF) that can be used to make a generic web service actor. There is no way to do stub-less invocation of grid services, so every PT-II agent must be a wrapper around a grid service stub. I've talked to Zhengang about this problem, and we agree that we could hack it and do some code generation, but that wouldn't be very useful. Furthermore, the problem of making general PT-II agents is complicated by the fact that WSIF may be deprecated and that the looming WSDL 1.2 changes a lot of stuff.

In regards to WSDL 1.2:

They are replacing port types with interfaces and adding interface inheritance. There’s also a component, imports, ports are now endpoints, and there is no more service overloading. This new spec is going to wreak havoc!!!

Part 1: Core Language
Part 2: Message Patterns
Part 3: Bindings
TODO LIST

Monday, October 13, 2003

Web services are the focus of the October CIO magazine, which says, "Gartner predicts American business is going to squander $1 billion on misguided Web services projects by 2007."

http://www.cio.com/archive/100103/standards.html

Friday, October 10, 2003

A coherent explanation of Web Services!

Enterprise Software / Web services in serious jeopardy - Tech Update - ZDNet

Tuesday, October 07, 2003

Scientific Workflows Survey

Survey is underway!

Sunday, October 05, 2003

Review Projects using Abstract and Concrete Workflows

The notion of an abstract workflow is to separate abstract entities such as the actors, data, and operations from their concrete instances. Several workflow systems have been implemented that make use of abstract workflows.

The Ptolemy project is concerned with workflow construction on the ‘client-side.’ The GriPhyN project is concerned with workflow construction on the ‘server-side.’

MoML and Ptolemy

The first is the Department of Energy-funded Scientific Data Management Center project. Teams at North Carolina State University and the San Diego Supercomputer Center are leveraging the MoML and Ptolemy tools from UC Berkeley. The project has generated a great deal of literature and software for scientific workflow.

Modeling Markup Language [1] (MoML) is an XML meta-language primarily used for expressing models of software systems. MoML defines tags for Entities, Properties, Relations, Links, and Ports, all decoupled from the specific class or service that the model represents.

MoML could be used to model a software program where each class is an entity, and each method is a collection of input and output ports. Data fields are modeled as properties. Entities can inherit from each other and be linked to each other.

A MoML document could also be used to define an abstract web service composition workflow. Each entity is a service provider, and services have a collection of ports. There is nothing in MoML that would prevent one from using it to model a grid services composition workflow.
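As a rough illustration of what such a document could look like, the sketch below builds a tiny two-service composition out of the MoML element names mentioned above (entity, port, property, relation, link). The service names and the `example.*` class names are hypothetical, and I have not validated the output against the MoML DTD.

```python
import xml.etree.ElementTree as ET


def service_entity(parent, name, cls, inputs, outputs):
    """Add an entity for one service provider, with its ports."""
    entity = ET.SubElement(parent, "entity", {"name": name, "class": cls})
    for direction, ports in (("input", inputs), ("output", outputs)):
        for port_name in ports:
            port = ET.SubElement(entity, "port", {"name": port_name})
            # Mark the port direction with a property, as MoML models do.
            ET.SubElement(port, "property", {"name": direction})
    return entity


def connect(parent, relation, source, sink):
    """Join two ports by linking both of them to one named relation."""
    ET.SubElement(parent, "relation", {"name": relation})
    ET.SubElement(parent, "link", {"port": source, "relation": relation})
    ET.SubElement(parent, "link", {"port": sink, "relation": relation})


# Hypothetical abstract workflow: fetch a sequence, then align it.
model = ET.Element("entity", {"name": "workflow"})
service_entity(model, "fetch", "example.SequenceFetch", [], ["sequence"])
service_entity(model, "align", "example.Alignment", ["sequence"], ["report"])
connect(model, "r1", "fetch.sequence", "align.sequence")

moml_text = ET.tostring(model, encoding="unicode")
```

Nothing in the document names a concrete endpoint; swapping the class attribute for a different provider would leave the ports and links untouched, which is what makes the composition abstract.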

A model of an abstract workflow is only useful if it is used to construct a concrete workflow, which is then used to do work. Ptolemy is a software program used to study heterogeneous modeling and includes an engine for processing MoML documents. The [2] and [3] projects use Ptolemy to allow for visual drag-and-drop construction of workflow for bioinformatics and environmental ecology.

The process of creating a workflow with the Ptolemy software is centered on creating Java classes that extend a built-in Actor class. Usually, the Actor corresponds to a MoML entity, and the Actor is a wrapper around a web service stub or a local program. A concrete workflow consists of a series of these actors interacting in a way deemed legal by the abstract workflow document.

Vouk et al. [4] explain the process as follows:

Thus the core idea of our approach and system is that the scientist designs an abstract workflow (AWF) from the repository of problem-oriented abstract tasks while the system tries to derive from AWF an executable workflow (EWF) in terms of the available web services.

This approach to workflow construction is also discussed in [6] and [7].
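My reading of the AWF-to-EWF step, as a toy sketch: each abstract task is looked up in a registry of available services, and any task with no match is reported instead of silently dropped. The registry layout, the task names, and the example.org URLs are invented for illustration; the real system in [4] is certainly more involved than "take the first match."

```python
def derive_executable(abstract_tasks, registry):
    """Map each abstract task to the first registered service offering it.

    Returns (executable, unresolved): a list of (task, service) pairs in
    workflow order, and the abstract tasks with no available service.
    """
    executable, unresolved = [], []
    for task in abstract_tasks:
        candidates = registry.get(task, [])
        if candidates:
            executable.append((task, candidates[0]))
        else:
            unresolved.append(task)
    return executable, unresolved


# Hypothetical abstract workflow and service registry.
awf = ["fetch_sequence", "align", "cluster"]
services = {
    "fetch_sequence": ["http://a.example.org/fetch"],
    "align": ["http://b.example.org/align", "http://c.example.org/align"],
}

ewf, missing = derive_executable(awf, services)
```

Here "cluster" ends up in `missing`, so the abstract workflow cannot yet be turned into a complete executable one.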


The GriPhyN project

The second body of work is the National Science Foundation-funded Grid Physics Network (GriPhyN). This project leverages a collection of grid technologies including Condor-G, DAGMan, and Globus.

The workflow management software in GriPhyN [5] is used to ensure that the grid can provide certain ‘data products’ to its users. As in the Ptolemy system, there is a set of predefined components that can be used to form a workflow. Specifically, the system is concerned with translating user requests to the grid into a series of job submissions and data requests.

The abstract workflow is used as a planning mechanism: when the abstract workflow is defined, it is checked against the services that are actually available.

Selecting and configuring application components to form an abstract workflow. The application components are selected by examining the specification of their capabilities and checking to see if they can generate the desired data products (p. 2).

It seems that the abstract workflow is represented as a data structure in the Pegasus middleware. The concrete workflow is expressed as a multi-part job.
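Since the concrete workflow ultimately runs under DAGMan, here is a small sketch of writing a DAGMan input file from a list of jobs and dependencies. The `JOB` and `PARENT ... CHILD` lines are standard DAGMan syntax; the job names and submit file names are hypothetical, and this is a hand-rolled illustration, not how Pegasus itself generates its output.

```python
def to_dagman(jobs, edges):
    """Render a DAGMan input file.

    jobs  -- dict mapping job name to its Condor submit description file
    edges -- list of (parent, child) job-name pairs
    """
    lines = [f"JOB {name} {submit}" for name, submit in jobs.items()]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


# Hypothetical concrete workflow: stage data in, analyze it, stage results out.
jobs = {
    "stage_in": "stage_in.sub",
    "analyze": "analyze.sub",
    "stage_out": "stage_out.sub",
}
edges = [("stage_in", "analyze"), ("analyze", "stage_out")]

dag_text = to_dagman(jobs, edges)
```

The data-movement jobs are explicit here, which is the kind of detail the abstract workflow leaves out and the concrete one has to spell out.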



[1] Edward A. Lee and Steve Neuendorffer. MoML — A Modeling Markup Language in XML — Version 0.4. Technical report, University of California at Berkeley, March, 2000. Online. Available:
http://www.gigascale.org/pubs/16/moml_erl_memo.pdf

[2] Scientific Data Management Center at NC State. 2003. Online. Available:
http://sdm.csc.ncsu.edu/

[3] Science Environment for Ecological Knowledge. 2003. Online. Available:
http://seek.ecoinformatics.org

[4] Ilkay Altintas, Sangeeta Bhagwanani, David Buttler, Sandeep Chandra, Zhengang Cheng, Matthew A. Coleman, Terence Critchlow, Amarnath Gupta, Wei Han, Ling Liu, Bertram Ludascher, Calton Pu, Reagan Moore, Arie Shoshani, Mladen Vouk, "A Modeling and Execution Environment for Distributed Scientific Workflows" to be published in "Real World Semantic Web Applications", IOS Press, editor V. Kashyap, 2002. [Online]. Available:
http://renoir.csc.ncsu.edu/Faculty/Vouk/Papers/Cheng/BookChapter/Cheng_IOS_03.pdf

[5] Ewa Deelman, James Blythe, Yolanda Gil, and Carl Kesselman, "Workflow Management in GriPhyN," Chapter in “Grid Resource Management,” J. Nabrzyski, J. Schopf, and J. Weglarz editors, Kluwer, 2003.
http://www.isi.edu/~deelman/Pegasus/grm_chapter.pdf

[6] B. Ludaescher, A. Gupta, and M. E. Martone, "A Model-Based Mediator System for Scientific Data Management," Chapter in "Bioinformatics: Managing Scientific Data," T. Critchlow and Z. Lacroix editors, Morgan Kaufmann, 2003.
http://citeseer.nj.nec.com/cache/papers/cs/27492/http:zSzzSzwww.sdsc.eduzSz~guptazSzpublicationszSzmbm-chapter-rev.pdf/a-model-based-mediator.pdf

[7] Sandeep Chandra, "Service-based Support for Scientific Workflows" Thesis, 2002.[Online]. Available:
http://renoir.csc.ncsu.edu/Faculty/Vouk/Papers/Chandra/Chandra_MS_Thesis.pdf



Thursday, October 02, 2003

The next things to investigate...

The grid service workflow system at the San Diego Supercomputer Center
Bertram Ludaescher ludaesch@sdsc.edu
Steve Mock mock@sdsc.edu

The XML language MoML at UC Berkeley

GridAnt

coming soon...

formal citations
