Wednesday, April 28, 2004
Thesis draft #2 - [PDF]
Monday, April 19, 2004
Service Availability Models
In studies such as [Vouk87] we see that there are benefits to the end-user if an application can advertise its Availability.
A = Availability
F = Mean Time to Failure
R = Mean Time to Recovery
A = F / (F+R)
Figure X: Availability Definition
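As a quick illustration of the definition above, the ratio can be computed directly. This is a minimal sketch; the function name and the sample figures are assumptions made for the example, not measurements.

```python
def availability(mttf: float, mttr: float) -> float:
    """Return A = F / (F + R), where F is mean time to failure
    and R is mean time to recovery, both in the same time unit."""
    if mttf < 0 or mttr < 0:
        raise ValueError("F and R must be non-negative")
    total = mttf + mttr
    if total == 0:
        raise ValueError("F + R must be positive")
    return mttf / total


# Hypothetical service: up for 99 hours on average, 1 hour to recover.
print(availability(99.0, 1.0))
```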
Let us consider how a grid service could collect F, R and A. The most straightforward way is to ping the service periodically, keep a log of how long the service stays available and, when it goes down, measure how long it takes to come back up.
For several reasons, the actual PING program is insufficient. PING might be sufficient for a simple HTML web page; however, a service can "go down" by changing its behavior or by raising an intolerable number of exceptions, as well as by simply not responding.
We should not ignore that the registry service from GT3 keeps a "heartbeat" by receiving service data at regular intervals. However, F and R cannot be calculated unless the registry also keeps track of expired service data.
For example, the registry could allocate separate storage for the attribute and the GSH whenever service data is purged from the registry. The GSH would serve as a unique ID. Whenever new service data is added to the registry it would be filtered against the expired GSH strings. The time difference between the two attributes would give a recovery time estimate. The difference between the and the attributes would produce a best-case recovery time estimate. These estimates could be aggregated to create the mean time to recovery.
Keeping with this example, the service registry could keep track of the initial as well as the most recent and consider the difference to be the "uptime." The initial is reset whenever a service is purged from the registry. The running average of a service's "uptime" could be considered the Mean Time to Failure.
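The registry-side bookkeeping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the event labels "registered" and "purged", the numeric timestamps, and the function name are all hypothetical and do not come from the GT3 registry. Repeated "registered" events for a live service are treated as heartbeats and do not reset the initial timestamp.

```python
from statistics import mean


def estimate_f_and_r(events):
    """Estimate (F, R) for one GSH from a time-ordered list of
    (timestamp, kind) pairs. kind is "registered" or "purged";
    both labels are assumptions made for this sketch."""
    uptimes, recoveries = [], []
    initial = None      # first registration since the last purge
    purged_at = None    # time of the most recent purge
    for timestamp, kind in events:
        if kind == "registered":
            if initial is None:
                if purged_at is not None:
                    # Service reappeared: one recovery time sample.
                    recoveries.append(timestamp - purged_at)
                    purged_at = None
                initial = timestamp
            # Otherwise this is a heartbeat; keep the initial timestamp.
        elif kind == "purged" and initial is not None:
            # Service data expired: one uptime sample.
            uptimes.append(timestamp - initial)
            purged_at = timestamp
            initial = None
    f = mean(uptimes) if uptimes else None
    r = mean(recoveries) if recoveries else None
    return f, r


# Hypothetical log for a single service, in seconds.
log = [(0, "registered"), (30, "registered"), (100, "purged"),
       (110, "registered"), (310, "purged"), (330, "registered")]
f, r = estimate_f_and_r(log)
print(f, r)
```

The resulting F and R can then be combined with A = F / (F+R) from the availability definition.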
In lieu of using the and attributes, the registry could instead use service invocations as a unit of time. We could assume that a service left idle will not fail, and replace the traditional metric Mean Time to Failure with Mean Invocations to Failure. This could also go beyond measuring if the server will respond or not, by deriving exception information from the service data set.
It should be noted that the above model violates design principles of the GT3 soft-state management architecture. In this model, service data (F, R, A) originates from the registry, not from the source service itself. There are several other problems with this model as well. The first is that a GSH makes a poor unique identifier because URLs are notorious for changing. However, we cannot use the namespace string because a single service can have many simultaneously deployed instances. Requiring a new unique service instance identifier would complicate an already messy specification. It is also common for a service to be taken down and never redeployed, so registry-enhanced service data must have an expiration as well. Due to these problems, a case can be made for allowing services to measure themselves or for designing a new type of GT3 registry.
A service could advertise its own F, R, and A by measuring the Grid Service Container, the Globus Stand Alone Server, or the AXIS server. This assumes that all failures would be at a high-level, as this model is clearly insufficient in the case of hardware failure. Another issue is the validity of this data. How could a client trust that the service is as reliable as advertised?
Consider an aggregate service as presented in figure X. This type of aggregate service could answer Availability questions from a collection of replicas that advertise F and R.
From the client’s perspective, this might be the most elegant solution. The client consumes a high-level redundant or fault-tolerant service. The intermediate service serves stateless, transaction-based requests from persistent stateful services. Many have speculated that intermediate services will act as brokers by performing a combination of duties, including enforcing SLAs.
Monday, April 12, 2004
a thesis draft [pdf] is up!
Saturday, April 10, 2004
4.2 Models for Fault Tolerance
Fault tolerance is a broad computer science research domain. However, few fault tolerance concepts have been applied to web services. In this section I will explore how the new technologies of web services, grid services, meta-data, and workflow could affect some of the most mainstream fault tolerance models. I will consider basic recovery blocks and majority voting models.
Acceptance Test
The notion of acceptance testing is essential to all fault tolerance voting or consensus arrangements. The acceptance test may determine functional equivalence or may determine correctness measured against an always-correct result, an oracle. Traditionally, when the output of two programs is a data structure in the same programming language, determining equivalence is relatively straightforward. However, when considering the output of two web services, our notion of acceptance testing should be expanded to more precisely describe what we are measuring.
There are several different types of general tests that are applicable when considering fault tolerance in a service-oriented system. First is the notion of XSD Schema Validation. There are several free and commercial libraries that check an XML document against a namespace definition. We can, at minimum, check to see if all services return valid SOAP envelope documents. Often, the content of the SOAP header is XML that complies with a namespace restriction to provide meta-data. The structure and content of the SOAP header are important when implementing WS-Security, WS-Availability, and related standards. The SOAP body is also usually a document, in which case the service is called a document-style service. We can compare two SOAP documents for namespace equivalence and schema syntax equivalence, and check that the data contained in the response documents are identical or approximately equal.
Consider two services, X and Y. These acceptance tests require that the checker have a copy of the I/O data structures.
Comparison | Source Document | Validation Document
Validate SOAP Message | SOAP message X | SOAP schema [SOAP12]
Validate SOAP Header | SOAP header X | syntax specification XSD or DTD
Validate SOAP Body | SOAP body X | syntax specification XSD or DTD
Verify Namespace compatibility | SOAP body X, Service X namespace | SOAP body Y, Service Y namespace
Data Equivalence | SOAP body X | SOAP body Y
Schema subset/superset check | Namespace of X | Namespace of Y
Figure X: Message Level Acceptance Tests
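The Data Equivalence row can be illustrated with a small sketch that compares the bodies of two SOAP 1.2 responses element by element, ignoring namespace prefixes and surrounding whitespace. The function names and the sample messages are assumptions made for this example; it checks for identical data only and does not attempt the "approximate" comparison mentioned earlier.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.2 envelope.
SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope"


def soap_body(message: str) -> ET.Element:
    """Parse a SOAP 1.2 message and return its Body element."""
    root = ET.fromstring(message)
    body = root.find(f"{{{SOAP12_NS}}}Body")
    if body is None:
        raise ValueError("no SOAP 1.2 Body element found")
    return body


def equivalent(a: ET.Element, b: ET.Element) -> bool:
    """Compare two elements recursively by qualified tag, attributes,
    stripped text, and ordered children. Tail text is ignored."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(equivalent(x, y) for x, y in zip(a, b))


def data_equivalence(message_x: str, message_y: str) -> bool:
    """Data Equivalence test: SOAP body X against SOAP body Y."""
    return equivalent(soap_body(message_x), soap_body(message_y))
```

Because ElementTree expands prefixes into full namespace URIs, two services that use different prefixes for the same namespace still compare as equal.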
An acceptance test can also check to see if an output message conforms to domain-specific syntax. Domains that have a single strong professional organization tend to develop successful standardized XML vocabularies. Real Estate XML is often given as an example of a clear standard that has “won” and become a “true standard.” However, in domains such as Banking, many competing professional organizations propose competing standards. In my estimation there are approximately 500 widely-used domain-specific XML ontologies. See [SA04] for a list of approximately 100 popular XML vocabularies.
Another set of acceptance tests may require that the checker have a copy of the WSDL it expects. The input and output messages are specified in the WSDL and the acceptance test may keep a local copy. This could be useful if the service provider changes both the message behavior and the document syntax.
Comparison | Source Document | Validation Document
WSDL and Local | WSDL of X | Local WSDL
WSDL and WSDL | WSDL of X | WSDL of Y
Figure X: WSDL Acceptance Tests
All the above checks assume that X is questionable and Y is always acceptable. However, in most production systems there is no oracle service, nor is it possible to calculate an answer locally. That is not to say that schema validation is not useful. Generalizable acceptance tests could allow fault tolerance mechanisms to be added to existing web service systems without constructing a custom test for each service. The most common reason that a message would fail such a test is that the service produced an exception, and the exception message appears in place of the SOAP body.
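A generalizable test of this kind might look like the sketch below. It is a structural check only, not full XSD validation: it accepts a response if it parses as XML, has a SOAP 1.2 Envelope root with a Body, and the Body does not carry a Fault element. The function name is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.2 envelope.
SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope"


def passes_envelope_test(message: str) -> bool:
    """Return True if the message is a well-formed SOAP 1.2 envelope
    whose Body does not contain a Fault."""
    try:
        root = ET.fromstring(message)
    except ET.ParseError:
        return False  # not even well-formed XML
    if root.tag != f"{{{SOAP12_NS}}}Envelope":
        return False  # wrong root element or wrong SOAP version
    body = root.find(f"{{{SOAP12_NS}}}Body")
    if body is None:
        return False
    # An exception reported by the service appears as a Fault in the Body.
    return body.find(f"{{{SOAP12_NS}}}Fault") is None
```

Such a test needs no knowledge of the application domain, which is what lets it be reused across services.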
Alternate Versions
The notion of an alternate version is essential to most fault tolerance voting or consensus arrangements. We first consider redundancy and then independent versions. Two-out-of-three majority voting is a tolerant and low-cost system that can be implemented with either replicas or independent versions.
Service replicas are copies of a web service deployed on independent servers. The service may be replicated across same-platform machines by simply copying the relevant files. A service can be duplicated across platforms by recompiling or reconfiguring the service for its destination.
A replica may also be a grid service created from a factory. Replicas that are created with the same state data and that are always used together should give the same results. However, once one service instance behaves differently from the others, its state data may have changed. Can a service lose trust once it disagrees with the other two? The client could even request that the factory create a new service instance in order to ‘get a fresh opinion.’
So far we have described three different types of replicas: true duplicates, replicas compiled from the same source, and replicas instantiated from the same factory. Let us now consider independent versions.
If three web services or grid services were to be developed independently, it would make sense to implement these web services in different programming languages. For example, if all three web services were written in Java, then they would inevitably share some libraries, even if they came from different development teams. The three most popular languages for creating web services are Java, C#, and Perl. The web service libraries for these languages come from separate development efforts and have substantially different implementations.
Web services are usually a wrapper around an existing program. Grid services are usually a wrapper around a job manager which can execute an existing program. Either way, service independence does not guarantee that the services are independent at the back-end.
Measuring service diversity and dependence of failures is also an issue. The services that comprise a fault tolerant system should be as diverse as possible. [Mcallister&Vouk] suggest that we can quantify diversity in different versions by comparing the results from a set of random inputs. The idea is to use a large enough sample so as to distinguish the degree of failure overlap. This class of experiments can be extended to web services without modification.
In service orchestration, the problem of measuring downstream dependency may prove significant when trying to create fault tolerant systems. If two services are both dependent on a third service, we cannot consider these web services to be alternate versions. However, there is currently no standard way to check. If a service advertised its position in the workflow, perhaps by publishing the workflow document, then all endpoints would be visible and a downstream dependency check could occur. The workflow description document URL could be published by its component services or a registry.
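If the workflow document were available, a downstream dependency check could be as simple as the following sketch. The dictionary representation of the workflow (each service mapped to the services it calls) and the service names are assumptions made for illustration; no existing workflow format is implied.

```python
def downstream(workflow, service):
    """Return every service reachable from `service` in a workflow
    given as a dict mapping each service to the services it calls."""
    seen = set()
    stack = list(workflow.get(service, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(workflow.get(current, ()))
    return seen


def shared_dependencies(workflow, a, b):
    """Services that both `a` and `b` depend on, directly or indirectly.
    A non-empty result means a and b are not independent alternates."""
    return downstream(workflow, a) & downstream(workflow, b)


# Hypothetical workflow: A calls C directly, B reaches C through D.
workflow = {"A": ["C"], "B": ["D"], "D": ["C"], "E": []}
print(shared_dependencies(workflow, "A", "B"))
```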
Of course, the costs of creating independent versions are significantly greater than creating replicas. As the previous chapter demonstrated, much can be accomplished with service replicas.
Recovery Blocks
Figure X: Recovery Blocks, [Mcallister&Vouk]
One of the most common fault-tolerant software schemes is the recovery block. The output of a module, or in this case a service, is tested for acceptability with an acceptance test. If the acceptance test determines that the output is not acceptable, it rolls back to a state before the service was executed. It then asks the second service to execute, and so on. Every SOAP-level acceptance test
The problem with recovery blocks is choosing which acceptance test to use. An extreme example may be to use an independent version as the acceptance test. The most straightforward acceptance test is probably the Validate SOAP Body test described in the previous section. This is a strong acceptance test because it guarantees that the program will be able to continue executing; however, it does nothing to check the correctness of the data in the output document.
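The recovery block scheme can be sketched as below. This is a minimal illustration, assuming that each alternate is a callable taking a request and a mutable state dictionary, and that the acceptance test is a callable returning True or False; none of these names come from an existing library. An exception raised by an alternate is treated as a failed attempt.

```python
import copy


def recovery_block(alternates, acceptance_test, request, state):
    """Try each alternate in order and return the first output that
    passes the acceptance test, rolling the state back after a failure."""
    checkpoint = copy.deepcopy(state)  # state before any alternate runs
    for service in alternates:
        try:
            output = service(request, state)
            if acceptance_test(output):
                return output
        except Exception:
            pass  # an exception counts as a failed alternate
        # Roll back to the checkpoint before trying the next alternate.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("all alternates failed the acceptance test")
```

In a web service setting, the alternates would wrap calls to the individual services, and the acceptance test could be one of the SOAP-level checks from the previous section.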
N-version programming
Figure X: N-version programming, [Mcallister&Vouk]
N-version programming proposes parallel execution of N alternate versions. A module must then aggregate the N results and perform adjudication of their outputs. Part of this adjudication module can be a voter. Each version can have a weighted vote or equal vote.
Consensus voting is a generalization of majority voting. It uses the following algorithm to select which answer to use:
1. If there is a majority agreement (m >= ⌈(N+1)/2⌉, N > 1), then this answer is chosen as the correct answer.
2. Otherwise, if there is a unique maximum agreement, but this number of agreeing versions is less than ⌈(N+1)/2⌉, then this answer is chosen as the correct one.
3. Otherwise, if there is a tie in the maximum agreement number from several output groups, then:
   a. If consensus voting is used in N-version programming, one group is chosen at random and the answer associated with this group is chosen as the correct one.
   b. Else, if consensus voting is used in a consensus recovery block, all groups are subjected to the acceptance test, which is then used to choose the correct output.
Figure X: Consensus voting, [Mcallister&Vouk]
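The consensus voting steps above can be sketched in a few lines. This is an illustration under stated assumptions: outputs are hashable values (for example, normalized response strings), agreement means exact equality, and the function and parameter names are made up for the example. Passing an acceptance test selects the consensus recovery block variant of the tie-break; omitting it selects the N-version programming variant.

```python
import math
import random
from collections import Counter


def majority_threshold(n: int) -> int:
    """Smallest group size that counts as a majority among n versions."""
    return math.ceil((n + 1) / 2)


def consensus_vote(outputs, acceptance_test=None, rng=random):
    """Pick an answer from the outputs of N versions by consensus voting."""
    if not outputs:
        raise ValueError("at least one output is required")
    counts = Counter(outputs)
    best = max(counts.values())
    leaders = [answer for answer, count in counts.items() if count == best]
    if len(leaders) == 1:
        # Covers both a true majority and a unique maximum agreement.
        return leaders[0]
    if acceptance_test is None:
        # Tie in N-version programming: choose one group at random.
        return rng.choice(leaders)
    # Tie in a consensus recovery block: let the acceptance test decide.
    for answer in leaders:
        if acceptance_test(answer):
            return answer
    return None  # no tied group passed the acceptance test
```

A majority group, when one exists, is always the unique largest group, so the first two rules collapse into a single branch in the code.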
N-version programming has several synergies with web service systems. Obviously, each web service can correspond to a version. Another web service may act as the adjudication mechanism.
Intermediary Web Services
Several groups [Annraí][Bacigalupo][Chi&Wu] have done work on intermediate web services. Unfortunately, most intermediary web services today are point solutions to very narrow problems. Consensus voting web services hold the potential to dramatically reduce the costs of adding fault tolerance to a web service system. All the acceptance tests listed in figure X can be performed on any pair (or more) of web services without any regard to the domain of the application. A general purpose web service using schema validation as the acceptance test would allow for redundancy or voting models, so long as there are multiple versions.
However, these multiple versions could be provided from independent servers, or they may all be hosted on the same server. There are several mechanisms to address this.
The address in the endpoint URL could be used to identify the hosting server. Although imperfect, endpoint names or IP addresses may prove useful.
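As a rough sketch of the endpoint-based check, the host part of each endpoint URL can be compared to see whether the versions appear to live on separate servers. The function names and URLs are hypothetical, and, as noted above, the check is imperfect: distinct host names can still resolve to the same machine.

```python
from urllib.parse import urlsplit


def distinct_hosts(endpoints):
    """Return the set of host names found in the endpoint URLs."""
    return {urlsplit(url).hostname for url in endpoints}


def on_independent_servers(endpoints):
    """True if every endpoint URL names a different host."""
    return len(distinct_hosts(endpoints)) == len(endpoints)


# Hypothetical endpoints: the first two share a host despite different ports.
endpoints = [
    "http://grid1.example.org:8080/ogsa/services/VersionA",
    "http://grid1.example.org/ogsa/services/VersionB",
    "http://grid2.example.org:8080/ogsa/services/VersionC",
]
print(on_independent_servers(endpoints))
```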
The consumer of the result of a remote adjudication system must be able to trust the result. A lack of trust makes these services somewhat impractical for commercial products, but they may first be adopted for scientific research. Consider an adjudication web service for Bioinformatics. This could be possible under a Molecular Biology Data Model published by the NCBI, if the service could reliably retrieve NCBI’s published Data Model Schema documents.
The costs of creating a custom voting system are significantly higher than those of consuming a web service. However, the main barriers to deploying such services are the issues of trust and, ironically, the lack of any guarantee that the adjudication service will be available.
Monday, April 05, 2004
3. Background: Service Registries
In this chapter, I provide background on the three most widely known web service registry technologies. In a web service workflow system, a registry’s primary function is to provide a mechanism for service discovery. Service discovery may enable improvements in workflow systems such as fault tolerance, planning, or improved performance. Since the inception of web services, many people have believed that service discovery would become an essential part of web service technology. However, as programmers have used web services, most have left service discovery to the user (as parameters) or have even hard-coded the service endpoint. This is especially true for scientific systems. Since there are usually a limited number of services that work with a scientific workflow system, the burden of discovering services is easily pushed to the user.
Academic projects have taken a roll-your-own (RYO) approach to service registries, and several existing scientific workflow systems have used registries. Projects such as DiscoveryNet [discoverynet] and ICENI [iceni] use custom registries. The Self-Serv [benatallah03] and Triana [Triana] projects use UDDI. This chapter covers three types of open source registry technologies. Commercial web service registries, most notably Microsoft DISCO, are considered outside of the scope.
3.1 UDDI Overview
Universal Description, Discovery, and Integration (UDDI) version 1.0 was created amid the dot-com and e-commerce technologies of 2000. It was originally conceived as a machine-readable “Universal Business Registry,” an e-commerce directory that would revolutionize the supply chain. The version 1.0 specification was supposed to create a standard platform on which businesses could compete to offer services. The UDDI specification was then substantially revised in version 2.0 in 2001. The specification was tied to XML, WSDL, XSD, and other overlapping projects. In 2004, the current version of UDDI (version 3.0) is a conglomeration of the initial web services specifications (WSDL, SOAP, XSD) and the emerging technologies of XML security and service publication. The UDDI project is currently negotiating with W3C and OASIS to turn the specification over to a standards body working group. UDDI’s authors have claimed [STENCILUDDI] that the standard is evolving to become more practical.
UDDI software offers a powerful, secure, and feature-rich web service registry at the cost of complexity and uncertainty about future versions of UDDI. UDDI has not yet been widely adopted in academia or in industry, but it has been used successfully in some business projects. Many major corporations as well as numerous startups have been involved in the evolution of UDDI. The most influential actors, IBM, Microsoft, Sun, and SAP, have all published UDDI implementations.
Independent research carried out by SalCentral shows that over 67% of entries in public UDDI registries today are either invalidly formatted or validly formatted but unavailable. This is due to inadequate quality of service guarantees and a lack of moderation in these registries. Furthermore, the available services are underutilized and very rarely “discovered” programmatically. In order to mitigate these problems, UDDI version 3.0 relies on a publish/subscribe model.
One interacts with a UDDI server or UDDI-enabled application through a number of APIs and their respective implementations. Today there are six major implementations of UDDI 2.0; however, only Systinet claims to support version 3.0 features.
API Implementations
• IBM WSTK and UDDI4J
• Systinet WASP UDDI
• jUDDI.org
• UDDI 2.0 in Java
• Microsoft UDDI SDK
• Trenian Web Services Directory
While the features and design of each implementation differ, they all provide mechanisms to publish, find, and bind services. Publishing a service is simply a matter of including it in a UDDI registry. This can be done through a series of RPC-style web service calls or by a person through a web page interface. The specification provides a mechanism for assigning a unique identifier (UDDI Key) to a newly published service. Finding, or discovering, a service is accomplished by creating a client-side proxy and searching by business, service, or description. Binding covers how an application connects to, and interacts with, a web service after it has been found.
A UDDI registry is capable of storing the following data elements:
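The publish, find, and bind steps can be illustrated with a small in-memory sketch. This is not the API of any of the implementations listed above; the class ToyRegistry, its method names, and the example service and URL are all hypothetical, and a real UDDI client would issue SOAP calls against a registry instead.

```python
import uuid


class ToyRegistry:
    """In-memory stand-in for a UDDI registry (hypothetical, not a real UDDI API)."""

    def __init__(self):
        self._services = {}  # UDDI-style key -> service record

    def publish(self, business, name, description, access_point):
        # The registry, not the publisher, assigns the unique key.
        key = str(uuid.uuid4())
        self._services[key] = {
            "business": business,
            "name": name,
            "description": description,
            "access_point": access_point,
        }
        return key

    def find(self, business=None, name=None, description=None):
        # Return the keys of services matching every criterion that was supplied.
        criteria = {"business": business, "name": name, "description": description}
        matches = []
        for key, record in self._services.items():
            if all(value is None or value.lower() in record[field].lower()
                   for field, value in criteria.items()):
                matches.append(key)
        return matches

    def bind(self, key):
        # Binding information: where the service can be invoked.
        return self._services[key]["access_point"]


registry = ToyRegistry()
key = registry.publish("Example Labs", "BlastSearch",
                       "Sequence alignment service",
                       "http://example.org/services/blast")
found = registry.find(name="blast")   # search by (partial) service name
endpoint = registry.bind(found[0])    # resolve the key to an endpoint
```

The point of the sketch is the division of labor: the publisher supplies the description, the registry owns the key, and the client only needs the key to obtain the binding details.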
• businessEntity: Describes a business or other organization that typically provides Web services.
• businessService: Describes a collection of related Web services offered by an organization described by a businessEntity.
• bindingTemplate: Describes the technical information necessary to use a particular Web service.
• tModel: Describes a “technical model” representing a reusable concept, such as a Web service type, a protocol used by Web services, or a category system.
The most flexible component is the tModel, which can hold any data structure expressible in XML Schema (XSD). WSDL documents, metadata, and human-readable descriptions can all be encoded as tModel components.
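How the four data elements relate to one another can be sketched with plain Python dataclasses. The field names below are simplified assumptions chosen for illustration and are not the attribute names of the UDDI schema; the nesting (an entity holds services, a service holds binding templates, a binding template refers to tModels by key) follows the descriptions in the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TModel:
    key: str
    name: str
    overview_url: str = ""  # e.g. a pointer to a WSDL document


@dataclass
class BindingTemplate:
    access_point: str  # where the service is invoked
    tmodel_keys: List[str] = field(default_factory=list)  # technical models it conforms to


@dataclass
class BusinessService:
    name: str
    bindings: List[BindingTemplate] = field(default_factory=list)


@dataclass
class BusinessEntity:
    name: str
    services: List[BusinessService] = field(default_factory=list)


def endpoints_for(entity, tmodel_key):
    """Collect access points of all bindings that reference the given tModel."""
    return [binding.access_point
            for service in entity.services
            for binding in service.bindings
            if tmodel_key in binding.tmodel_keys]


# Hypothetical example data.
wsdl_model = TModel(key="tm-1", name="QuoteService WSDL",
                    overview_url="http://example.org/services/quote.wsdl")
entity = BusinessEntity(
    name="Example Labs",
    services=[BusinessService(
        name="Quotes",
        bindings=[
            BindingTemplate("http://example.org/services/quote", [wsdl_model.key]),
            BindingTemplate("http://example.org/services/legacy", []),
        ],
    )],
)
endpoints = endpoints_for(entity, "tm-1")
```

Because bindings refer to tModels only by key, the same tModel can be shared by many services, which is what makes it a reusable concept.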
3.2 WSIL Overview
Web Services Inspection Language (WSIL) is a project created by Microsoft and IBM that offers a lightweight, decentralized service registry, in contrast to the complex, centralized approach of UDDI. WSIL is a specification for an XML-based meta-language and is used for the creation and consumption of web-based XML documents [APPNEL02].
A WSIL document is simply a way for an organization to aggregate and advertise its web services. The WSIL specification says:
"WSIL defines how a service requestor can discover an XML Web Service description on a Web server, enabling such requestors to easily browse Web servers for XML Web Services."
WSIL encourages organizations to locate their WSIL documents in a uniform manner.
By convention, the WSIL document is published at a well-known location such as http://example.org/inspection.wsil or http://examples.org/services/inspection.wsil.
All WSIL documents must have a root element, <inspection>, which wraps all the service advertisements. Each service is wrapped in a <service> tag, which contains a <description> tag. Usually, the <description> element provides a reference to the namespace and the WSDL.
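As a rough illustration of this structure, the sketch below parses a small, made-up WSIL document with Python's standard xml.etree.ElementTree module. The element names (inspection, service, description), the referencedNamespace and location attributes, and the namespace URI are taken from my reading of the WS-Inspection 1.0 specification and should be checked against it; the sample service and its URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed from the WS-Inspection 1.0 specification.
WSIL_NS = "http://schemas.xmlsoap.org/ws/2001/10/inspection/"

# Hypothetical inspection document advertising a single WSDL-described service.
SAMPLE = """<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <service>
    <abstract>Hypothetical stock quote service</abstract>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.org/services/quote.wsdl"/>
  </service>
</inspection>
"""


def list_descriptions(wsil_text):
    """Return (referencedNamespace, location) pairs for every advertised service."""
    root = ET.fromstring(wsil_text)
    ns = {"wsil": WSIL_NS}
    results = []
    for service in root.findall("wsil:service", ns):
        for desc in service.findall("wsil:description", ns):
            results.append((desc.get("referencedNamespace"), desc.get("location")))
    return results


descriptions = list_descriptions(SAMPLE)
```

A client that understands WSDL would filter the result on the WSDL namespace and then fetch each location, which is all the "discovery" that WSIL itself provides.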
WSIL documents are defined with XML Schema, so the WSIL specification can be extended with other description types. WSDL support in WSIL is achieved through the use of extensible XSD elements. WSDL and UDDI extensions are pre-built in the most widely used WSIL toolkit, the Apache Axis WSIL4J project [WSIL4J].
Academic researchers have taken note of WSIL and have discussed it in many documents, most prominently those from the UK e-science project, Indiana University, and the University of Chicago.
XMethods.com, a directory of publicly available Web services, is an early adopter of WSIL and has developed a binding extension for its service.