Clustering WSDL Documents to Bootstrap the Discovery of Web Services
Reading: ICWS 2010, "Clustering WSDL Documents to Bootstrap the Discovery of Web Services"
![Page 1: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/1.jpg)
Clustering WSDL Documents to Bootstrap the Discovery of Web Services
Web Services Discovery
![Page 2: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/2.jpg)
Outline
Introduction Related Work Our Approach Experiments Conclusion
![Page 3: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/3.jpg)
Introduction
Major providers decided to publish WS through their own websites instead of public registries
[Figure: UDDI Business Registry 47% vs. search engine 92%]
![Page 4: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/4.jpg)
Introduction
Problems with search engines:
If the search query doesn't contain part of the service name exactly, the service may not be retrieved
Users may also miss services that use synonyms or variations of the keywords (e.g., car -> vehicle)
![Page 5: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/5.jpg)
Outline
Introduction Related Work Our Approach Experiments Conclusion
![Page 6: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/6.jpg)
Related work
Using the Jaccard coefficient to calculate the similarity between Web services (Richi Nayak 2008): provides the user with related search terms based on other users' experiences with similar queries
Web services search engine Woogle (Xin Dong 2004), which is capable of providing Web services similarity search: does not adequately consider data types
Applying text mining techniques to extract features such as service content, context, host name, and name from Web service description files in order to cluster Web services (Wei Liu 2009): the service context and service host name features offer little help in the clustering process
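As a reminder of what the Jaccard coefficient in the first related-work item measures, here is a minimal sketch. The term sets are made up for illustration and are not taken from the cited work.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two term collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical term sets describing two weather services.
weather_a = {"weather", "zipcode", "forecast"}
weather_b = {"weather", "city", "forecast", "temperature"}

# 2 shared terms out of 5 distinct terms.
print(jaccard(weather_a, weather_b))
```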
![Page 7: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/7.jpg)
Outline
Introduction Related Work Our Approach Experiments Conclusion
![Page 8: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/8.jpg)
Big picture
![Page 9: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/9.jpg)
Features Extraction
Mine the WSDL documents to extract features that describe the semantics and behavior of the Web service:
WSDL content
WSDL types
WSDL messages
WSDL ports
Web service name
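The structural features (types, messages, ports) could be pulled out of a WSDL 1.1 document roughly as follows. This is only a sketch using Python's standard library: the `extract_features` helper and the sample document are hypothetical, and reading "ports" as `portType` names is an assumption, not something the slides state.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"   # WSDL 1.1 namespace
XSD_NS = "http://www.w3.org/2001/XMLSchema"    # XML Schema namespace


def extract_features(wsdl_text):
    """Collect the names of complexType, message and portType elements."""
    root = ET.fromstring(wsdl_text)

    def names(ns, tag):
        # root.iter walks the whole tree; keep only elements with a name attribute.
        return [el.get("name") for el in root.iter(f"{{{ns}}}{tag}") if el.get("name")]

    return {
        "types": names(XSD_NS, "complexType"),
        "messages": names(WSDL_NS, "message"),
        "ports": names(WSDL_NS, "portType"),  # assumption: "ports" means portType
    }


# Hypothetical, heavily trimmed WSDL document.
SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" name="WeatherForecast">
  <types>
    <xs:schema>
      <xs:complexType name="WeatherData"/>
    </xs:schema>
  </types>
  <message name="GetWeatherByZipCodeIn"/>
  <portType name="WeatherForecastSoap"/>
</definitions>"""

print(extract_features(SAMPLE))
```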
![Page 10: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/10.jpg)
Features Extraction Process
![Page 11: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/11.jpg)
Feature 1: WSDL Content
Parsing WSDL: Ti = {types, message, weather, zipcode, web, forecast, forecasting, is, ...}
Tag removal: Ti = {weather, zipcode, web, forecast, forecasting, is, ...}
Word stemming: Ti = {weather, zipcode, web, forecast, is, ...}
Function word removal: Ti = {weather, zipcode, web, forecast, ...}
Content word recognition: Ti = {weather, zipcode, forecast, ...}
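The first four steps of this pipeline can be sketched as below. Everything here is a simplification: the tag list and function-word list are small made-up samples, and `toy_stem` only strips "ing" as a stand-in for a real stemmer (the slides do not say which stemmer is used). Content word recognition is left out because it relies on a word-distance measure.

```python
import re

# Assumed, incomplete sample lists for illustration only.
WSDL_TAGS = {"definitions", "types", "message", "part", "porttype",
             "operation", "binding", "port"}
FUNCTION_WORDS = {"is", "a", "do", "the", "of", "by"}


def toy_stem(word):
    """Very rough stand-in for a real stemmer: strip a trailing 'ing'."""
    if word.endswith("ing") and len(word) - 3 >= 4:
        return word[:-3]
    return word


def content_tokens(text):
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]  # parsing
    tokens = [t for t in tokens if t not in WSDL_TAGS]            # tag removal
    tokens = [toy_stem(t) for t in tokens]                        # word stemming
    tokens = [t for t in tokens if t not in FUNCTION_WORDS]       # function word removal
    return list(dict.fromkeys(tokens))                            # de-duplicate, keep order


print(content_tokens("types message weather zipcode web forecast forecasting is"))
```

On the slide's example tokens this reproduces the set shown before content word recognition.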
![Page 12: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/12.jpg)
Function word removal
Function words: is, a, do, ...
Content words: weather, zipcode, ...
![Page 13: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/13.jpg)
Content word recognition
Apply k-means clustering algorithm with k=2 on Ti
use Normalized Google Distance (NGD) as a featureless distance measure between words
[Figure: the tokens are split into a Web-service-specific cluster, e.g. {weather, zip, zipcode, forecast, place}, and a non-Web-service-specific cluster, alongside a predefined cluster. The other two term sets shown are {response, bind, data, post, port, target} and {runtime, bind, web, service, module, data, post}.]
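For reference, the Normalized Google Distance between two words is usually defined from search hit counts as NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) is the number of pages containing x, f(x, y) the number containing both, and N the index size. A minimal sketch follows; the hit counts are invented, and in practice they would come from a search engine.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts fx, fy, fxy and index size n."""
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # the words never co-occur (or never occur at all)
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


N = 10**10  # hypothetical number of indexed pages

# Hypothetical counts: a pair that often co-occurs vs. a pair that rarely does.
related = ngd(1_000_000, 2_000_000, 500_000, N)
unrelated = ngd(1_000_000, 2_000_000, 1_000, N)
print(related < unrelated)
```

Smaller values mean the two words tend to appear together, which is what lets a clustering step group words such as "weather" and "forecast".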
![Page 14: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/14.jpg)
WSDL types, messages, ports
![Page 15: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/15.jpg)
Features 2, 3, 4
Feature 2: WSDL Types (complexType)
The type attribute is a good candidate for describing the functionality of a service.
Feature 3: WSDL Messages
Feature 4: WSDL Ports
![Page 16: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/16.jpg)
Feature 5: Web Service Name
We consider the Web service name used in the URI of the WSDL document
http://www.webservicex.net/WeatherForecast.Asmx?WSDL
Here the name of the Web service is "Weather Forecast"
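One way to derive such a name from the WSDL URI is to take the file name without its extension and split it on CamelCase boundaries. The `service_name` helper below is a hypothetical sketch, not the procedure described in the paper.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse


def service_name(wsdl_url):
    """Turn '.../WeatherForecast.Asmx?WSDL' into 'Weather Forecast'."""
    path = urlparse(wsdl_url).path          # '/WeatherForecast.Asmx'
    stem = PurePosixPath(path).stem         # 'WeatherForecast'
    # Acronyms, capitalized or lowercase words, and digit runs.
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", stem)
    return " ".join(words)


print(service_name("http://www.webservicex.net/WeatherForecast.Asmx?WSDL"))
```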
![Page 17: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/17.jpg)
Features Integration and clustering
We use the Quality Threshold (QT) clustering algorithm to cluster similar Web services based on the five similarity features presented above.
Similarity factor between web service si and sj
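A rough sketch of how the pieces could fit together is given below. The similarity formula from the slide is not reproduced in this transcript, so `combined_similarity` simply assumes a weighted average of the five per-feature similarities with equal weights by default. `qt_cluster` follows the usual Quality Threshold scheme: grow a candidate cluster around every remaining item, keep the largest one whose diameter stays within the threshold, remove it, and repeat. The 1-D points are a toy stand-in for services.

```python
def combined_similarity(feature_sims, weights=None):
    """Weighted average of per-feature similarities (equal weights assumed)."""
    if weights is None:
        weights = [1.0 / len(feature_sims)] * len(feature_sims)
    return sum(w * s for w, s in zip(weights, feature_sims))


def qt_cluster(items, dist, max_diameter):
    """Quality Threshold clustering over items with a pairwise distance."""
    remaining = list(range(len(items)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            others = [i for i in remaining if i != seed]
            while others:
                def worst(i):
                    # Largest distance from item i to the current candidate.
                    return max(dist(items[i], items[c]) for c in candidate)
                nxt = min(others, key=worst)
                if worst(nxt) > max_diameter:
                    break  # adding any further item would exceed the threshold
                candidate.append(nxt)
                others.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        clusters.append([items[i] for i in sorted(best)])
        remaining = [i for i in remaining if i not in best]
    return clusters


# Toy data: for services, dist could be 1 - combined_similarity(...).
points = [1.0, 1.2, 1.1, 5.0, 5.3, 9.0]
print(qt_cluster(points, lambda a, b: abs(a - b), max_diameter=0.5))
```

Unlike k-means, this scheme does not need the number of clusters in advance; the threshold on cluster diameter controls the granularity instead.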
![Page 18: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/18.jpg)
Outline
Introduction Related Work Our Approach Experiments Conclusion
![Page 19: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/19.jpg)
Experiments
Two criteria:
Precision: exactness
Recall: completeness
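For a single cluster compared against a manually built reference group, the two criteria can be computed as in this sketch. The service identifiers are hypothetical, and the slides do not give the exact formulas used in the experiments, so this uses the standard set-based definitions.

```python
def precision_recall(cluster, reference):
    """Precision and recall of one cluster against its reference group."""
    cluster, reference = set(cluster), set(reference)
    hits = len(cluster & reference)                       # correctly placed services
    precision = hits / len(cluster) if cluster else 0.0   # exactness
    recall = hits / len(reference) if reference else 0.0  # completeness
    return precision, recall


# Hypothetical example: the algorithm's "Weather" cluster vs. the manual one.
found = {"s1", "s2", "s3", "s4"}
manual = {"s1", "s2", "s3", "s5", "s6"}
print(precision_recall(found, manual))
```

Here three of the four clustered services are correct, and three of the five reference services were found.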
![Page 20: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/20.jpg)
Experiments
400 online Web services
Manual classification serves as a comparison point for the clustering algorithms
"Currency exchange", "Weather", "Address validation", "E-mail verification", and "Credit card services"
![Page 21: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/21.jpg)
Results
High Precision and Recall
![Page 22: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/22.jpg)
Outline
Introduction Related Work Our Approach Experiments Conclusion
![Page 23: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/23.jpg)
Conclusion
We propose an approach to improve service discovery of non-semantic Web services by clustering similar services through mining WSDL documents
Future work: improve feature integration by choosing optimized weights for each feature using a linear programming approach
![Page 24: Clustering WSDL Documents to Bootstrap the Discovery of Web Services](https://reader033.vdocuments.net/reader033/viewer/2022061118/546278e8af7959b92a8b5fbd/html5/thumbnails/24.jpg)
Thanks