Page 1: Tycho: A Resource Discovery and Messaging Framework for Distributed Applications Matthew Grove m.grove@reading.ac.uk Viva Presentation, November 2006

Tycho: A Resource Discovery and Messaging Framework for Distributed Applications

Matthew Grove m.grove@reading.ac.uk

Viva Presentation, November 2006


Outline

•Research Goals,

•An Overview Of Tycho,

•Comparative Benchmarks,

•Applications of Tycho,

•Tycho Swarm, a File Distribution Utility (Demo),

•Summary.


Some Background

• Two key services for distributed systems are a mechanism for discovering remote components (such as a registry) and then sending messages between these components:

–These two services are interdependent.

• Current solutions require the application scientists to assemble their systems from a diverse range of services.

• One approach has been to produce toolkits which have pre-selected sets of services bundled together, for example Globus.


Research Goals

• The thesis of this research work is that by combining registry and messaging into a single software framework, the task of binding together distributed systems can be simplified.

• The proposed solution uses an Internet-based architecture that keeps complexity at the edges of a robust and secure set of core services - a novel approach!

• This framework facilitates extensibility while limiting the installation and management costs of using the software.

• The design and development of the framework - known as Tycho - has an overarching goal of reducing the complexity of developing distributed applications.


High-level Requirements

These are the desirable features for Tycho, as argued in the dissertation:

– Scalability: be able to cope with the sizes typical of modern distributed systems,

– High performance,

– Extensibility: be able to add new features and interoperate with other systems,

– Security out of the box,

– Manageability: ease of installation and use:

• For example, minimizing elements like software dependencies, firewall requirements and the amount of configuration needed to deploy Tycho.


The Tycho Implementation

• Tycho is the reference implementation of the framework developed during the PhD.

• The Tycho components are:

–Mediators,

–Clients (Producers and Consumers),

–Utilities.

• The Tycho mediator provides services that allow clients to discover each other using a Virtual Registry (VR) made up of a network of mediators – this also aids communication over both LAN and WAN.

• Utilities are extensions to Tycho’s functionality.

• Tycho used to be called javaGMA or jGMA (poor choice of name!)
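
To make the idea of combining registry and messaging concrete, here is a minimal in-memory sketch in Java. It is not the Tycho API: the class `ToyMediator` and its methods `register`, `discover` and `send` are hypothetical names invented for illustration, and a single object stands in for what would really be a network of mediators.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch only: these names are NOT the Tycho API.
class ToyMediator {
    // Registry side: service name -> attributes advertised by a producer.
    private final Map<String, Map<String, String>> registry = new HashMap<>();
    // Messaging side: service name -> handler that receives messages.
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    // A producer registers its description and its message handler in one step.
    void register(String name, Map<String, String> attributes, Consumer<String> handler) {
        registry.put(name, new HashMap<>(attributes));
        handlers.put(name, handler);
    }

    // A consumer discovers producers whose attribute `key` equals `value`.
    // Results are sorted so the order is deterministic.
    List<String> discover(String key, String value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : registry.entrySet()) {
            if (value.equals(entry.getValue().get(key))) {
                matches.add(entry.getKey());
            }
        }
        Collections.sort(matches);
        return matches;
    }

    // Deliver a message to a registered producer; returns false if the name is unknown.
    boolean send(String name, String message) {
        Consumer<String> handler = handlers.get(name);
        if (handler == null) {
            return false;
        }
        handler.accept(message);
        return true;
    }
}
```

The point of the sketch is only that the name returned by a lookup is directly usable as a message destination, so an application does not have to glue a separate registry and a separate messaging layer together.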


Tycho’s Architecture


General Design Philosophy

• Reuse existing software components, if possible, rather than reinvent existing services or functionality.

• Try to make use of existing software infrastructure.

• Ensure that Tycho is simple to install, configure and use.

• Provide a ‘basic release’ with the ability to extend functionality with further, more sophisticated components - Tycho utilities.

• Because we require portability and interoperability with other distributed systems, Java was a good choice of implementation language.


Tycho Mediator Implementation

• Tycho provides a choice of implementations for each core service.

• Tycho’s design is described in a paper for the "Work-in-Progress Novel Grid Technologies" track of the IEEE International Symposium on Cluster Computing and the Grid 2005 (CCGrid 2005).


Tycho Clients & Utilities

•The Tycho Connector provides the API for building producers and consumers.

•Extra functionality can be added as utilities.


An Example of Tycho’s Setup


Tycho Benchmarks

• Three rounds of benchmarking to measure the performance of Tycho compared to state-of-the-art and widely used systems:

– Communications - measured the performance of inter-client and inter-mediator messaging for Tycho and NaradaBrokering.

– Virtual Registry tests - measured and compared the performance of the Tycho VR to Globus MDS4 and gLite R-GMA.

– Component Tests - different components of the VR were tested in various configurations.

• Results are presented in a paper in the proceedings of the IEEE International Conference on Cluster Computing 2006 (Cluster 2006).


Sample VR Benchmark Results

MDS4 out of memory


Benchmark Results Summary

• Tycho has better performance and client scalability than R-GMA, MDS4 and NaradaBrokering.

• R-GMA, MDS4 and NaradaBrokering all crashed during testing when they exceeded the maximum memory available for the tests (1.5 Gbytes).

• Memory management in Java systems is an issue:

– Without limited buffering or flow control, the Java heap can be exhausted.

• Storing information internally using XML seems to be a source of some of these memory problems:

– Java database solutions such as HSQLDB can provide a high-performance solution for off-loading some of the storage requirements to disk.
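
The buffering point above can be illustrated with a bounded queue. This is a minimal sketch, not code from Tycho or from any of the benchmarked systems; the class name `BoundedInbox` and its methods are hypothetical. It relies only on the standard `java.util.concurrent.ArrayBlockingQueue`, whose fixed capacity limits how many pending messages can sit on the heap at once.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: a bounded inbox, not code from Tycho or the benchmarked systems.
class BoundedInbox {
    private final BlockingQueue<byte[]> queue;

    BoundedInbox(int capacity) {
        // The fixed capacity caps how many pending messages can occupy the heap.
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking enqueue: returns false when the inbox is full,
    // which signals the sender to slow down or retry later.
    boolean tryAccept(byte[] message) {
        return queue.offer(message);
    }

    // Remove and return the oldest pending message, or null if none is waiting.
    byte[] next() {
        return queue.poll();
    }

    // Number of messages currently buffered.
    int pending() {
        return queue.size();
    }
}
```

With an unbounded queue, a fast sender and a slow receiver grow the buffer until the heap runs out. Here the third message into a two-slot inbox is refused instead, and the sender has to decide what to do with it.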


Tycho Core – Future Work

• Some more performance improvements:

– Caching of local mediator queries to reduce response times,

– Use of a hybrid VR-interconnect to use IRC for query routing and HTTP for transporting large responses.

• Additional functionality can be added to provide advanced services:

– WS-based transport handlers for interoperability.
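
One possible shape for the query caching mentioned above is a small least-recently-used cache. This is a sketch under assumptions, since the slides do not say how caching would be implemented: the class name `QueryCache` is hypothetical, and queries and results are represented as plain strings. It uses the access-order mode of the standard `java.util.LinkedHashMap`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a small LRU cache for query results, not Tycho code.
class QueryCache extends LinkedHashMap<String, String> {
    private final int maxEntries;

    QueryCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Evict the least recently used entry once the cache exceeds its capacity.
        return size() > maxEntries;
    }
}
```

A real registry cache would also need to drop entries when the underlying records change, which this sketch does not attempt.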


Tycho Applications

• We developed a number of applications to further validate the implementation.

• These include:

–Demonstrations of publishing and discovering distributed webcams,

–Remote resource discovery for the VOTechBroker project:

• Part of the European Virtual Observatory project, Tycho provides automatic resource discovery for job submission.

–Binding components together for the Semantic Log Analyser (Slogger) project:

• Here Tycho helps locate and gather distributed logs for analysis.


Content Distribution With Tycho

• We wanted to develop a Tycho utility that would demonstrate and validate the utility concept:

– We wanted to create something useful!

• We created a content distribution system called the Tycho swarm utility.

• The swarm utility provides content distribution similar to BitTorrent and overcomes the common ‘2 Gigabyte file size problem’.

• Content is split into ‘chunks’ and the VR is used to store chunk availability.

• Peers use the VR to locate each other and decide what chunks to download.

• Tycho messages are used to transfer the chunks between peers and peers cooperate to distribute the content throughout the swarm.
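
The chunk bookkeeping described above can be sketched in a few lines of Java. This is not the swarm utility's code: `ToySwarm` and its methods are hypothetical, a local map stands in for the VR, and the rarest-first selection rule is an assumption borrowed from BitTorrent, since the slides do not say how peers choose chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of chunk bookkeeping; not the swarm utility's actual code.
class ToySwarm {
    // Stand-in for the VR: chunk index -> peers that advertise holding that chunk.
    private final Map<Integer, Set<String>> availability = new HashMap<>();

    // Split content into fixed-size chunks; the last chunk may be shorter.
    static List<byte[]> split(byte[] content, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < content.length; start += chunkSize) {
            int end = Math.min(content.length, start + chunkSize);
            chunks.add(Arrays.copyOfRange(content, start, end));
        }
        return chunks;
    }

    // A peer announces that it holds a given chunk.
    void advertise(String peer, int chunkIndex) {
        availability.computeIfAbsent(chunkIndex, k -> new TreeSet<>()).add(peer);
    }

    // Choose the missing chunk held by the fewest peers (assumed rarest-first rule).
    // Ties go to the lowest index; returns -1 if no missing chunk is available.
    int nextChunk(Set<Integer> alreadyHeld, int totalChunks) {
        int best = -1;
        int bestCount = Integer.MAX_VALUE;
        for (int i = 0; i < totalChunks; i++) {
            Set<String> holders = availability.get(i);
            if (alreadyHeld.contains(i) || holders == null || holders.isEmpty()) {
                continue;
            }
            if (holders.size() < bestCount) {
                best = i;
                bestCount = holders.size();
            }
        }
        return best;
    }
}
```

Because every chunk is addressed by index rather than by a single byte offset into one transfer, no individual message has to carry the whole file, whatever its size.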


Swarm Utility Architecture


Swarm Utility Summary

• The utility was developed to test the potential of Tycho utilities and also further stress test the overall infrastructure:

–By simultaneously utilising the VR and messaging functionality,

–Storing and updating thousands of entry records in the VR,

–Sending thousands of multi-megabyte messages between clients.

• Its potential uses include:

–Distributing files for collaboration purposes,

–Staging data for computation,

–Mirroring and managing large data sets.


Swarm Utility Demo


Summary

•The reference implementation of Tycho has been completed.

•Tycho has been released under the LGPL Open Source license:

– http://acet.rdg.ac.uk/projects/tycho/

•The focus now is on developing Tycho utilities to provide more feature-rich functionality.

•This work has been summarised in a paper accepted for a special issue of The Journal of Supercomputing.


Research Goals

• Scalability and high performance have been demonstrated by the benchmarking.

• Extensibility has been shown with the development of the swarm utility and the different services and protocols supported by Tycho.

• Tycho has security ‘out of the box’, using HTTPS and passwords or certificates for wide-area access control and encryption - no comparable system we reviewed has this currently.

• Manageability has been maximised: Tycho requires one firewall port, has no external dependencies other than a JVM and can run with zero configuration.


Some Experiences / Observations

• Java developers should think carefully about how memory is used in their applications.

• Systems which store their data internally as XML will probably have relatively poor performance and require large amounts of memory and resources to work.

• If you use a servlet container, Jetty offers much better performance than Apache Tomcat.

• Instead of using a separate database, consider the Java-based HSQLDB; we have shown it can achieve excellent performance and it removes an external dependency from your software.

• Java is not a magic bullet for portability; systems such as R-GMA are evidence of this.


Links

•Project Web page:

–http://acet.rdg.ac.uk/projects/tycho/

•The DSG Web page:

–http://dsg.port.ac.uk/

•The ACET Web page:

–http://acet.port.ac.uk/