the project presentation - carnegie mellon universityece749/teams-06/team7/... · the project...

Post on 08-Jun-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

The Project PresentationThe Project PresentationApril 28, 2006April 28, 2006

1818--749: Fault749: Fault--Tolerant Distributed SystemsTolerant Distributed Systems

Team 7Team 7--SixersSixersKyu Hou Kyu Hou

Minho Jeung Minho Jeung Wangbong LeeWangbong Lee

HeejoonHeejoon Jung Jung WenWen ShuShu TangTang

2

MembersMembers

Kyu Houkyuh@andrew.cmu.edu

MSE

http://www.ece.cmu.edu/~ece749/teams-06/team7/

Min Ho Jeungmjeung@andrew.cmu.edu

MSE

Wangbong Leewangbonl@andrew.cmu.edu

MSE

Heejoon Jungwangbonl@andrew.cmu.edu

MSIT-SE

Wen Shu Tangwtang@andrew.cmu.edu

ECE

3

Baseline ApplicationBaseline ApplicationExpress Bus Ticket CenterExpress Bus Ticket Center

ApplicationApplicationOnline express bus ticketing application

ConfigurationConfiguration• Operating System: Linux servers• Programming Language: Java• Database: MySQL• Middleware: CORBA

Baseline Application FeatureBaseline Application Feature• Users can retrieve bus schedules and tickets.• Users can buy tickets. • Users can cancel the tickets.

4

Baseline Architecture (before)Baseline Architecture (before)

Database

Pool of Customers Bus Ticket System

③④

Allocation View-Deployment Style

Legend

InternetClient Server Database Request message Response messageRequest Query Response Query

5

Baseline Architecture (after)Baseline Architecture (after)

DatabasePool of Customers Bus Ticket System

③④

Allocation View-Deployment Style

Legend

Client Server Database Request Message Response MessageRequest Query Response Query

②User

InterfaceClient

CORBAInterface

MySQL

ServerCORBAInterface

JDBCBusinessLogic

6

Client requests should be preserved, when exception is occurred.Replication

There are 2 copies of server which perform same operations for fault-tolerance on the chess and risk machine.Replication Type

Active ReplicationAdvantage: PerformanceDisadvantage: More memory and processing cost

Replication ManagerNo specific replication manager exists.As soon as client application begins, the application acquires the replication server name which is stored in Naming Server.

Elements of Fault-Tolerance FrameworkGlobal Manager: HeartbeatRecovery Manager

Re-instantiating a failed replicationThe recovery result is written into a log file in Database.

Fault injector: Shell

FaultFault--Tolerance ApplicationTolerance Application

7

ScenarioScenario

1. Client requests the names of server to the naming server.1. Client requests the names of server to the naming server.2. The naming server sends the names of servers.2. The naming server sends the names of servers.3. Client requests to all servers.3. Client requests to all servers.

a. When the client receives an exception message, then the faua. When the client receives an exception message, then the fault is detected.lt is detected.b. The client already communicates with another replication seb. The client already communicates with another replication server.rver.

4. All servers send the results to clients.4. All servers send the results to clients.5. Client receives the results, and checks duplication.5. Client receives the results, and checks duplication.

FTFT--Baseline ArchitectureBaseline Architecture

Exception CasesException CasesServer_Timeout

Checked by using thread poolDatabase_Timeout:

Checked by using connection poolDead_Server

Solved by using heartbeat (check servers per 2 seconds)

Global Recovery Manager: Heartbeat

Mechanisms for FailMechanisms for Fail--OverOver

9

Performance measurementPerformance measurement

48 Configurations48 ConfigurationsBuy and cancel ticketBuy and cancel ticket

us

Performance measurementPerformance measurement

Performance measurementPerformance measurement

Performance measurementPerformance measurement

Performance measurementPerformance measurement

1 Client

1000 requests

Cancel ticket request

Fault Injection measurementsFault Injection measurements

Performance measurement comparisonPerformance measurement comparison

16

RTRT--FT Baseline ArchitectureFT Baseline ArchitectureActive ReplicationActive Replication

17

Thread PoolWe need to avoid the overhead of thread creation for each request. Create a number of threads at initialize time

Dynamic configuration

Without Thread PoolAVG RTT: 40.5 msec

With Thread PoolAVR RTT: 38.0 msec

Improve the performance about 4%

RTRT--FT Performance Strategy FT Performance Strategy

18

Thread Pool / No Thread Pool

RTRT--FT Performance Measurement FT Performance Measurement

19

List other featuresFault Injector – Shell ScriptLog4j – Logging informationApach DB Connection Pool (DBCP)

What lessons by other features?Useful utilitiesImprove performance by DBCPPowerful shell scripts

Other FeatureOther Feature

20

FT MeasurementFile I/O for logging time grows as the a file size increases

RT-FT MeasurementNo RTT difference between fault-free and fault-injected test cases

Duplicated values reach the client almost at the same time.

RT-FT Performance MeasurementThread creation time is not trivial when the number of replica increase

Need more test cases

Insights from MeasurementInsights from Measurement

21

Issues

Test environment How to set up same test environment for each test case.

How to decide test environment is good enough to get the meaningful data.

Additional features

Load balancing for active replicationOrganizing active replication group

Passive replication for each group

Open IssueOpen Issue

22

What did we learn?

Handling thread

Data gathering and analyzing

Useful open source programApache project :log4j, dbcp

What did we accomplish?

succeed to build active replication system

If we could start our project again,

focus on only FT features

ConclusionConclusion

top related