Auspice: AUtomatic Service Planning in Cloud/Grid Environments David Chiu Dissertation Defense May 25, 2010 Committee: Prof. Gagan Agrawal, Advisor Prof. Hakan Ferhatosmanoglu Prof. Christopher Stewart


Explosion of Scientific Data Sources

• The amount of scientific data has increased dramatically over the years

• In just one example,
‣ Large Hadron Collider (LHC)
‣ 15 petabytes annually
‣ 60 petabytes overall

• Management and processing have become challenging

A Live Cyber Infrastructure

[Figure, built up over four slides: Data Sources; Computing & Storage Resources; Shared/Proprietary Web Services (legend: Web Service).]

Service Interaction with Cyber Infrastructure

[Figure: diagram with arrows labeled "invoke" and "results".]


Current GUI for Creating Workflows

Scientific Workflow Challenges

‣ Difficulties for the scientist:

‣ How to identify which data sets to use, and from where to get them?

‣ Which services are available to me to use?

‣ What resources to utilize?

‣ How can I accelerate workflow execution?

‣ Do I really have to do all this myself?

Contributions

• Workflow system with the following support:

• High-level scientific user querying
‣ D. Chiu and G. Agrawal. A Keyword Querying Interface for Invoking Scientific Workflows. (OSU-TR, submitting to ACM-GIS’10)
‣ D. Chiu and G. Agrawal. Enabling Ad Hoc Queries over Low-Level Scientific Data Sets. (SSDBM’09)

• Automatic workflow planning
‣ D. Chiu and G. Agrawal. Enabling Ad Hoc Queries over Low-Level Scientific Data Sets. (SSDBM’09)
‣ D. Chiu and G. Agrawal. Ad Hoc Scientific Workflows through Data-driven Service Composition. (eScience’07)

Contributions (continued)

• Quality of Service
‣ D. Chiu, S. Deshpande, G. Agrawal, and R. Li. A Dynamic Approach toward QoS-Aware Service Workflow Composition. (ICWS’09)
‣ D. Chiu, S. Deshpande, G. Agrawal, and R. Li. Cost and Accuracy Sensitive Dynamic Workflow Composition over Grid Environments. (GRID’08)
‣ D. Chiu, S. Deshpande, G. Agrawal, and R. Li. Composing Geoinformatics Workflows with User Preferences. (GIS’08)

• Accelerating Workflow Execution
‣ D. Chiu and G. Agrawal. Evaluating Caching and Storage Options on the Amazon Web Service Cloud. (OSU-TR, submitted to GRID’10)
‣ D. Chiu, A. Shetty, and G. Agrawal. Elastic Cloud Caches for Derived Data Reuse. (OSU-TR, submitted to SC’10)
‣ D. Chiu and G. Agrawal. Hierarchical Caches for Grid Workflows. (CCGrid’09)

Presentation Outline

• Motivation & Introduction
• Our Service Composition System: Auspice
‣ Metadata Framework
‣ Cost-Aware Service Planning
‣ Supporting Keyword Queries
‣ Elastic Cache Deployment
• Conclusion


Auspice System

D. Chiu & G. Agrawal, eScience ’07

D. Chiu & G. Agrawal, SSDBM ’09

Systematic Way to Plan Workflows?

• Goal-Driven, Recursive Concept Derivation
• Example User Goal: Coastline Extraction

[Figure, built up over four slides:]
‣ We are targeting some coastline concept (Coast Line) in the geospatial domain
‣ What known data or services can derive a coast line? Available services: Coast Extract 1 ... Coast Extract K. Available data: Coast Data 1 ... Coast Data N
‣ What are its parameters? The figure shows Water Level and CTM beneath Coast Extract K
‣ What known data or services can derive water level? What known data or services can derive a CTM? Each question is again answered from the available services and available data
‣ The expansion yields a set of candidate plans: Workflow 1, Workflow 2, Workflow 3, ...

Ontology for Applying Domain Information

Domain concepts can be derived from executing a service

Domain concepts can be derived from retrieving existing data

Service parameters can be represented by certain domain concepts
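The three ontology relations, together with the goal-driven recursive derivation described earlier, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries, the service name `extract_coastline`, and the data set names are hypothetical stand-ins, not Auspice's actual ontology or API.

```python
from itertools import product

# Hypothetical toy ontology; every name below is illustrative only.
DERIVED_FROM_DATA = {          # concept -> data sets that directly provide it
    "coastline": [],
    "water_level": ["gauge_readings"],
    "ctm": ["ctm_file_1", "ctm_file_2"],
}
DERIVED_FROM_SERVICE = {       # concept -> services whose output is that concept
    "coastline": ["extract_coastline"],
    "water_level": [],
    "ctm": [],
}
SERVICE_PARAMS = {             # service -> concepts its parameters represent
    "extract_coastline": ["water_level", "ctm"],
}


def enumerate_plans(concept, visiting=frozenset()):
    """Return every workflow plan that derives `concept`.

    A plan is ("data", name) or ("service", name, [sub-plans]).
    """
    if concept in visiting:            # guard against cyclic derivations
        return []
    plans = [("data", d) for d in DERIVED_FROM_DATA.get(concept, [])]
    for service in DERIVED_FROM_SERVICE.get(concept, []):
        per_param = [enumerate_plans(p, visiting | {concept})
                     for p in SERVICE_PARAMS[service]]
        # One plan per combination of ways to satisfy each parameter;
        # if any parameter has no derivation, product() yields nothing.
        for combo in product(*per_param):
            plans.append(("service", service, list(combo)))
    return plans


plans = enumerate_plans("coastline")
for plan in plans:
    print(plan)
```

With this toy ontology the target concept expands into two candidate workflows, one per available CTM file, which mirrors the way the slides fan out into Workflow 1, Workflow 2, and so on.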


Auspice Metadata Registration

• Given a data set or service,

‣ Ontology is applied to new resources

‣ Resources are indexed and immediately usable in workflow planner

‣ Non-intrusive


Registering Data Sets


Registering Services


Subset of Ontology, with Shoreline Target


Service Planning: An Example

A Derived Execution Plan for shoreline concept


What Users Want

“Do what you can to provide me results in under 20 minutes.”

“I want the fastest results with at least 75% accuracy.”

- Exec time prediction
- Online data reduction
- Domain-specific error modeling


Auspice System

D. Chiu, S. Deshpande, G. Agrawal, & R. Li, GRID ’08

D. Chiu, S. Deshpande, G. Agrawal, & R. Li, ACM-GIS ’08

D. Chiu, S. Deshpande, G. Agrawal, & R. Li, ICWS ’09


Challenges

• We wish to project workflow execution time and workflow accuracy costs at planning time

• Allow input models per service

• We should prune all workflows unlikely to meet the user’s demands


Estimating Workflow Execution Time

• Service execution time (t_x)
‣ Each service is trained beforehand with inputs of various sizes

• Data output size (d_size)
‣ Known for files, but models are again trained for service output

• Network transmission time (t_net)
‣ Bandwidth between nodes is typically known

• Recall the workflow structure:
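The slide's cost formula did not survive in this transcript, so the following is a hedged sketch of one way a recursive estimate over the workflow tree could look. It assumes a service's inputs can be derived concurrently (hence the `max`; a sequential executor would sum instead). The `Node` class, `total_time`, and `prune` are hypothetical names, and the `t_x` figures are borrowed from the water level example later in the talk, with network time set to zero.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    t_x: float = 0.0        # predicted service execution time (0 for plain data)
    t_net: float = 0.0      # predicted time to transmit this node's output
    params: List["Node"] = field(default_factory=list)


def total_time(node: Node) -> float:
    """Recursive estimate: own cost plus the slowest input sub-workflow."""
    slowest_input = max((total_time(p) for p in node.params), default=0.0)
    return node.t_x + node.t_net + slowest_input


def prune(plans: List[Node], time_limit: float) -> List[Node]:
    """Keep only plans whose projected time fits the user's limit."""
    return [p for p in plans if total_time(p) <= time_limit]


stations = Node("GetGSListGreatLakes", t_x=2.0)
nearest = Node("getKNearestStations", t_x=0.5, params=[stations])
plan1 = Node("getWL", t_x=1.0, params=[nearest])
plan2 = Node("getWLfromModel", t_x=2.0)

print(total_time(plan1), total_time(plan2))
print([p.name for p in prune([plan1, plan2], time_limit=3.0)])
```

Under these assumptions the first plan is projected at 3.5 time units and the second at 2.0, so a 3.0-unit limit prunes the first plan at planning time, before anything is executed.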


Estimating Workflow Error/Accuracy

• The recursive sum is similar for error propagation

• The errors attributed to services and data are implemented by domain scientists

• The error model includes an accuracy parameter, e.g., sampling rate, resolution, ...


Cost Models Declared per Operation


Water Level Workflow Example

Workflow Plan 1

[t_total=3.5001 t_x=1 t_d=0 o=47889 e=0.004]
SRVC.getWL(
    X=482593
    Y=4628522
    StnID=
        [t_total=2.5 t_x=0.5 t_d=0 o=0 e=0.004]
        SRVC.getKNearestStations(
            Longitude=482593
            Latitude=4628522
            ListOfStations=
                [t_total=2 t_x=2 t_d=0 o=47889 e=0]
                SRVC.GetGSListGreatLakes()
            RadiusKM=100
            K=3
        )
    time=00:06
    date=01/30/2008
)

Total Projected Costs:
Workflow Execution Time = 3.251
Workflow Error = 0.004

Workflow Plan 2

[t_total=2 t_x=2 t_d=0 o=47889 e=2.4997]
SRVC.getWLfromModel(
    X=482593
    Y=4628522
    time=00:06
    date=01/30/2008
)

Total Projected Costs:
Workflow Execution Time = 1.674
Workflow Error = 2.4997


Cost Model Overheads


Experimented Workflow

• Shoreline Extraction

• Users can specify the following QoS Parameters:
‣ Allowed execution time
‣ Allowed error


On Meeting Time Constraints


On Meeting Error Constraints


Current GUI for Creating Workflows


Auspice System

D. Chiu & G. Agrawal, SSDBM’09

D. Chiu & G. Agrawal, (submitting to GIS’10)


Supporting Keyword Querying

• Planning workflows is hard, while keyword search has become an extremely popular interface for information retrieval
‣ No need to know the underlying structure of the data
‣ No need to understand structured query languages like SQL

• Goal: Given a set of key terms in the scientific domain, return a ranked list of workflow plans to the user for execution

Keyword Decomposition

[Figure: the quoted query terms coast, line, CTM, 7/8/2003, and (41.30, -82.4) pass through a filter (stopping / stemming / pattern-match) and are then mapped.]
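The filter and map stages named on this slide can be sketched as follows. This is a minimal illustration, not Auspice's implementation: the stop-word list, the `CONCEPTS` table, the regular expressions, and the crude `stem` helper are all assumptions made up for the example.

```python
import re

STOPWORDS = {"the", "of", "a", "an", "for", "at", "on", "in"}   # illustrative only
CONCEPTS = {"coast": "coast", "line": "line", "ctm": "CTM"}     # hypothetical keyword -> concept table

DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")
COORD = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def stem(word):
    """Crude plural stripping; a stand-in for a real stemmer."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def decompose(query):
    """Split a keyword query into mapped concepts and pattern-matched values."""
    values = {}
    match = COORD.search(query)
    if match:                                   # pattern-match: coordinate pair
        values["coordinates"] = (float(match.group(1)), float(match.group(2)))
        query = COORD.sub(" ", query)
    match = DATE.search(query)
    if match:                                   # pattern-match: date
        values["date"] = match.group(0)
        query = DATE.sub(" ", query)
    concepts = []
    for token in re.findall(r"[A-Za-z]+", query.lower()):
        if token in STOPWORDS:                  # stopping
            continue
        token = stem(token)                     # stemming
        if token in CONCEPTS:                   # map keyword to a concept
            concepts.append(CONCEPTS[token])
    return concepts, values


concepts, values = decompose("the coast line CTM on 7/8/2003 at (41.30, -82.4)")
print(concepts)
print(values)
```

The sketch deliberately leaves the coordinate pair unlabeled: deciding which number is the latitude and which the longitude is a separate step that this example does not attempt.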

Keyword Maximization

[Figure: keywords map to ontology concepts (C); pattern-matched values are data (D).]
‣ Unsubstantiated Concepts: coast, line, CTM. Any combination of these is potentially what the query is targeting!
‣ Data-Substantiated Concepts: date, longitude, latitude, backed by the values 7/8/2003, 41.30, and -82.4. These are potential query parameters

Keyword Querying

[Figure, built up over three slides: the concepts for coast, line, and CTM are arranged under the labels Merged SuperConcept, Query Target Candidate, and Requisite Concepts; the values 7/8/2003, 41.30, and -82.4, with the concepts date, longitude, and latitude, form the Query Parameters; the final step is Enumerate Workflows.]

Ranking Workflow Plans by Relevance

• Method:
‣ Consider the set of input keyword-concepts
‣ Rank workflow plans on a relevance score computed against this set


A Case Study

• The following keyword queries were submitted to Auspice


Search Time


Precision


Problem: Query Intensive Circumstances


Caching Intermediate Results

• Shoreline Extraction

Time consuming!

Can’t we cache the result from when it was last computed?


Auspice System

D. Chiu & G. Agrawal, CCGrid’09

D. Chiu, A. Shetty, & G. Agrawal, (submitted to SC’10)

D. Chiu & G. Agrawal, (submitted to GRID’10)


Cloud Computing

• Pay-as-you-go computing

• Elasticity
‣ Cloud applications can stretch and relax their resource requirements

• “Infinite” compute and storage resources

A Workflow Cache

[Figure: cache nodes A and B in the Compute Cloud.]

Consistent Hashing

[Figure, built up over two slides: hash ring with cache nodes A and B; ring labels 75, 25, and 8.]
‣ invoke: service(35)
‣ Which proxy has the page? h(k) = (k mod 100), so h(35) = (35 mod 100) = 35

Our algorithm for scaling up: GBA (Greedy Bucket Allocation)

[Figure: node C is added at 50.]
‣ Only records hashing into (25,50] need to be moved from A to C!
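The node-addition example can be reproduced with a small ring lookup. The sketch assumes B owns the bucket ending at 25 and A the bucket ending at 75, which is consistent with the statement that records in (25,50] move from A to C. The `Ring` class is hypothetical, and the code shows only plain consistent hashing, not the GBA policy for deciding when and where to add a node.

```python
import bisect

RING_SIZE = 100


def h(key):
    """Hash function from the slide: h(k) = k mod 100."""
    return key % RING_SIZE


class Ring:
    def __init__(self):
        self.points = []   # sorted bucket end positions on the ring
        self.owner = {}    # bucket end position -> node name

    def add_node(self, name, position):
        bisect.insort(self.points, position)
        self.owner[position] = name

    def lookup(self, key):
        # First bucket end position >= h(key), wrapping around the ring.
        i = bisect.bisect_left(self.points, h(key))
        return self.owner[self.points[i % len(self.points)]]


ring = Ring()
ring.add_node("B", 25)
ring.add_node("A", 75)
before = {k: ring.lookup(k) for k in range(RING_SIZE)}

ring.add_node("C", 50)
after = {k: ring.lookup(k) for k in range(RING_SIZE)}
moved = sorted(k for k in before if before[k] != after[k])

print(before[35], after[35])
print(moved[0], moved[-1], len(moved))
```

Only the 25 hash values in (25,50] change owner, all from A to C; every other record stays where it was, which is the property that makes adding a cache node cheap.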


Experimental Configuration

• Workload
‣ Shoreline Extraction Workflow
‣ Takes 23 seconds to complete without benefits of cache
‣ Executed on a miss

• Amazon EC2 Cloud
‣ Each Cloud node:
- Small Instances (single core 1.2 GHz, 1.7 GB, 32-bit)
- Ubuntu Linux
‣ Caches start out cold
‣ Cache stored in memory only


Experimental Configuration

• Our approach exploits a dynamic Cloud environment:
‣ Consistent Hashing: Greedy Bucket Allocation (GBA)
‣ Elastic number of nodes

• We compare GBA against statically allocated Cloud environments:
‣ 2 fixed nodes (static-2)
‣ 4 fixed nodes (static-4)
‣ 8 fixed nodes (static-8)
‣ Cache overflow --> LRU eviction


Relative Speedup

Querying Rate: 255 invocations/sec

Cost Savings


That’s Not Completely Elastic

• What about relaxing the number of nodes to help save Cloud costs?

• First, we need an eviction scheme


Exponential Decay Eviction

• At eviction time:
‣ A value is calculated for each data record in the evicted slice
‣ The value is higher:
- if the record was accessed more recently
- if the record was accessed frequently
‣ If the value is lower than some fixed threshold, evict the record
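The symbol and exact formula for this value did not survive in the transcript, so the following sketch substitutes a geometric weighting chosen only because it satisfies the two stated properties (recent accesses and frequent accesses both raise the score). `ALPHA`, `THRESHOLD`, `decay_score`, and `keys_to_evict` are hypothetical names and values, not the ones used in Auspice.

```python
from collections import Counter

ALPHA = 0.9        # hypothetical decay base, 0 < ALPHA < 1
THRESHOLD = 0.5    # hypothetical fixed eviction threshold


def decay_score(key, window):
    """Weighted access count over the sliding window.

    window[0] is the most recent time slice; each slice is a Counter
    mapping key -> number of accesses during that slice.
    """
    return sum((ALPHA ** age) * slice_[key] for age, slice_ in enumerate(window))


def keys_to_evict(expired_slice, window):
    """Records from the slice leaving the window whose score is below the threshold."""
    return [k for k in expired_slice if decay_score(k, window) < THRESHOLD]


# A 10-slice window: "a" was accessed just now, "b" eight slices ago, "c" not at all.
window = [Counter() for _ in range(10)]
window[0]["a"] += 1
window[8]["b"] += 1
expired = Counter({"a": 1, "b": 1, "c": 1})

print(keys_to_evict(expired, window))
```

Here "a" keeps a score of 1.0 and stays cached, while "b" (about 0.43) and "c" (0) fall under the threshold and are evicted when their slice expires.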


Experimental Configuration

• Amazon EC2 Cloud
‣ Each Cloud node:
- Small Instances (single core 1.2 GHz, 1.7 GB, 32-bit)
- Ubuntu Linux
‣ Caches start out cold
‣ Data stored in memory
‣ When 2 nodes become < 30% capacity, merge

• Sliding Window Configuration:
‣ Time Slice: 1 sec
‣ Size: 100 Time Slices


Data Eviction: 50/255/50 queries per sec

Sliding Window Size = 100 sec


Cache Contraction: 50/255/50 queries per sec


Future Work

• Dynamic sliding window size

• Evaluate and model various Cloud infrastructure options to optimize cost for sustaining the cache

• Transparent remote data analysis over Clouds

• Deep Web Integration into querying framework


Summary and Conclusion

• Auspice is a workflow system, which
‣ Supports high-level keyword/NLP user queries
‣ Automatically composes workflows, and adapts to QoS constraints
‣ Caches workflow results to accelerate workflow execution

• Questions?

Capturing Concept Derivability

Domain concepts can be derived from executing a service

Domain concepts can be derived from retrieving existing data

Service parameters represent different domain concepts


Indexing Data Sets

Applying Domain Information

Domain concepts can be derived from executing a service

Domain concepts can be derived from retrieving existing data

Service parameters represent different domain concepts

A Case for Semantics

• Service Identification:
‣ Assume the following service retrieves a satellite image pertaining to (x,y) at a resolution given by r

get_image(double x, double y, double r)

[Figure: the three parameters are annotated with inputsTo links to the concepts latitude, longitude, and grid_size; the service is annotated with an outputsTo link to the concept satellite image.]

• Questions to ask the system:
‣ How to deduce that this service can be used?
‣ How to determine what information is needed for input?
‣ Did the user provide enough information to invoke this service?
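The three questions above can be answered mechanically once the service is annotated with concepts. The sketch below is a hypothetical registry, not Auspice's metadata format: the `SERVICES` dictionary and helper names are invented, and the pairing of x with longitude, y with latitude, and r with grid_size is an assumption, since the transcript does not preserve which parameter links to which concept.

```python
# Hypothetical registry entry mirroring the slide's inputsTo/outputsTo annotations.
# The parameter-to-concept pairing is an assumption made for illustration.
SERVICES = {
    "get_image": {
        "inputs": {"x": "longitude", "y": "latitude", "r": "grid_size"},
        "output": "satellite image",
    }
}


def services_for(concept):
    """Which services can be used to derive this concept?"""
    return [name for name, s in SERVICES.items() if s["output"] == concept]


def missing_inputs(service, provided):
    """Which required concepts has the user not supplied yet?"""
    needed = SERVICES[service]["inputs"]
    return sorted(c for c in needed.values() if c not in provided)


def bind(service, provided):
    """Map user-supplied concept values onto the service's parameters."""
    needed = SERVICES[service]["inputs"]
    return {param: provided[concept] for param, concept in needed.items()}


user_values = {"latitude": 40.0, "longitude": -83.0}     # made-up values
print(services_for("satellite image"))
print(missing_inputs("get_image", user_values))

user_values["grid_size"] = 0.5
print(bind("get_image", user_values))
```

With only a latitude and longitude supplied, the check reports that grid_size is still missing; once it is provided, the concept values can be bound to x, y, and r for invocation.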


Indexing Services

• Services (inputs, outputs) are also registered in much the same way


Systematic Service Planning

Ontology, O

Compose workflows in this form:

data derivation

service derivation

Presentation Outline

• Motivation & Introduction
• Our Service Composition System: Auspice
‣ Metadata Framework
‣ Cost-Aware Service Planning
‣ Supporting Keyword Queries
‣ Caching Intermediate Results
‣ Elastic Cache Deployment
• Conclusion


Caching Intermediate Results


A Hierarchical Cache

Cache Access Types

[Figure labels: Misses, Fast, Slow, Hits (Slow).]

‣ Wouldn’t it be faster to centralize the index on the broker node?
‣ Do we really need the broker index? Isn’t hashing faster?

Experimental Workflows

• Against Heterogeneous Bandwidths

Centralized on Broker vs. Hierarchical

[Figure annotations: Out-of-core! In-core]