fine-grained scalability of digital library services in the cloud

22
Fine-grained Scalability of Digital Library Services in the Cloud Lebeko Poulo, Lighton Phiri and Hussein Suleman Digital Libraries Laboratory Department of Computer Science University of Cape Town

Upload: lighton-phiri

Post on 19-Jun-2015

212 views

Category:

Education


0 download

DESCRIPTION

I was giving a talk---on behalf of a colleague---at the 2014 SAICSIT conference, held in Pretoria, South Africa. It went well, I think... The pre-print is here [1] and actual publisher copy is here [2]. I used a custom-tailored Beamer template this time around. I prefer simplicity these days. [1] http://goo.gl/qOJTye [2] http://goo.gl/JvVQdZ

TRANSCRIPT

Page 1: Fine-grained Scalability of Digital Library Services in the Cloud

Fine-grained Scalability ofDigital Library Services in the

Cloud

Lebeko Poulo, Lighton Phiriand Hussein Suleman

Digital Libraries LaboratoryDepartment of Computer Science

University of Cape Town

Page 2: Fine-grained Scalability of Digital Library Services in the Cloud

Research Overview

� Digital Libraries (DLs) and Digital LibrarySystems (DLSes)

� Research objectives� Develop techniques for building scalable digital

information management systems based on efficientand on-demand use of generic grid-basedtechnologies

� Explore the use of existing cloud computingresources

� Research questions� Can a typical DL architecture be layered over an

on-demand paradigm such as cloud computing?� Is there linear scalability with increasing data and

service capacity needs?

Page 3: Fine-grained Scalability of Digital Library Services in the Cloud

How Quickly Does Data Scale?

� Extent of data scalability� Data growth rates estimated at 40% per year� By 2020, data volumes will have grown to 44 times

the 2009 size

Page 4: Fine-grained Scalability of Digital Library Services in the Cloud

Scaling Digital Library Systems

� Key criteria for design/implementation of DLSes� Scalability� Preservation

� The promise of cloud computing proven manytimes

� Feasibility of migrating and hosting DLs evident

� Investigation of deep integration of DL serviceswith cloud services required

� Investigate efficacy of DL cloud adoption� Verify extent of unlimited scale� Maximise potential for cloud-service-level scalability

Page 5: Fine-grained Scalability of Digital Library Services in the Cloud

Prototype DLS - Design

� RQ #1—Can a typical DL architecture belayered over an on-demand paradigm?

� Prior work on potential architectural designs forutility clouds

� Emulation of parallel programming architectures� Utility computing offers flexibility of multiple

architectural models� Potential architectures for scalable utility services

� Two architectural patterns adopted as basis fordesign of prototype architecture

� Proxy architectures� Some aspects of Client-side architecture

Page 6: Fine-grained Scalability of Digital Library Services in the Cloud

Prototype DLS - Architecture

Browse Module

AmazonS3

Buckets

AmazonEBS

AmazonSimpleDB

Search Module

Domains

OAI-PMH Harvester Module

InstanceA

InstanceB

InstanceC

InstanceD

Amazon EC2

REST API

Page 7: Fine-grained Scalability of Digital Library Services in the Cloud

Prototype DLS - Services

Browse Module Search Module OAI-PMH Harvester Module

Web User Interface

� Two typical DL services, accessible via publiclyavailable Light-weight process Web interface

� Browse module—enable access through gradualrefinement

� Search module—enable access through searchqueries

� OAI-PMH endpoint used to ingest data intocollections

Page 8: Fine-grained Scalability of Digital Library Services in the Cloud

Prototype DLS - Application Server

InstanceA

InstanceB

InstanceC

InstanceD

Amazon EC2

REST API

� Amazon Elastic Compute Cloud (EC2) toprovide sizeable computing capacity

� 32-bit Ubuntu Amazon Machine Images (AMIs)� Glassfish 3.1� Prototype DLS

Page 9: Fine-grained Scalability of Digital Library Services in the Cloud

Prototype DLS - Data Storage

AmazonS3

Buckets

AmazonEBS

AmazonSimpleDB

Domains

REST API

� Amazon Simple Storage Service (S3) for storageand retrieval of large numbers of data objects

� Amazon SimpleDB for querying storedstructured data

� Amazon Elastic Block Store (EBS) to enablestorage persistence of EC2 instances

Page 10: Fine-grained Scalability of Digital Library Services in the Cloud

Evaluation - Experimental Design� RQ #2—Is there linear scalability with increasing

capacity needs?� Goals

� Evaluate potential scalability advantages associatedwith cloud-based DLs

� Evaluation aspects� Data/service scalability and load testing

� Workload� Number of user requests, number of users and

collection sizes

� Metrics� Response time

� Factors� EC2 instances, users, requests, collection size

Page 11: Fine-grained Scalability of Digital Library Services in the Cloud

Evaluation - Experimental Setup

� Test dataset—NDLTD and NETD portals� Ingested using OAI-PMH harvester module

� Execution environment� All experimental test conducted on EC2 cloud

infrastructure� EC2 instance of type t1.micro used for

server-side processing� 32-bit Ubuntu Amazon Machine Image (AMI)

configuration

� Apache JMeter used to simulate user requests� All measurement results based on five-run

averages

Page 12: Fine-grained Scalability of Digital Library Services in the Cloud

Experiment #1 - Service Scalability

� Determine the time taken for browse and searchservice requests

� Assess impact due to variation of multiple serverfront-ends

� Methodology� JMeter used to simulate 50 users for each Web

service, ten times� Web services hosted on four identical EC2 instances� Experiments repeated at least five times for each

service criteria� Comparative analysis—browsing categories for

browse service—by partitioning requests into blocksof 50

Page 13: Fine-grained Scalability of Digital Library Services in the Cloud

Experiment #1 - Browse Service

600

800

1000

1200

1-5

0

51-1

00

101

-150

151

-200

201

-250

251

-300

301

-350

351

-400

401

-450

451

-500

Number of requests

Res

pons

etim

e(m

s)

Browsing by title Browsing by date Browsing by author

Page 14: Fine-grained Scalability of Digital Library Services in the Cloud

Experiment #1 - Browse Service (2)

600

900

1200

1500

1356

9251

1112

313

5692

5111

538

1356

9251

1193

413

5692

5112

362

1356

9251

1276

513

5692

5113

081

1356

9251

1349

113

5692

5113

821

1356

9251

1411

813

5692

5114

515

1356

9251

1475

713

5692

5115

156

1356

9251

1545

513

5692

5115

761

1356

9251

1607

513

5692

5116

568

1356

9251

1678

913

5692

5117

103

1356

9251

1752

813

5692

5117

749

1356

9251

1817

813

5692

5118

446

1356

9251

1874

213

5692

5119

092

1356

9251

1942

5

Timestamp

Res

pons

etim

e(m

s)

Browse by author Browse by date Browse by title

Page 15: Fine-grained Scalability of Digital Library Services in the Cloud

Experiment #1 - Browse Service (3)

500

600

700

1-5

0

51-1

00

101

-150

151

-200

201

-250

251

-300

301

-350

351

-400

401

-450

451

-500

Number of requests

Tim

e/bl

ock

(ms)

1 instance 2 instances 3 instances 4 instances

Page 16: Fine-grained Scalability of Digital Library Services in the Cloud

Experiment #2 - Data Scalability

� Determine service performance for varyingcollection sizes for fixed number of servers

� Ascertain if application can cope with increasingdata volumes in DL collections

� Methodology� JMeter set up to simulate 50 users accessing a Web

service ten times� Fixed number of identical servers with collection

sizes of 4k, 8k, 16k and 32k records� Experiments repeated at least five times for each

service� Comparative analysis by partitioning requests into

blocks of 50

Page 17: Fine-grained Scalability of Digital Library Services in the Cloud

Experiment #2 - Browse Service

800

900

1000

1100

1-5

0

51-1

00

101

-150

151

-200

201

-250

251

-300

301

-350

351

-400

401

-450

451

-500

Number of requests

Res

pons

etim

e(m

s)

4000 8000 16000 32000

Page 18: Fine-grained Scalability of Digital Library Services in the Cloud

Experiment #3 - Load Testing

� Determine volume of requests application couldprocess for increasing concurrent users

� Methodology� JMeter set up to varying number of users accessing

a Web service� Fixed number of identical servers used� Initially simulate five users, each accessing a Web

service ten times� Subsequent simulation of 20, 50, 100, 250 and 500

users� Experiments repeated at least five times for each

service

Page 19: Fine-grained Scalability of Digital Library Services in the Cloud

Experiment #3 - All Services

700

800

900

1000

5 20 50 100

250

500

Number of users

Res

pons

etim

e(m

s)

Search operation Browse operation

Page 20: Fine-grained Scalability of Digital Library Services in the Cloud

Conclusion

� Key findings� Redesign of application architectural components to

conform to cloud service architecture� Results indicate that response times are not

significantly affected by request complexity,collection size or request sequencing

� Noticeable time taken to connect to AWS—ramp uptime

� Study Limitations� Single EC2 instance type—t1.micro—used� Cloud service vendor� Experimental dataset size� Query optimisation� Synthetic load used

Page 21: Fine-grained Scalability of Digital Library Services in the Cloud

Bibliography

Hussein Suleman (2009).Utility-based High Performance Digital Library Systems.

Pradeep Teregowda et al. (2010).Cloud Computing: A Digital Libraries Perspective.

Pradeep Teregowda et al. (2010).CiteSeerx: A Cloud Perspective.

Byung Chul Tak et al. (2011).To Move or Not to Move: The Economics of CloudComputing.

Jinesh Varia (2011).

Architecting for The Cloud: Best Practices.

Page 22: Fine-grained Scalability of Digital Library Services in the Cloud

Questions?

Additional information

http://dl.cs.uct.ac.za