scaling blackboard learn™ for high performance and delivery

Post on 10-May-2015

Scaling Blackboard Learn™ for High Performance and Availability

Stephen Feldman

Sr. Director Performance, Security and Architecture

Quick Bio

• Blackboard since 2003

• Performance Engineering from the start

• Platform Architecture in 2005

• Security Engineering in 2010

“Love my job…love my team. If you email me, I will respond.”

@seven_seconds

http://goo.gl/Z4Rq5

A Quick History Lesson of Bb…

• First release was 6.0.11, launched within a few weeks of arriving.

• Technology shift from Perl to Java through Release 5 and Release 8.
– Blackboard was the largest PerlEx ISV in the world in 2003.

• Customers were having issues with optimizing Java, Oracle and SQL Server

• First benchmark was at Sun in 2004, called the Tunathon.
– Learned that Blackboard Learn could scale, and could scale to high levels with a little TLC.

As We Started Growing and Scaling

• In late 2004, we started building the Ref Arch as a model for customers

• Proved it out in benchmarks, as well as our own hosting facilities.

• We needed other players to come in and work with us to help us learn and validate a solution

• Key to our success: aggressive port from Perl to Java, earliest adoption of technologies: Solaris 10, Oracle 10g, RHEL 4 and 5, SQL Server 2005 and Java 5/6
– Willingness to adopt virtualization very early on
– Willingness to open our technology stack for affordable solutions such as NFS and CIFS

Where We Are Today

• We have multiple customers supporting nearly 1 million users, and dozens with well over 250k live production users.

• Our benchmarks have been successful supporting over 1 million users with greater than 100k simultaneous sessions with sub-3s response times.

• The majority of our customers have benefitted from the Reference Architecture and have completely transformed their deployment to support the adoption and growth of the product.

In The Beginning: RefArch I

Focus of RefArch I

• Distribution of application and database
– Need for load-balancing the application server
– Early JVM clustering

• Fiber Storage and High-Speed Disks
– Low-cost option to use JBODs

• Basic operational monitoring
– Hardware, Network and Storage
– Database

• Keep it simple and you will succeed

A Few Years Later Came RefArch II

[Diagram: Blackboard Reference Architecture. Recoverable labels: Federated Applications; Enterprise Search; Other WSI ...; Application Layer; Enterprise Storage (Optimization, Backup, Recovery, Growth); Analysis; SNMP; Management; Monitoring; Integration; Event-Driven Mgmt.; Advanced Reporting; Behavior Model Studies; SIS & Back office; B2 Partners; Campus Systems; Email; Publishers; Directory Svcs.; SSO; Portals; SMS & Mobile; Campaign Mgmt.; Database Layer; User Experience; Virtualization; Clustering; Load Balancing; Beyond Services; Blackboard; Institution; Access; Security; Identify]

…Then Marketing Got their Hands on It

Focus of RefArch II

Infrastructure

• Virtualization
• Blade Computing
• NFS/CIFS/iSCSI Storage
• Mobile Access
• Identity Management

Monitoring Services

• User Experience Monitoring
• Enterprise Infrastructure Monitoring
• Database Trending
• JVM (JMX Monitoring)
• Synthetic Monitoring

Optimization

• 64-bit Computing
• Compression/Caching
• Image Optimization
• JVM Optimization
• Database Wait Event Tuning

What We Are Modeling Today and in the Future…

Large Connected Communities
• 100s to 1000s of Concurrent Requests

Heavy Adoption of Advanced Tools
• Emphasis on Mobility and Synchronous Computing

Extended/Frequent Time in System
• Ubiquitous Access

Richer Content and User Experience
• Instantaneous and Immediate Expectations

Reference Architecture III

Unified Approach Working Together

RefArch1 RefArch2 RefArch3

No Longer Center, but Parallel…

Introducing RefArch III

Identity & Access Management

Logging and Monitoring Cloud Services Secure Performance Immunity

Analytics

Web Optimization

Mobility

Virtualization & Provisioning

Data Management

Now Comes RefArch III

Accessibility

• Ubiquitous Access and Mobility
• Cloud Service Management
• SaaS Application Integration
• Cloud-Based Benchmark/Testing
• Web Optimization/Acceleration

Confidence

• Advanced System Provisioning
• Enterprise Monitoring Lifecycle Management
• Enterprise Logging
• Institutional Analytics
• Secure Management/Infrastructure

Defining SLAs

Performance: The amount of useful work accomplished by a computer system compared to the time and resources used.

Scalability: The ability of a distributed system to expand by accommodating greater levels of load while maintaining similar levels of performance.

Availability: The capability to service a functional request without issue under conditions of desired performance and workload scalability.

Defining SLAs

Define Metrics: Goal Setting

Identify Method of Gathering: Isolate Tools and Processes

Implement Instrumentation: Begin Measuring

Align to KPI/ROI: Share with Stakeholders

Recommend Changes: Show Business Value

Reset Expectations: New Initiatives

What is Performance?

• Performance is quantifiable and measurable

• Performance is also perception

• Mostly recognized from a cognitive perspective
– Instantaneous
– Immediate
– Continuous
– Captive

Response Time Latency Performance

What is Scalability?

What is Availability?

• High-availability offerings mask the effects of a system failure in order to minimize the impact on access to, and functional use of, a system for its community of users.

• Simple Definition:
– Percentage of time the system is in its operational state.

• You will often hear the concept of 3x9’s, 4x9’s or even 5x9’s (99.9%, 99.99% or 99.999% availability)
– Planned versus Unplanned

• Availability = (Total Units of Time – Downtime) / Total Units of Time
– 8760 hours in a year
– Downtime = 10 hours
– Availability = (8760 – 10)/8760 = 99.89%
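The availability formula above can be checked with a few lines of Python. This is a minimal sketch for illustration only; the function name `availability_percent` and the constant `HOURS_PER_YEAR` are assumptions introduced here, not anything from Blackboard Learn.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours, the figure used on the slide


def availability_percent(downtime_hours: float,
                         total_hours: float = HOURS_PER_YEAR) -> float:
    """Return availability as a percentage: (total - downtime) / total * 100."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime_hours must be between 0 and total_hours")
    return (total_hours - downtime_hours) / total_hours * 100


if __name__ == "__main__":
    # The slide's example: 10 hours of downtime in one year.
    print(f"{availability_percent(10):.3f}%")  # prints 99.886%
```

The same function works for any reporting window (a month, a term) as long as downtime and total time use the same unit.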

Quick View into Availability Statistics

Availability Percentage Model    Unexpected Downtime per Year
90%                              36.5 days
95%                              18.25 days
98%                              7.30 days
99%                              3.65 days
99.5%                            1.83 days
99.8%                            17.52 hours
99.9%                            8.76 hours
99.95%                           4.38 hours
99.99%                           52.6 minutes
99.999%                          5.26 minutes
99.9999%                         31.5 seconds
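The downtime figures in the table follow from inverting the availability formula: downtime = (1 – availability) × total time. The sketch below assumes a 365-day year; `downtime_seconds_per_year` and `humanize` are hypothetical helper names chosen for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap, 365-day year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in seconds, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


def humanize(seconds: float) -> str:
    """Format a duration using the largest unit that keeps the value >= 1."""
    if seconds >= 86400:
        return f"{seconds / 86400:.2f} days"
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.2f} minutes"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for target in (90, 99, 99.9, 99.99, 99.999, 99.9999):
        print(f"{target}% -> {humanize(downtime_seconds_per_year(target))}")
```

Each additional nine cuts the yearly downtime budget by a factor of ten, which is why the step from 99.9% to 99.99% moves the budget from hours to minutes.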

Automated Provisioning

• Simple routine of provisioning systems

• Master processes and reduce human error

• Balance workloads

• Quick recovery

• Emphasis on efficient computing

Complete Monitoring and Logging Solutions

Performance

• User Experience Monitoring
• Application Lifecycle Management
• Database Wait Event Monitoring

Scalability

• Infrastructure Resource Monitoring
• JMX Monitoring
• Database Trending
• Log Management

Availability

• Infrastructure Trending
• Remote Synthetic Monitoring

Application Lifecycle Management

• True application insight and visibility

• Business process mapping to transaction SLAs

• Multi-layer correlation

• Transaction workflow mapping

Web Optimization Services

• Typical Optimization Services
– Compression
– Domain Sharding
– Minification
– Consolidation
– Inlining
– Asynchronous JavaScript
– Response Prediction
– Browser Caching
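To give a feel for the first item in the list, compression, the sketch below gzips a repetitive HTML payload with Python's standard `gzip` module and reports the size reduction. It is only an illustration of the technique: in a real deployment compression is typically done by the web server, load balancer or optimization service, and the helper name `gzip_savings` and the sample markup are assumptions made for this example.

```python
import gzip
from typing import Tuple


def gzip_savings(payload: bytes, level: int = 6) -> Tuple[int, int, float]:
    """Return (original size, compressed size, fraction saved) for a payload."""
    if not payload:
        raise ValueError("payload must not be empty")
    compressed = gzip.compress(payload, compresslevel=level)
    saved = 1 - len(compressed) / len(payload)
    return len(payload), len(compressed), saved


if __name__ == "__main__":
    # Hypothetical markup: repeated table rows, similar to a course listing page.
    html = b"<tr><td class='course'>Course title</td><td>Instructor</td></tr>\n" * 200
    original, compressed, saved = gzip_savings(html)
    print(f"{original} bytes -> {compressed} bytes ({saved:.0%} saved)")
```

Text formats such as HTML, CSS and JavaScript tend to compress well because of their repetition; already-compressed assets such as JPEG images generally do not benefit.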

Present and Future of Caches

• Caches are used throughout Blackboard Learn to manage the life and reuse of data.
– Leveraging Ehcache presently in Release 9.1

• Caches can and should be controlled via the cache-settings.properties file
– Insight into the caches can be achieved in the Admin Console and other JMX tools.

• Next generation of caches: pluggable caches (use your own) and distributed caches
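The idea of a cache managing "the life and reuse of data" can be sketched as a small time-to-live cache: an entry is reused until it expires, then reloaded from its source. This is a generic illustration, not Ehcache and not Blackboard Learn's implementation; the class `TTLCache`, its `get_or_load` method and the hit/miss counters are all hypothetical names, and nothing here reflects the actual keys in cache-settings.properties.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Minimal time-to-live cache with hit/miss counters (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, calling loader(key) if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    load = lambda course_id: f"record for {course_id}"  # stands in for a database call
    cache.get_or_load("course-101", load)  # miss: loads from the source
    cache.get_or_load("course-101", load)  # hit: reuses the cached value
    print(f"hits={cache.hits} misses={cache.misses}")  # prints hits=1 misses=1
```

Counters like `hits` and `misses` are the kind of statistics a monitoring console would surface; a low hit ratio usually suggests the cache is too small or its entries expire too quickly for the workload.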

Steve Feldman

@seven_seconds

Please provide feedback for this session by emailing BbWorldFeedback@blackboard.com.

Scaling Blackboard Learn™ for High Performance and Delivery
