Windows Azure: Lessons from the Field
DESCRIPTION
Presentation at CodeStock 2010 on the lessons learned while building an application that handles post-processing of scientific data in the cloud.
TRANSCRIPT
Rob Gillen
Azure: Lessons From The Field
What we do
Consulting | Debugging | Training

Who we are
Founded by top technical and business experts, we are a fast-growing group of outstanding consulting and training professionals who pull out all the stops to solve our clients’ problems.

How we do it
Consulting & Debugging
– Architecture, analysis, and design services
– Full-lifecycle custom software development
– Content creation
– Project management
– Debugging & performance tuning
Training
– On-site instructor-led training
– Virtual instructor-led training
– Devscovery conferences
Agenda
· Introductions
· Why is ORNL looking at cloud computing?
· Azure in 5 minutes
· Post-processing and data distribution in the cloud
– Using cloud computing for post-processing
– Composite applications
· Lessons (being) learned
– Using cloud computing for post-processing
– Composite applications
– Automated agents / jobs
Oak Ridge National Laboratory is DOE’s largest science and energy lab
· $1.6B budget
· 4,350 employees
· 3,900 research guests annually
· $350 million invested in modernization
· World’s most powerful open scientific computing facility
· Nation’s largest concentration of open source materials research
· Nation’s most diverse energy portfolio
· Operating the world’s most intense pulsed neutron source
· Managing the billion-dollar U.S. ITER project
Delivering science and technology
Bioenergy
Climate
Energy technologies
Ultrascale computing
National security
Materials at the nanoscale
Neutron sciences
Nuclear energy
ITER
Ultrascale Scientific Computing
· Leadership Computing Facility:
– World’s most powerful open scientific computing facility
– Peak speed of 2.33 petaflops (> two thousand trillion calculations/sec)
– 18,688 nodes, 224,526 compute cores, 299 TB RAM, 10,000 TB disk
– 4,352 ft² of floor space
– Exascale system by the end of the next decade
– Focus on computationally intensive projects of large scale and high scientific impact
· Addressing key science and technology issues
– Climate
– Fusion
– Materials
– Bioenergy
· 1st and 4th fastest supercomputers in the world
· The world’s most powerful system for open science
Then Why Look at Cloud Computing?
Science Takes Different Forms
· Tightly-coupled simulations
· Data-parallelized
· Embarrassingly parallel

Dearth of Mid-Range Assets
· 256-1,000 cores
· Cloud computing is 1 of many possible solutions

Scaling Issues
· Power consumption
· Programming struggles
· Fault-tolerance

Forward-Looking
· Next-generation problems
· Next-generation researchers
Types of Clouds
(Diagram: the application stack (applications, runtimes, security & integration, databases, servers, virtualization, server hardware, storage, networking) shown under each deployment model.)
· Private (on-premise): you manage every layer of the stack.
· Infrastructure as a Service: the vendor manages the virtualization, server hardware, storage, and networking layers; you manage the rest.
· Platform as a Service: the vendor manages everything below the application layer; you manage only your application.
Windows Azure Platform
· Compute
· Storage: Blob Storage, Table Storage, Queue, Drive, Content Delivery Network
· Data: SQL Azure, Data Sync
· Connectivity: Service Bus, Project “Sydney”
· Security: Access Control, “Geneva”
· Application Services / Frameworks: “Dublin”, “Velocity”
Windows Azure Compute
• Development, service hosting, & management environment
– .NET, Java, PHP, Python, Ruby, native code (C/C++, Win32, etc.)
– ASP.NET providers, FastCGI, memcached, MySQL, Tomcat
– Full trust: supports standard languages and APIs
– Secure certificate store
– Management APIs, logging, and diagnostics systems
• Multiple roles: Web, Worker, Virtual Machine (VHD)
• Multiple VM sizes
– Base unit: 1.6 GHz x64 CPU, 1.75 GB RAM, 100 Mbps network, 250 GB volatile storage
– Small (1X), Medium (2X), Large (4X), X-Large (8X)
• In-place rolling upgrades, organized by upgrade domains
– Walk each upgrade domain one at a time
Windows Azure Diagnostics
• Configurable trace, performance counter, Windows event log, IIS log & file buffering
– Local data buffering quota management
– Query & modify from the cloud and from the desktop, per role instance
– Transfer to storage scheduled & on-demand
– Filter by data type, verbosity & time range
Windows Azure Storage
• Rich data abstractions: tables, blobs, queues, drives, CDN
– Capacity (100 TB), throughput (100 MB/sec), transactions (1K req/sec)
• High accessibility
– Supports geo-location
– Language & platform agnostic REST APIs
– URL: http://<account>.<store>.core.windows.net
– Client libraries for .NET, Java, PHP, etc.
• High durability: data is replicated 3 times within a cluster, and (as of Feb 2010) across datacenters
• High scalability: data is automatically partitioned and load balanced across servers
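The per-service URL convention above can be sketched with a small helper; the account name and resource paths below are hypothetical placeholders.

```python
# Sketch of the Windows Azure storage URL convention described above.
# "myaccount" and the resource names are hypothetical placeholders.
def storage_url(account: str, store: str, resource: str = "") -> str:
    """Build a storage endpoint URL: http://<account>.<store>.core.windows.net/<resource>."""
    return f"http://{account}.{store}.core.windows.net/{resource}"

print(storage_url("myaccount", "blob", "photos/cat.jpg"))
# -> http://myaccount.blob.core.windows.net/photos/cat.jpg
print(storage_url("myaccount", "table", "Tables"))
print(storage_url("myaccount", "queue", "work-items/messages"))
```

Because the endpoints are plain HTTP, any language with an HTTP client can talk to them; the client libraries just wrap this convention plus request signing.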
Windows Azure Table Storage
• Designed for structured data, not relational data
• Data definition is part of the application
– A Table is a set of Entities (records)
– An Entity is a set of Properties (fields)
• No fixed schema
– Each property is stored as a <name, typed value> pair
– Two entities within the same table can have different properties
– No schema is enforced
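A minimal sketch of the schema-free model, using plain dictionaries to stand in for entities (the table and property names are hypothetical; only PartitionKey and RowKey are required by the service):

```python
# Two entities in the same (hypothetical) table with different property
# sets: Azure Tables enforce no schema beyond the PartitionKey and RowKey
# that together uniquely identify an entity.
monthly_mean = {
    "PartitionKey": "tas_2000",   # e.g. variable + year
    "RowKey": "2000-01",          # e.g. month
    "MeanTempC": 14.2,            # float property
}
provenance = {
    "PartitionKey": "tas_2000",
    "RowKey": "2000-01-source",
    "Source": "CMIP3 subset",     # string property; different shape, same table
    "Reviewed": False,            # boolean property
}
table = [monthly_mean, provenance]
# Both are valid rows of the same table despite different properties.
assert all("PartitionKey" in e and "RowKey" in e for e in table)
```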
Windows Azure Blob Storage
• Storage for large, named files plus their metadata
• Block Blob
– Targeted at streaming workloads
– Each blob consists of a sequence of blocks
– Each block is identified by a Block ID
– Size limit 200 GB per blob
• Page Blob
– Targeted at random read/write workloads
– Each blob consists of an array of pages
– Each page is identified by its offset from the start of the blob
– Size limit 1 TB per blob
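The block-blob model can be sketched as follows; the block sizes here are tiny for illustration, and the ID scheme is one arbitrary choice (the REST API requires base64-encoded IDs of equal length within a blob):

```python
import base64

# Sketch of the block-blob model described above: a blob is a sequence
# of blocks, each identified by a Block ID.
def split_into_blocks(data: bytes, block_size: int):
    """Split data into (block_id, chunk) pairs."""
    blocks = []
    for i in range(0, len(data), block_size):
        block_id = base64.b64encode(f"block-{i // block_size:06d}".encode()).decode()
        blocks.append((block_id, data[i:i + block_size]))
    return blocks

blocks = split_into_blocks(b"x" * 10, block_size=4)  # 3 blocks: 4 + 4 + 2 bytes
reassembled = b"".join(chunk for _, chunk in blocks)
assert reassembled == b"x" * 10
```

In the real service each block is uploaded separately (which allows parallel and resumable uploads) and a final commit of the ordered Block ID list assembles the blob.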
Windows Azure Queue
• Performance-efficient, highly available, and provides reliable message delivery
• Asynchronous work dispatch
– Inter-role communication
– Polling-based model; best-effort FIFO data structure
• Queue operations
– Create Queue
– Delete Queue
– List Queues
– Get/Set Queue Metadata
• Message operations
– Add Message
– Get Message(s)
– Peek Message(s)
– Delete Message
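The polling pattern above can be illustrated with a local stand-in (Python's in-process `queue.Queue` here is just an illustration; the real service is best-effort FIFO over REST, and a retrieved message becomes invisible rather than deleted until the worker confirms success):

```python
import queue

# Local stand-in for the polling model: a worker polls, processes, then
# deletes the message only after the work succeeds, so a crashed
# worker's message would reappear for another worker.
work = queue.Queue()
for csv_name in ("file-001.csv", "file-002.csv", "file-003.csv"):
    work.put(csv_name)          # Add Message

processed = []
while not work.empty():         # worker's polling loop
    msg = work.get()            # Get Message (becomes invisible, not deleted)
    processed.append(msg)       # do the work (e.g. flatten a NetCDF slice to CSV)
    work.task_done()            # Delete Message only after success

assert processed == ["file-001.csv", "file-002.csv", "file-003.csv"]
```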
Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
– Use existing NTFS APIs to access a durable drive
– Durability and survival of data on application failover
– Enables migrating existing NTFS applications to the cloud
– Drives can be up to 1 TB; a VM can dynamically mount up to 8 drives
• A Windows Azure Drive is a Page Blob
– Example: mount http://<account>.blob.core.windows.net/<container>/<blob> as X:\
– All writes to the drive are made durable to the Page Blob
– The drive is made durable through standard Page Blob replication
Windows Azure Content Delivery Network
• Provides high-bandwidth global blob content delivery
– 18 locations globally (US, Europe, Asia, Australia, and South America), and growing
• Blob service URL vs. CDN URL
– Blob URL: http://<account>.blob.core.windows.net/
– CDN URL: http://<guid>.vo.msecnd.net/
– Support for custom domain names
• Access details
– Blobs are cached in CDN until the TTL passes
– Use per-blob HTTP Cache-Control policy for TTL (new)
– CDN provides only anonymous HTTP access
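The URL mapping and TTL mechanism above can be sketched as follows; the account name and the `az1234` endpoint are hypothetical placeholders (each CDN-enabled account gets its own `<guid>.vo.msecnd.net` endpoint):

```python
# Sketch of the blob-URL vs. CDN-URL mapping described above.
ACCOUNT = "myaccount"
CDN_ENDPOINT = "az1234"  # placeholder for the GUID Azure assigns

def blob_url(container: str, blob: str) -> str:
    return f"http://{ACCOUNT}.blob.core.windows.net/{container}/{blob}"

def cdn_url(container: str, blob: str) -> str:
    return f"http://{CDN_ENDPOINT}.vo.msecnd.net/{container}/{blob}"

# Per-blob TTL is controlled with a standard HTTP Cache-Control header
# set on the blob; cached CDN copies expire after max-age seconds.
headers = {"Cache-Control": "public, max-age=3600"}  # 1-hour TTL

assert blob_url("maps", "jan.png") == "http://myaccount.blob.core.windows.net/maps/jan.png"
assert cdn_url("maps", "jan.png") == "http://az1234.vo.msecnd.net/maps/jan.png"
```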
Tenets of Internet-Scale Application Architecture

Design
· Horizontal scaling
· Service-oriented composition
· Eventual consistency
· Fault tolerant (expect failures)

Security
· Claims-based authentication & access control
· Federated identity
· Data encryption & key mgmt.

Management
· Policy-driven automation
· Aware of application lifecycles
· Handle dynamic data schema and configuration changes

Data & Content
· De-normalization
· Logical partitioning
· Distributed in-memory cache
· Diverse data storage options (persistent & transient, relational & unstructured, text & binary, read & write, etc.)

Processes
· Loosely coupled components
· Parallel & distributed processing
· Asynchronous distributed communication
· Idempotent (handle duplicate delivery)
· Isolation (separation of concerns)
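The idempotency tenet deserves a concrete sketch: because queue delivery is at-least-once, a processor may see the same message twice, and keying completed work by a stable message ID makes the duplicate a no-op. All names here are illustrative.

```python
# Sketch of idempotent message handling: duplicates are detected by a
# stable message ID, so redelivery does not repeat the work.
completed = set()
results = []

def process_once(message_id: str, payload: str) -> None:
    if message_id in completed:      # duplicate delivery: ignore
        return
    results.append(payload.upper())  # stand-in for the real work
    completed.add(message_id)

process_once("msg-1", "flatten file-001")
process_once("msg-1", "flatten file-001")  # redelivered duplicate
process_once("msg-2", "flatten file-002")
assert results == ["FLATTEN FILE-001", "FLATTEN FILE-002"]
```

In a real deployment the `completed` set would live in durable shared storage (e.g. a table keyed by message ID) rather than process memory.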
Application Patterns
(Diagram: user clients (web browser, mobile browser, Silverlight application, WPF application) connect via the Service Bus and Access Control Service to public cloud services built from ASP.NET web roles, web-service roles, and worker-role jobs, backed by the Table Storage, Blob Storage, Queue, and Workflow services, user/application/reference data, and a grid/parallel computing application. These in turn integrate with the private cloud: enterprise applications, enterprise web services, enterprise data, and enterprise identity, exposed through data, storage, identity, and application services.)
Application Goals
· Simulate post-processing of scientific data
– Generate visualizations from “raw” data
– Transform data to be consumable by general processes
– Exercise various storage mechanisms
· Focus on mechanics
– The specific science problem being solved is secondary to the approach
– Goal is to refine the approach such that it can fade, allowing the science to regain preeminence
Putting Data Into the Cloud
· Source data: NetCDF files (subset of the US contribution to the CMIP3 archive)
· Visualization support
– Flatten source files to CSV
– Generate base “heat map”
– Combine heat map and base map
– Generate video/animation
· General consumption/publishing
– Expose data as a “service” (REST/XML/JSON, etc.)
– Query-able
– Azure Tables (OGDI) / Azure Blob
Context: 35 TB of Numbers. How Much Data Is That?
· A single lat/lon map at typical climate model resolution represents ~40 KB
· If you wanted to look at all 35 TB in the form of these lat/lon plots, and if:
– Every 10 seconds you displayed another map
– You worked 24 hours/day, 365 days/year
· You could complete the task in about 200 years.
Dataset Used
· 1 NetCDF file, approximately 92 MB, located in blob storage
· 1,825 CSV files generated, 815.84 MB total
– Average file size is around 457.76 KB
– Each CSV represented 12,690 data points (lat/lon/temp)
· 3,650 images generated, 145.03 MB total
– Heat maps avg. 31.25 KB
– Combined images avg. 49 KB
· 23,652,000 entities added to Azure Table storage
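A quick sanity check shows the quoted averages are internally consistent (assuming 1 MB = 1024 KB):

```python
# Hedged arithmetic check of the dataset numbers above.
# 815.84 MB across 1,825 CSV files should match the ~457.76 KB average.
csv_total_mb, csv_files = 815.84, 1825
avg_csv_kb = csv_total_mb * 1024 / csv_files
assert round(avg_csv_kb, 2) == 457.76

# 145.03 MB across 3,650 images gives ~40.7 KB/image, consistent with
# half heat maps (avg 31.25 KB) and half combined images (avg 49 KB).
img_total_mb, img_count = 145.03, 3650
avg_img_kb = img_total_mb * 1024 / img_count
assert abs(avg_img_kb - (31.25 + 49) / 2) < 1
```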
Some Lessons
· Data sizes
– Generic formats tend to be large (the 92 MB source grew to 816 MB of CSV)
– Data transfer within the Azure datacenter is fast; transfer from your computer is slow
· Azure Tables
– Inserts/deletes are slow but relatively linear
– Partition keys are not query-able; store them
– Not well suited for large, frequently changing data

Lessons Being Learned
· Watch your compilation model (esp. with native code)
· Composite applications
· Automated agents / jobs
· ServicePointManager and its ConnectionLimit (by default .NET allows only two concurrent connections per host, which throttles parallel storage access)
(Benchmark charts: Flatten: CSV upload rate; queue insert duration; single table entity insert. ImageGen: CSV file download duration; CSV file download rate.)

Finished Product
· Parallelized uploads for faster transfer
· OData service
· OData browser
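The parallelized-upload idea can be sketched with a thread pool issuing many block uploads concurrently instead of serially; `upload_block` below is a hypothetical stand-in for the real HTTP PUT, and in .NET the equivalent tuning step is raising `ServicePointManager.DefaultConnectionLimit` so more than two connections per host are allowed:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: upload blob blocks concurrently for higher aggregate transfer
# rate. upload_block is a placeholder for the real PUT request.
uploaded = []

def upload_block(block):
    block_id, chunk = block
    uploaded.append((block_id, len(chunk)))  # pretend the PUT succeeded
    return block_id

blocks = [(f"block-{i:04d}", b"x" * 1024) for i in range(16)]
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = list(pool.map(upload_block, blocks))

assert sorted(ids) == sorted(b[0] for b in blocks)  # every block uploaded
```

Because each block is independent, order of completion doesn't matter; a final commit of the ordered block list assembles the blob.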
Resources of Interest
• Blog: http://rob.gillenfamily.net (source for tools, extensions, etc.)
• Azure Scope: http://azurescope.cloudapp.net/ (perf tests, metrics, source, etc.)
• Excel-Driven Monte Carlo Simulation: http://code.msdn.microsoft.com/fullmonte
• OData Feed/Browser: http://data.sciencecloud.us
Thank you
The Microsoft Cloud: Data Center Infrastructure
· ~100 globally distributed data centers
· Quincy, WA; Chicago, IL; San Antonio, TX; Dublin, Ireland; Generation 4 DCs