Windows Azure: Lessons from the Field
DESCRIPTION
Presentation at CodeStock 2010 on the lessons learned while building an application that handles post-processing of scientific data in the cloud.
TRANSCRIPT
Rob Gillen
Azure: Lessons From The Field
What we do
Consulting | Debugging | Training

Who we are
Founded by top technical and business experts, we are a fast-growing group of outstanding consulting and training professionals who pull out all the stops to solve our clients’ problems.

How we do it
Consulting & Debugging
– Architecture, analysis, and design services
– Full-lifecycle custom software development
– Content creation
– Project management
– Debugging & performance tuning
Training
– On-site instructor-led training
– Virtual instructor-led training
– Devscovery conferences
Agenda
· Introductions
· Why is ORNL looking at cloud computing?
· Azure in 5 minutes
· Post-processing and data distribution in the cloud
– Using cloud computing for post-processing
– Composite applications
· Lessons (being) learned
– Using cloud computing for post-processing
– Composite applications
– Automated agents / jobs
Oak Ridge National Laboratory is DOE’s largest science and energy lab
· $1.6B budget
· 4,350 employees
· 3,900 research guests annually
· $350 million invested in modernization
· World’s most powerful open scientific computing facility
· Nation’s largest concentration of open source materials research
· Nation’s most diverse energy portfolio
· Operating the world’s most intense pulsed neutron source
· Managing the billion-dollar U.S. ITER project
Delivering science and technology
Bioenergy
Climate
Energy technologies
Ultrascale computing
National security
Materials at the nanoscale
Neutron sciences
Nuclear energy
ITER
Ultrascale Scientific Computing
· Leadership Computing Facility:
– World’s most powerful open scientific computing facility
– Peak speed of 2.33 petaflops (> two thousand trillion calculations/sec)
– 18,688 nodes, 224,526 compute cores, 299 TB RAM, 10,000 TB disk
– 4,352 ft² of floor space
– Exascale system by the end of the next decade
– Focus on computationally intensive projects of large scale and high scientific impact
· Addressing key science and technology issues
– Climate
– Fusion
– Materials
– Bioenergy
· 1st and 4th fastest supercomputers in the world
· The world’s most powerful system for open science
Then Why Look at Cloud Computing?
Science Takes Different Forms
· Tightly-coupled simulations
· Data-parallelized
· Embarrassingly parallel

Dearth of Mid-Range Assets
· 256-1,000 cores
· Cloud computing is 1 of many possible solutions

Scaling Issues
· Power consumption
· Programming struggles
· Fault-tolerance

Forward-Looking
· Next-generation problems
· Next-generation researchers
Types of Clouds
(Diagram: the application stack (applications, runtimes, security & integration, databases, servers, virtualization, server hardware, storage, networking) shown under each deployment model.)
· Private (on-premise): you manage every layer of the stack.
· Infrastructure as a Service: the vendor manages the virtualization, server hardware, storage, and networking layers; you manage the rest.
· Platform as a Service: the vendor manages everything below the application layer; you manage only your application.
Windows Azure Platform
· Compute
· Storage: Blob Storage, Table Storage, Queue, Drive, Content Delivery Network
· Data: SQL Azure, Data Sync
· Connectivity: Service Bus, Project “Sydney”
· Security: Access Control, “Geneva”
· Application Services / Frameworks: “Dublin”, “Velocity”
Windows Azure Compute
• Development, service hosting, & management environment
– .NET, Java, PHP, Python, Ruby, native code (C/C++, Win32, etc.)
– ASP.NET providers, FastCGI, memcached, MySQL, Tomcat
– Full trust: supports standard languages and APIs
– Secure certificate store
– Management APIs, logging, and diagnostics systems
• Multiple roles: Web, Worker, Virtual Machine (VHD)
• Multiple VM sizes
– Base unit: 1.6 GHz x64 CPU, 1.75 GB RAM, 100 Mbps network, 250 GB volatile storage
– Small (1X), Medium (2X), Large (4X), X-Large (8X)
• In-place rolling upgrades, organized by upgrade domains
– Walk each upgrade domain one at a time
Windows Azure Diagnostics
• Configurable trace, performance counter, Windows event log, IIS log & file buffering
– Local data buffering quota management
– Query & modify from the cloud and from the desktop, per role instance
– Transfer to storage scheduled & on-demand
– Filter by data type, verbosity & time range
Windows Azure Storage
• Rich data abstractions: tables, blobs, queues, drives, CDN
– Capacity (100 TB), throughput (100 MB/sec), transactions (1K req/sec)
• High accessibility
– Supports geo-location
– Language & platform agnostic REST APIs
– URL: http://<account>.<store>.core.windows.net
– Client libraries for .NET, Java, PHP, etc.
• High durability: data is replicated 3 times within a cluster, and (as of Feb 2010) across datacenters
• High scalability: data is automatically partitioned and load balanced across servers
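The per-service URL convention above can be sketched with a small helper; the account name and resource paths below are hypothetical placeholders.

```python
# Sketch of the Windows Azure storage URL convention described above.
# "myaccount" and the resource names are hypothetical placeholders.
def storage_url(account: str, store: str, resource: str = "") -> str:
    """Build a storage endpoint URL: http://<account>.<store>.core.windows.net/<resource>."""
    return f"http://{account}.{store}.core.windows.net/{resource}"

print(storage_url("myaccount", "blob", "photos/cat.jpg"))
# -> http://myaccount.blob.core.windows.net/photos/cat.jpg
print(storage_url("myaccount", "table", "Tables"))
print(storage_url("myaccount", "queue", "work-items/messages"))
```

Because the endpoints are plain HTTP, any language with an HTTP client can talk to them; the client libraries just wrap this convention plus request signing.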
Windows Azure Table Storage
• Designed for structured data, not relational data
• Data definition is part of the application
– A Table is a set of Entities (records)
– An Entity is a set of Properties (fields)
• No fixed schema
– Each property is stored as a <name, typed value> pair
– Two entities within the same table can have different properties
– No schema is enforced
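A minimal sketch of the schema-free model, using plain dictionaries to stand in for entities (the table and property names are hypothetical; only PartitionKey and RowKey are required by the service):

```python
# Two entities in the same (hypothetical) table with different property
# sets: Azure Tables enforce no schema beyond the PartitionKey and RowKey
# that together uniquely identify an entity.
monthly_mean = {
    "PartitionKey": "tas_2000",   # e.g. variable + year
    "RowKey": "2000-01",          # e.g. month
    "MeanTempC": 14.2,            # float property
}
provenance = {
    "PartitionKey": "tas_2000",
    "RowKey": "2000-01-source",
    "Source": "CMIP3 subset",     # string property; different shape, same table
    "Reviewed": False,            # boolean property
}
table = [monthly_mean, provenance]
# Both are valid rows of the same table despite different properties.
assert all("PartitionKey" in e and "RowKey" in e for e in table)
```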
Windows Azure Blob Storage
• Storage for large, named files plus their metadata
• Block Blob
– Targeted at streaming workloads
– Each blob consists of a sequence of blocks
– Each block is identified by a Block ID
– Size limit 200 GB per blob
• Page Blob
– Targeted at random read/write workloads
– Each blob consists of an array of pages
– Each page is identified by its offset from the start of the blob
– Size limit 1 TB per blob
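The block-blob model can be sketched as follows; the block sizes here are tiny for illustration, and the ID scheme is one arbitrary choice (the REST API requires base64-encoded IDs of equal length within a blob):

```python
import base64

# Sketch of the block-blob model described above: a blob is a sequence
# of blocks, each identified by a Block ID.
def split_into_blocks(data: bytes, block_size: int):
    """Split data into (block_id, chunk) pairs."""
    blocks = []
    for i in range(0, len(data), block_size):
        block_id = base64.b64encode(f"block-{i // block_size:06d}".encode()).decode()
        blocks.append((block_id, data[i:i + block_size]))
    return blocks

blocks = split_into_blocks(b"x" * 10, block_size=4)  # 3 blocks: 4 + 4 + 2 bytes
reassembled = b"".join(chunk for _, chunk in blocks)
assert reassembled == b"x" * 10
```

In the real service each block is uploaded separately (which allows parallel and resumable uploads) and a final commit of the ordered Block ID list assembles the blob.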
Windows Azure Queue
• Performance-efficient, highly available, and provides reliable message delivery
• Asynchronous work dispatch
– Inter-role communication
– Polling-based model; best-effort FIFO data structure
• Queue operations
– Create Queue
– Delete Queue
– List Queues
– Get/Set Queue Metadata
• Message operations
– Add Message
– Get Message(s)
– Peek Message(s)
– Delete Message
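The polling pattern above can be illustrated with a local stand-in (Python's in-process `queue.Queue` here is just an illustration; the real service is best-effort FIFO over REST, and a retrieved message becomes invisible rather than deleted until the worker confirms success):

```python
import queue

# Local stand-in for the polling model: a worker polls, processes, then
# deletes the message only after the work succeeds, so a crashed
# worker's message would reappear for another worker.
work = queue.Queue()
for csv_name in ("file-001.csv", "file-002.csv", "file-003.csv"):
    work.put(csv_name)          # Add Message

processed = []
while not work.empty():         # worker's polling loop
    msg = work.get()            # Get Message (becomes invisible, not deleted)
    processed.append(msg)       # do the work (e.g. flatten a NetCDF slice to CSV)
    work.task_done()            # Delete Message only after success

assert processed == ["file-001.csv", "file-002.csv", "file-003.csv"]
```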
Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
– Use existing NTFS APIs to access a durable drive
– Durability and survival of data on application failover
– Enables migrating existing NTFS applications to the cloud
– Drives can be up to 1 TB; a VM can dynamically mount up to 8 drives
• A Windows Azure Drive is a Page Blob
– Example: mount http://<account>.blob.core.windows.net/<container>/<blob> as X:\
– All writes to the drive are made durable to the Page Blob
– The drive is made durable through standard Page Blob replication
Windows Azure Content Delivery Network
• Provides high-bandwidth global blob content delivery
– 18 locations globally (US, Europe, Asia, Australia, and South America), and growing
• Blob service URL vs. CDN URL
– Blob URL: http://<account>.blob.core.windows.net/
– CDN URL: http://<guid>.vo.msecnd.net/
– Support for custom domain names
• Access details
– Blobs are cached in CDN until the TTL passes
– Use per-blob HTTP Cache-Control policy for TTL (new)
– CDN provides only anonymous HTTP access
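The URL mapping and TTL mechanism above can be sketched as follows; the account name and the `az1234` endpoint are hypothetical placeholders (each CDN-enabled account gets its own `<guid>.vo.msecnd.net` endpoint):

```python
# Sketch of the blob-URL vs. CDN-URL mapping described above.
ACCOUNT = "myaccount"
CDN_ENDPOINT = "az1234"  # placeholder for the GUID Azure assigns

def blob_url(container: str, blob: str) -> str:
    return f"http://{ACCOUNT}.blob.core.windows.net/{container}/{blob}"

def cdn_url(container: str, blob: str) -> str:
    return f"http://{CDN_ENDPOINT}.vo.msecnd.net/{container}/{blob}"

# Per-blob TTL is controlled with a standard HTTP Cache-Control header
# set on the blob; cached CDN copies expire after max-age seconds.
headers = {"Cache-Control": "public, max-age=3600"}  # 1-hour TTL

assert blob_url("maps", "jan.png") == "http://myaccount.blob.core.windows.net/maps/jan.png"
assert cdn_url("maps", "jan.png") == "http://az1234.vo.msecnd.net/maps/jan.png"
```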
Tenets of Internet-Scale Application Architecture

Design
· Horizontal scaling
· Service-oriented composition
· Eventual consistency
· Fault tolerant (expect failures)

Security
· Claims-based authentication & access control
· Federated identity
· Data encryption & key mgmt.

Management
· Policy-driven automation
· Aware of application lifecycles
· Handle dynamic data schema and configuration changes

Data & Content
· De-normalization
· Logical partitioning
· Distributed in-memory cache
· Diverse data storage options (persistent & transient, relational & unstructured, text & binary, read & write, etc.)

Processes
· Loosely coupled components
· Parallel & distributed processing
· Asynchronous distributed communication
· Idempotent (handle duplicate delivery)
· Isolation (separation of concerns)
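The idempotency tenet deserves a concrete sketch: because queue delivery is at-least-once, a processor may see the same message twice, and keying completed work by a stable message ID makes the duplicate a no-op. All names here are illustrative.

```python
# Sketch of idempotent message handling: duplicates are detected by a
# stable message ID, so redelivery does not repeat the work.
completed = set()
results = []

def process_once(message_id: str, payload: str) -> None:
    if message_id in completed:      # duplicate delivery: ignore
        return
    results.append(payload.upper())  # stand-in for the real work
    completed.add(message_id)

process_once("msg-1", "flatten file-001")
process_once("msg-1", "flatten file-001")  # redelivered duplicate
process_once("msg-2", "flatten file-002")
assert results == ["FLATTEN FILE-001", "FLATTEN FILE-002"]
```

In a real deployment the `completed` set would live in durable shared storage (e.g. a table keyed by message ID) rather than process memory.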
Application Patterns
(Diagram: user clients (web browser, mobile browser, Silverlight application, WPF application) connect via the Service Bus and Access Control Service to public cloud services built from ASP.NET web roles, web-service roles, and worker-role jobs, backed by the Table Storage, Blob Storage, Queue, and Workflow services, user/application/reference data, and a grid/parallel computing application. These in turn integrate with the private cloud: enterprise applications, enterprise web services, enterprise data, and enterprise identity, exposed through data, storage, identity, and application services.)
Application Goals
· Simulate post-processing of scientific data
– Generate visualizations from “raw” data
– Transform data to be consumable by general processes
– Exercise various storage mechanisms
· Focus on mechanics
– The specific science problem being solved is secondary to the approach
– Goal is to refine the approach such that it can fade, allowing the science to regain preeminence
Putting Data Into the Cloud
· Source data: NetCDF files (subset of the US contribution to the CMIP3 archive)
· Visualization support
– Flatten source files to CSV
– Generate base “heat map”
– Combine heat map and base map
– Generate video/animation
· General consumption/publishing
– Expose data as a “service” (REST/XML/JSON, etc.)
– Query-able
– Azure Tables (OGDI) / Azure Blob
Context: 35 TB of Numbers. How Much Data Is That?
· A single lat/lon map at typical climate model resolution represents ~40 KB
· If you wanted to look at all 35 TB in the form of these lat/lon plots, and if:
– Every 10 seconds you displayed another map
– You worked 24 hours/day, 365 days/year
· You could complete the task in about 200 years.
Dataset Used
· 1 NetCDF file, approximately 92 MB, located in blob storage
· 1,825 CSV files generated, 815.84 MB total
– Average file size is around 457.76 KB
– Each CSV represented 12,690 data points (lat/lon/temp)
· 3,650 images generated, 145.03 MB total
– Heat maps avg. 31.25 KB
– Combined images avg. 49 KB
· 23,652,000 entities added to Azure Table storage
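A quick sanity check shows the quoted averages are internally consistent (assuming 1 MB = 1024 KB):

```python
# Hedged arithmetic check of the dataset numbers above.
# 815.84 MB across 1,825 CSV files should match the ~457.76 KB average.
csv_total_mb, csv_files = 815.84, 1825
avg_csv_kb = csv_total_mb * 1024 / csv_files
assert round(avg_csv_kb, 2) == 457.76

# 145.03 MB across 3,650 images gives ~40.7 KB/image, consistent with
# half heat maps (avg 31.25 KB) and half combined images (avg 49 KB).
img_total_mb, img_count = 145.03, 3650
avg_img_kb = img_total_mb * 1024 / img_count
assert abs(avg_img_kb - (31.25 + 49) / 2) < 1
```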
Some Lessons
· Data sizes
– Generic formats tend to be large (the 92 MB source grew to 816 MB of CSV)
– Data transfer within the Azure datacenter is fast; transfer from your computer is slow
· Azure Tables
– Inserts/deletes are slow but relatively linear
– Partition keys are not query-able; store them
– Not well suited for large, frequently changing data

Lessons Being Learned
· Watch your compilation model (esp. with native code)
· Composite applications
· Automated agents / jobs
· ServicePointManager and its ConnectionLimit (by default .NET allows only two concurrent connections per host, which throttles parallel storage access)
(Benchmark charts: Flatten: CSV upload rate; queue insert duration; single table entity insert. ImageGen: CSV file download duration; CSV file download rate.)

Finished Product
· Parallelized uploads for faster transfer
· OData service
· OData browser
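The parallelized-upload idea can be sketched with a thread pool issuing many block uploads concurrently instead of serially; `upload_block` below is a hypothetical stand-in for the real HTTP PUT, and in .NET the equivalent tuning step is raising `ServicePointManager.DefaultConnectionLimit` so more than two connections per host are allowed:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: upload blob blocks concurrently for higher aggregate transfer
# rate. upload_block is a placeholder for the real PUT request.
uploaded = []

def upload_block(block):
    block_id, chunk = block
    uploaded.append((block_id, len(chunk)))  # pretend the PUT succeeded
    return block_id

blocks = [(f"block-{i:04d}", b"x" * 1024) for i in range(16)]
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = list(pool.map(upload_block, blocks))

assert sorted(ids) == sorted(b[0] for b in blocks)  # every block uploaded
```

Because each block is independent, order of completion doesn't matter; a final commit of the ordered block list assembles the blob.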
Resources of Interest
• Blog: http://rob.gillenfamily.net (source for tools, extensions, etc.)
• Azure Scope: http://azurescope.cloudapp.net/ (perf tests, metrics, source, etc.)
• Excel-Driven Monte Carlo Simulation: http://code.msdn.microsoft.com/fullmonte
• OData Feed/Browser: http://data.sciencecloud.us
Thank you
The Microsoft Cloud: Data Center Infrastructure
· ~100 globally distributed data centers
· Quincy, WA; Chicago, IL; San Antonio, TX; Dublin, Ireland; Generation 4 DCs