infinitely scalable clusters - grid computing on public cloud - new york


Upload: hentsu

Post on 08-Feb-2017


変通 [hen-tsoo] noun
1. Resourcefulness – the quality of being able to cope with a difficult situation
2. Adaptability – the ability to change (or be changed) to fit changed circumstances
3. Agility – the power of moving quickly and easily; nimbleness

INFINITELY SCALABLE CLUSTERS
Grid computing on public cloud

AGENDA
• Grid Computing Background
• On-Premise & Public Cloud
• Google Cloud Platform
• Demo

January 2017

BACKGROUND

TERMINOLOGY
• Public Cloud (AWS, Azure, Google)
• Private Cloud (Your data centre)
• High Performance Computing (HPC)
• Grid Computing
• Compute Cluster
• CPUs / Processors / Cores
• RAM and Disk Storage
• IaaS (virtual hardware and networking)
• PaaS (software services)

WHAT IS PUBLIC CLOUD?
“A service provider makes resources, such as virtual machines, applications and storage, available to the general public.”
• Utility model
• No contracts
• Shared hardware / multi-tenant
• Self managed

WHAT IS GRID COMPUTING?
Traditional Resource Limitations:
• Data Store Performance
• PC Processor / Memory / Storage
• Network Bandwidth
The researcher may wait a long time for results.

• Grid computing moves the computational work from the PC to a cluster of servers
• The cluster processes the data on behalf of the researcher and returns the results
• Processing time is reduced
• Larger datasets can be tackled
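The hand-off from PC to cluster can be sketched in a few lines of Python. This is a minimal illustration, not a grid product: a local thread pool stands in for the cluster's workers, and `simulate_path`, `run_serial` and `run_on_workers` are hypothetical names invented for the example. On a real grid the workers would be separate servers, and that is where the speed-up comes from.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_path(seed: int, steps: int = 1_000) -> float:
    """Hypothetical unit of research work: a small deterministic random walk."""
    state = seed
    total = 0.0
    for _ in range(steps):
        # Linear congruential step, used only to keep the example dependency-free.
        state = (1103515245 * state + 12345) % 2**31
        total += state / 2**31 - 0.5
    return total


def run_serial(seeds):
    """What the researcher's PC does: one task after another."""
    return [simulate_path(s) for s in seeds]


def run_on_workers(seeds, workers: int = 4):
    """What a grid does: hand tasks to a pool of workers and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the submitted tasks.
        return list(pool.map(simulate_path, seeds))


if __name__ == "__main__":
    seeds = list(range(32))
    print(run_serial(seeds) == run_on_workers(seeds))
```

The two functions return identical results; only where the work runs changes, which is why existing research code can often move to a grid with little rewriting.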

KEY CONCEPTS
The Challenges
[Chart: Number of Tasks against Size of Data, positioning Big Data, High Throughput Computing, MapReduce and High Performance Computing]
The Workflows
[Diagram: Ingest, Process, Analyse, Visualise, Store]

CHOICE OF TOOLS AND PLATFORMS

ON-PREMISE & PUBLIC CLOUD

HARDWARE INFLEXIBILITY
• Buy 22 core processors at 2.2GHz or 6 core processors at 3.6GHz?
• Buy 8GB, 16GB or 32GB memory modules (RAM per core ratio)?
• Graphical Processing Units (GPUs)?
• How much local storage per server?
• What network devices between servers (32 or 48 port switches)?
• What size file server?
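The processor and memory questions above can be put into rough numbers. The sketch below is only arithmetic: the core counts, clock speeds and module sizes come from the slide, while the figure of 8 memory modules per server is an assumption made for illustration. Aggregate GHz is a crude proxy and says nothing about how a particular workload behaves.

```python
def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Crude throughput proxy: cores multiplied by clock speed."""
    return cores * clock_ghz


def ram_per_core_gb(module_gb: int, modules: int, cores: int) -> float:
    """Memory available per core for a given module size and count."""
    return module_gb * modules / cores


if __name__ == "__main__":
    MODULES_PER_SERVER = 8  # assumption, not from the slide

    for cores, clock in [(22, 2.2), (6, 3.6)]:
        print(f"{cores} cores at {clock} GHz: "
              f"{aggregate_ghz(cores, clock):.1f} GHz aggregate")
        for module_gb in (8, 16, 32):
            ratio = ram_per_core_gb(module_gb, MODULES_PER_SERVER, cores)
            print(f"  {module_gb} GB modules: {ratio:.1f} GB RAM per core")
```

The 22-core part gives 48.4 GHz in aggregate against 21.6 GHz for the 6-core part, but each of its cores is slower and gets a smaller share of memory. Once the hardware is bought, that trade-off is fixed for every workload that follows.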

Grid usage varies depending on research priorities:
[Chart: jobs per day (0 to 120) for Monday to Sunday]

EXAMPLE OF MATLAB GRID WITH PUBLIC CLOUD
- Pay only for what you use
- Scale compute resource up AND down
- Minimal capital outlay on hardware
- Experiment with grid computing platforms quickly, cheaply and with no commitment

A DAY IN A PUBLIC CLOUD CLUSTER

[Chart: Workers and Tasks in Queue (0 to 180) against Time, 02:00:00 to 23:40:00]

- Cluster of 32x 4-core nodes
- Max 128 workers
- Ramps up as jobs get submitted
- Tears down nodes when jobs finish
- Minimises costs when not in use
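The ramp-up and tear-down behaviour can be sketched as a simple scaling rule. The 32 nodes and 4 cores per node come from the slide; the policy itself (one task per core, scale straight to demand) and the sample queue lengths are assumptions for illustration, not the logic of any particular scheduler.

```python
import math

CORES_PER_NODE = 4  # from the slide
MAX_NODES = 32      # from the slide: 32 x 4 = 128 workers


def nodes_needed(queued_tasks: int, running_tasks: int = 0) -> int:
    """Nodes required to give every task a core, capped at the cluster maximum."""
    demand = queued_tasks + running_tasks
    return min(MAX_NODES, math.ceil(demand / CORES_PER_NODE))


def simulate(queue_by_interval):
    """Return the node count per interval and the total node-intervals billed."""
    nodes = [nodes_needed(q) for q in queue_by_interval]
    return nodes, sum(nodes)


if __name__ == "__main__":
    # Hypothetical queue lengths over five billing intervals.
    load = [0, 10, 180, 60, 0]
    nodes, billed = simulate(load)
    always_on = MAX_NODES * len(load)
    print(nodes)              # node count in each interval
    print(billed, always_on)  # elastic versus always-on node-intervals
```

For this invented load the elastic cluster bills 50 node-intervals against 160 for a cluster left running at full size, which is the cost argument the slide makes.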

IDEAL CLUSTER SIZE?

[Chart: Job Run time in seconds (0 to 1400) against Cores (8 to 224)]
[Diagram: Ingest, Process, Analyse, Visualise, Store]

Optimise other parts of the workflow?
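One way to reason about the question is Amdahl's law: any part of a job that cannot be parallelised sets a floor on run time, however many cores are added. The sketch below uses that model as an assumption; it is not fitted to the chart, and the 100 s serial and 9600 s parallel figures are invented for illustration.

```python
def amdahl_runtime(serial_s: float, parallel_s: float, cores: int) -> float:
    """Run time when only the parallel part of the job scales with core count."""
    return serial_s + parallel_s / cores


if __name__ == "__main__":
    SERIAL_S = 100.0     # hypothetical: ingest, store and other serial steps
    PARALLEL_S = 9600.0  # hypothetical: work that splits cleanly across cores

    previous = None
    for cores in (8, 16, 32, 64, 96, 128, 160, 192, 224):
        runtime = amdahl_runtime(SERIAL_S, PARALLEL_S, cores)
        saved = "" if previous is None else f" (saves {previous - runtime:.0f} s)"
        print(f"{cores:>3} cores: {runtime:7.1f} s{saved}")
        previous = runtime
```

Under these assumptions, each extra block of cores saves less time than the one before, and the run time never drops below the serial 100 s. Past that point the remaining gains are in the other workflow stages rather than in more cores.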

RUNNING HYBRID CLUSTER ON IAAS
AWS vCPUs are hyper-threaded
Each vCPU is a hyper-thread of an Intel Xeon core for 2nd generation instance types (M4, M3, C4, C3, R3, HS1, G2, I2, and D2)
https://aws.amazon.com/ec2/instance-types/

Azure does not overcommit memory or cores. vCPUs are physical cores.
Azure does not use hyper-threading.
https://aws.amazon.com/ec2/instance-types/
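When sizing a hybrid cluster, the difference matters because a vCPU count does not mean the same thing on each platform. The sketch below converts between vCPUs and physical cores using the ratios stated on the slide (two hyper-threads per core on AWS, one vCPU per core on Azure). Those ratios are taken from the slide and may not hold for every instance family, so check them for the instance types actually used.

```python
# vCPUs presented per physical core, as stated on the slide.
VCPUS_PER_PHYSICAL_CORE = {"aws": 2, "azure": 1}


def physical_cores(provider: str, vcpus: int) -> float:
    """Physical cores behind a given number of vCPUs."""
    return vcpus / VCPUS_PER_PHYSICAL_CORE[provider.lower()]


def vcpus_for_cores(provider: str, cores: int) -> int:
    """vCPUs to request in order to match a number of physical cores."""
    return cores * VCPUS_PER_PHYSICAL_CORE[provider.lower()]


if __name__ == "__main__":
    on_premise_cores = 128  # hypothetical on-premise cluster size
    for provider in ("aws", "azure"):
        needed = vcpus_for_cores(provider, on_premise_cores)
        print(f"{provider}: {needed} vCPUs to match {on_premise_cores} physical cores")
```

Licensing and scheduling tools that count workers per core should be configured with the physical figure, otherwise a hybrid cluster may be sized at half or double what was intended.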

CLOUD GRID DEPLOYMENT OPTIONS
1. Infrastructure as a Service (IaaS) DIY
Spin up a compute cluster on VMs for additional capacity and new workloads

2. Burst
Use existing on-premises compute cluster and burst on cloud as required

3. Software as a Service (SaaS)
Software vendors and Managed Service Providers provide their own SaaS solutions. Pay for compute and application software per hour

4. Platform as a Service (PaaS)
Cloud providers’ data analytics platform as a service:
Google BigQuery & Datalab, Microsoft HDInsight, Amazon EMR

CLOUD HOSTED DATA AND ANALYTICS AS A SERVICE

GOOGLE BIG DATA REFERENCE ARCHITECTURE

BIGQUERY – A GOOGLE CLOUD PLATFORM SERVICE
• Fully managed and serverless architecture
• Massively scalable to petabytes of data, without the need to capacity plan
• Resources are deployed as necessary in the background to run queries in seconds
• Standard SQL queries
• Table partitioning
• No indexing needed
• Simple pricing model:
  • Data storage, streaming inserts, and queries are charged
  • Data loading and exporting are free of charge
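The pricing model can be expressed as a small cost function. This is a sketch of the structure only: the `Rates` fields, their units (per GB stored, per GB streamed, per TB scanned) and the sample numbers are assumptions, and real prices must be taken from the provider's current price list.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder unit prices; not real BigQuery rates."""
    storage_per_gb_month: float
    streaming_per_gb: float
    query_per_tb_scanned: float


def monthly_cost(rates: Rates, stored_gb: float, streamed_gb: float,
                 scanned_tb: float, loaded_gb: float = 0.0,
                 exported_gb: float = 0.0) -> float:
    """Sum the three charged items; loading and exporting contribute nothing."""
    # loaded_gb and exported_gb are accepted only to show that they are free.
    return (stored_gb * rates.storage_per_gb_month
            + streamed_gb * rates.streaming_per_gb
            + scanned_tb * rates.query_per_tb_scanned)


if __name__ == "__main__":
    rates = Rates(storage_per_gb_month=1.0, streaming_per_gb=2.0,
                  query_per_tb_scanned=3.0)  # invented numbers
    print(monthly_cost(rates, stored_gb=10, streamed_gb=5, scanned_tb=2))
    print(monthly_cost(rates, stored_gb=10, streamed_gb=5, scanned_tb=2,
                       loaded_gb=1000))  # same total: bulk loading is free
```

Because queries are charged by the amount of data they touch in this model, features such as table partitioning reduce cost as well as run time.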

BIGQUERY TECHNICAL BACKGROUND
A “service that enables interactive analysis of massively large datasets”. BigQuery is built on Google's Dremel engine rather than on Hadoop, but two Hadoop concepts explain the general approach:
• Distributed File System – Stores data that’s larger than can fit on a single machine
• Map Reduce – Distributes processing across multiple systems

http://blogs.forrester.com/mike_gualtieri/13-06-07-what_is_hadoop
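The Map Reduce idea can be sketched in plain Python with a word count, the usual teaching example. This shows the pattern only: everything runs in one process, the function names are invented for the example, and it is not how Hadoop or BigQuery is implemented. On a real cluster each chunk would be mapped on a different machine and the shuffle would move data across the network.

```python
from collections import defaultdict


def map_phase(chunk: str):
    """Map: turn one chunk of input into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map over every chunk independently, then shuffle and reduce."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Each string stands in for a block of a file held on a different machine.
    print(word_count(["grid cloud", "cloud grid cloud"]))
```

The map step needs no knowledge of other chunks, which is what allows the work to be spread across as many systems as there are blocks of data.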

GOOGLE BIGQUERY AND DATALAB DEMO

FINAL NOTES – DON’T FORGET SECURITY
Security considerations:
• Secure transfer and storage of data and code
• Secure remote access to cloud hosted environment
• Secure authentication
  • Windows AD Credentials
  • AWS IAM Credentials
  • Google Accounts
  • Microsoft Accounts
• Auditing (who accessed what, who changed what)

SUMMARY
• Traditional grid and HPC tools can benefit from moving into the cloud
• Vast landscape of available tools
• Off-the-shelf PaaS offerings
• Integrations and ecosystems
• Cheap and very quick to experiment

MORE INFORMATION?
[email protected]
hentsu.com

London:
1 Fore Street
London EC2Y 9DT

New York:
450 Lexington Ave
New York 10017

NEXT NEW YORK EVENT: MAY 2017
Cognitive Cloud Computing
Machine learning and AI for trading strategies
