1 6/29/2015 XLDB ‘09 Luke Lonergan llonergan@greenplum.com


TRANSCRIPT

Page 1: 1 6/29/2015 XLDB ‘09 Luke Lonergan llonergan@greenplum.com


XLDB ‘09

Luke Lonergan, llonergan@greenplum.com

Page 2:

“Big” numbers for GP today

• 70K/day – Query Rate
• 6.5PB – Dataset Size
• +100GB/s – Analysis Rate
• +3GB/s – Net Loading Rate
• 100,000/s – Transaction Rate
• 56 TB / kW, 1.6 GB/s/kW – Power Rate
• 100s – Number of Data/Compute nodes
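A rough consistency check on these figures, as a sketch only. It assumes that all the numbers describe one system and that units are decimal (1 PB = 1,000,000 GB); the slide states neither. The function and constant names are illustrative.

```python
# Figures quoted on the slide. ASSUMPTION: they all describe the same system
# and use decimal units; the slide does not say so.
DATASET_PB = 6.5
ANALYSIS_RATE_GB_S = 100.0   # slide says "+100GB/s"; treated as a lower bound
LOAD_RATE_GB_S = 3.0         # slide says "+3GB/s"; treated as a lower bound
TB_PER_KW = 56.0
GB_S_PER_KW = 1.6

GB_PER_PB = 1_000_000
TB_PER_PB = 1_000


def hours_to_process(dataset_pb, rate_gb_s):
    """Hours needed to stream the whole dataset once at the given rate."""
    return dataset_pb * GB_PER_PB / rate_gb_s / 3600


def power_kw_for_capacity(dataset_pb, tb_per_kw):
    """Power draw implied by the storage density figure."""
    return dataset_pb * TB_PER_PB / tb_per_kw


scan_hours = hours_to_process(DATASET_PB, ANALYSIS_RATE_GB_S)
load_days = hours_to_process(DATASET_PB, LOAD_RATE_GB_S) / 24
power_kw = power_kw_for_capacity(DATASET_PB, TB_PER_KW)
implied_rate_gb_s = power_kw * GB_S_PER_KW

print(f"full scan at 100 GB/s: {scan_hours:.1f} hours")
print(f"full load at 3 GB/s:   {load_days:.1f} days")
print(f"power for 6.5 PB:      {power_kw:.0f} kW")
print(f"rate implied by power: {implied_rate_gb_s:.0f} GB/s")
```

Under these assumptions the power figures imply an aggregate rate above the quoted 100 GB/s floor, so the numbers are at least mutually plausible.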


Page 3:

Things I’ve Heard

• Tiered computing
  – Organizational / Political / Geographic boundaries require it
• Metadata computing for HEP
  – “10TB sounds small but it’s not easy”
• Processing for Radio Astronomy, HEP
  – Data intensive computing
  – Requires an efficient pipeline from raw to consumables


Page 4:

Thoughts

• A lot of plumbing! Moving data around, pipeline processing
  – Core engine should do this so the plumbing isn’t done over and over
• Need for specialized access methods and storage classes
• “Computing in data” is key to success
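One way to picture “computing in data”, as a minimal sketch rather than a description of the GP engine: each segment reduces its own rows to a small partial result, and only those partials travel to a coordinator. The `Segment` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for one data node holding a slice of a table."""
    rows: List[float]

    def partial_sum_count(self) -> Tuple[float, int]:
        # Runs where the data lives; only two numbers leave the segment.
        return sum(self.rows), len(self.rows)


def mean_compute_in_data(segments: List[Segment]) -> Tuple[float, int]:
    """Merge per-segment partials. Returns (mean, values moved over the network)."""
    total, count, moved = 0.0, 0, 0
    for seg in segments:
        s, n = seg.partial_sum_count()
        total += s
        count += n
        moved += 2  # one sum and one count per segment
    return total / count, moved


def mean_move_data(segments: List[Segment]) -> Tuple[float, int]:
    """Ship every row to one place first, then compute."""
    all_rows = [v for seg in segments for v in seg.rows]
    return sum(all_rows) / len(all_rows), len(all_rows)


segments = [Segment([1.0, 2.0, 3.0]), Segment([4.0, 5.0]), Segment([6.0])]
print(mean_compute_in_data(segments))
print(mean_move_data(segments))
```

Both paths give the same mean. The first moves two values per segment regardless of table size, while the second moves every row.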


Page 5:

GP Basic Features

• Access Methods
  – Compression, Column Store, Heap Store, External Tables, Indexes (GIST, GIN, Rtree, Bitmap, B-Tree, …)
  – Network Ingest / Export directly into parallel pipeline
  – Logical Partitioning by Range, List
• Parallel Programming Languages
  – SQL 2003 with Analytics
  – Map Reduce in Perl, Python, C, SQL, …
  – PL/R, Python, Perl, C, pgSQL, SQL, …
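The two logical partitioning schemes named above can be sketched as routing functions. This is an illustration of the general idea only, not GP's implementation; the class names, the year boundaries, and the region lists are all made up for the example.

```python
import bisect


class RangePartitioner:
    """Route a key to a partition by comparing it with sorted boundaries.

    With boundaries [b0, b1], partition 0 holds keys < b0, partition 1 holds
    b0 <= key < b1, and partition 2 holds keys >= b1.
    """

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)

    def partition_for(self, key):
        return bisect.bisect_right(self.boundaries, key)


class ListPartitioner:
    """Route a key to a named partition by membership in an explicit list."""

    def __init__(self, value_lists, default=None):
        self.lookup = {
            value: name
            for name, values in value_lists.items()
            for value in values
        }
        self.default = default

    def partition_for(self, key):
        if key in self.lookup:
            return self.lookup[key]
        if self.default is None:
            raise KeyError(f"no partition accepts {key!r}")
        return self.default


by_year = RangePartitioner([2008, 2009])            # hypothetical boundaries
by_region = ListPartitioner(
    {"americas": ["US", "CA"], "emea": ["UK", "DE"]},  # hypothetical lists
    default="other",
)

print(by_year.partition_for(2007), by_year.partition_for(2008))
print(by_region.partition_for("DE"), by_region.partition_for("JP"))
```

Range partitioning suits ordered keys such as dates, where a query on a time window can skip whole partitions. List partitioning suits small categorical keys.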


Page 6:

From Enterprise Data Clouds

• Elastic / adaptive infrastructure for data warehousing and analytics
  – IT Operations deploy pools of low-cost commodity infrastructure
    • Physical servers, virtual infrastructure, or onramp to public cloud
  – DBAs and Analysts provision sandboxes and warehouses in minutes
    • Assemble the data they need (common, private, etc.) for agile analytics
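The split of roles above (IT Operations owns a pool, DBAs and Analysts draw warehouses from it) can be modelled as simple capacity bookkeeping. The sketch below is hypothetical throughout: `NodePool`, its methods, and the node counts are illustrative and do not correspond to any GP interface.

```python
class NodePool:
    """Hypothetical pool of commodity nodes managed by IT Operations."""

    def __init__(self, total_nodes):
        self.total = total_nodes
        self.allocations = {}  # warehouse name -> nodes assigned

    @property
    def free(self):
        return self.total - sum(self.allocations.values())

    def provision(self, name, nodes):
        """Carve a new warehouse or sandbox out of free capacity."""
        if name in self.allocations:
            raise ValueError(f"{name!r} already exists")
        if nodes > self.free:
            raise RuntimeError(f"only {self.free} nodes free")
        self.allocations[name] = nodes

    def expand(self, name, extra_nodes):
        """Grow an existing warehouse elastically."""
        if extra_nodes > self.free:
            raise RuntimeError(f"only {self.free} nodes free")
        self.allocations[name] += extra_nodes

    def release(self, name):
        """Return a warehouse's nodes to the pool."""
        return self.allocations.pop(name)


pool = NodePool(100)                    # illustrative pool size
pool.provision("finance", 16)
pool.provision("analyst_sandbox", 8)
pool.expand("analyst_sandbox", 8)
print(pool.allocations, "free:", pool.free)
```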


[Diagram: DBAs and Analysts provision Warehouses (Consumer Division, Packaged Goods, Finance) from Infrastructure pools deployed by IT Operations. Allocated and free node counts are garbled in the transcript.]

Page 7:

Use Case: Big Telco – Data Mart Consolidation


Goals:
• Reduce maintenance and support costs from proliferation of data mart platforms
• Reduce risks and exposure due to data in shadow IT systems
• Break down silo walls - provide a unified way to find and access all data

Approach:
• Embrace data – encourage ‘physical consolidation’ in advance of data model unification
• Provide ‘self serve’ model to bring shadow IT into the light
• Allow unified data access and pragmatic ‘logical’ data model unification incrementally

[Diagram: Data Sources feeding a US-West cluster of 100 nodes. The remaining diagram marks are not recoverable from the transcript.]

Page 8:

Use Case: Big Ad Network – Project Sandboxes


Goals:
• Remove IT barriers to analyst productivity and value creation
• Dramatically reduce IT resource constraints and delays – i.e. realize ideas sooner
• Combine centralized ‘EDW’ data with freshly discovered feeds and other useful sources

Approach:
• Self-serve creation of project warehouses in minutes – and elastically expand as needed
• Load new data feeds without requiring formal modeling
• Bring together any data within the EDC – even if globally distributed – and analyze
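As one reading of “load new data feeds without requiring formal modeling”, a loader could guess column types from the feed itself instead of waiting for a designed schema. The sketch below is an assumption about how that might look, not GP's loader: the function names and the sample feed are hypothetical, and the type names are ordinary PostgreSQL-style names chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that parses every sampled value."""
    for caster, type_name in ((int, "bigint"), (float, "double precision")):
        try:
            for v in values:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "text"


def infer_schema(csv_text):
    """Return {column name: guessed type} from a header row plus sample rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]
    if not rows:
        # No sample data: fall back to text for every column.
        return {name: "text" for name in header}
    columns = zip(*rows)
    return {name: infer_type(col) for name, col in zip(header, columns)}


feed = "user_id,spend,campaign\n1,2.5,spring\n2,3,summer\n"  # hypothetical feed
print(infer_schema(feed))
```

A guessed schema like this is enough to land a feed in a sandbox; a formal model can follow once the data proves useful.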

[Diagram: a US-East cluster of 100 nodes within the EDC. Through a Self-Serve Dashboard, each analyst creates a new warehouse and loads a private data feed.]

Page 9:

GP is Software – Develop Now

• Download at:
  – Gpn.greenplum.com
  – Get the VMware image or use it on OSX, Linux, Solaris


Page 10:

Think Big. Think Fast.