Post on 20-Mar-2017

118 views

Category:

Software


2 download

TRANSCRIPT

Page 1: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Exploiting Informix In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics on Intel® Xeon® Servers

Sandor Szabo – IBM
Martin Fuerderer – IBM
Jantz Tran – Intel®

Page 2: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Agenda

• Goal: Development of a NoSQL benchmark
• Informix Warehouse Accelerator
• NoSQL workload development
  • Dataset
  • Queries
• Test results on Intel® Xeon® server platform
• Intel® Xeon® E7 server platform overview

Page 3: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Workload: Benchmark goals

• Goal: show Intel® and IBM technology
• Results already reported on POPS, TPC-DS, ...
• Wanted a NoSQL benchmark, but found no good standard
• Since water is an issue in California, let's use data from ecosystems and the environment
• We defined our own benchmark
• Includes: in-memory, NoSQL, TimeSeries

Page 4: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Workload: Why Informix Warehouse Accelerator (IWA)

• In-memory database
• Scales with the number of processors
• Massive parallelism
• Multi-core and vector-optimized algorithms
• Need a reproducible environment
• No disk I/O, only CPU operations
• Can scale with new Intel processor designs
• Behaves well with Hyper-Threading

Page 5: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics


IBM Informix Warehouse Accelerator

Page 6: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

IWA Technology Innovations provide: Extreme speed for fast business decisions

• Intelligent frequency partitioning
• 64-bit Intel/AMD processors
• TB of RAM
• Predicate evaluation on compressed data
• SIMD
• No need for aggregate tables
• Row and column store
• Compression

[Figure: frequency partitioning histogram; axis: number of occurrences; regions: common values, rare values]
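The slide names frequency partitioning and predicate evaluation on compressed data without showing how they fit together. The following is a minimal Python sketch of the general idea only, not IWA's actual implementation: values are dictionary-encoded so that more frequent values receive smaller codes, and an equality predicate is then checked against the codes without decoding the column. The function names and the sample column are hypothetical.

```python
from collections import Counter


def build_dictionary(values):
    """Map each distinct value to an integer code; frequent values get small codes."""
    counts = Counter(values)
    # Descending frequency, ties broken by value so the result is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: code for code, value in enumerate(ordered)}


def encode(values, dictionary):
    """Replace every value by its code (the 'compressed' column in this sketch)."""
    return [dictionary[v] for v in values]


def count_equal(encoded, dictionary, literal):
    """Evaluate `column = literal` directly on the codes, without decoding."""
    code = dictionary.get(literal)
    if code is None:
        return 0  # the literal never occurs in the column
    return sum(1 for c in encoded if c == code)


# Hypothetical sample column for illustration.
column = ["CA", "CA", "CA", "NV", "CA", "OR", "NV"]
dictionary = build_dictionary(column)
encoded = encode(column, dictionary)
matches = count_equal(encoded, dictionary, "NV")
```

Because the comparison runs on small integers, several codes can in principle be packed into one machine word and compared together, which is where SIMD instructions become useful.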

Page 7: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

IBM Informix Warehouse Accelerator (IWA)

POWERFUL HYBRID DATABASE PLATFORM

[Figure: architecture diagram of the hybrid platform]

• Informix database server (Extreme Performance Transactions): relational / row-based database, on disk [compressed]; runs on most Unix/Linux 64-bit platforms
• Informix Warehouse Accelerator (Extreme Performance Analytics): in-memory compressed columnar database partition with bulk loader and query processor; runs on Linux on Intel / AMD 64-bit
• Query flow: an analytic query reaches the query optimizer, which decides "Accelerate query?"; if yes, the query goes to the accelerator over TCP/IP and the results come back the same way; if no, the Informix database server produces the results

Page 8: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

You can use IWA’s In-Memory Analytics to Speed Up queries on…

Page 9: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics


Workload Overview

Page 10: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Workload: Data

• Use real data on rivers and streams from the U.S. Geological Survey at www.usgs.gov: gage height and flow
• 15-minute interval values over 5 years from 847 measurement sites
• Extend this data to span about 100 years by adding small percentages
• → >2 billion data records
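As a rough plausibility check on the record count, the Python sketch below computes the number of 15-minute slots for 847 sites over 5 and over 100 years. The slide does not say how the data was extended, so `extend_series` shows just one possible reading of "adding small percentages": each extra copy of the block scales the values by a slightly larger percentage. That function and its `pct` parameter are assumptions, not the authors' method.

```python
SITES = 847             # measurement sites (from the slide)
INTERVAL_MINUTES = 15   # sampling interval (from the slide)


def max_records(sites, years, interval_minutes):
    """Upper bound on records if every site reported in every interval."""
    per_day = 24 * 60 // interval_minutes
    return int(sites * years * 365.25 * per_day)


def extend_series(values, copies, pct=0.01):
    """Hypothetical extension: append `copies` scaled copies of the block.

    Copy k (1-based) multiplies every value by (1 + k * pct).
    """
    extended = list(values)
    for k in range(1, copies + 1):
        extended.extend(v * (1 + k * pct) for v in values)
    return extended


five_years = max_records(SITES, 5, INTERVAL_MINUTES)
hundred_years = max_records(SITES, 100, INTERVAL_MINUTES)
```

The 100-year bound comes out a little under 3 billion slots. Real gauges have gaps, so a figure of more than 2 billion actual records is consistent with this bound.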

Page 11: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Workload: Data

Example: San Joaquin River, CA

Satellite image: Google

Page 12: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Workload: Data

• Dimension table of measurement sites
• Fact table with >2 billion measurement values (as 847 TimeSeries)
• TimeSeries load data is in an external table and has JSON format:
  {"gage_height":4.120, "flow":0.510}
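To make the load format concrete, here is a small Python sketch that parses one JSON measurement document of the shape shown on the slide and builds one from two numbers. The function names are hypothetical, and the real load path (external table, conversion to BSON) is handled by Informix and is not reproduced here.

```python
import json


def parse_measurement(line):
    """Parse one JSON document and return (gage_height, flow) as floats."""
    doc = json.loads(line)
    return float(doc["gage_height"]), float(doc["flow"])


def format_measurement(gage_height, flow):
    """Build a JSON document with the two field names used on the slide."""
    return json.dumps({"gage_height": gage_height, "flow": flow})


sample = '{"gage_height":4.120, "flow":0.510}'
gage_height, flow = parse_measurement(sample)
round_trip = parse_measurement(format_measurement(gage_height, flow))
```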

Page 13: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Workload: Data

Fact table TimeSeries structure:

[Figure: two example fact table rows, followed by "..." for further rows. Each row consists of a site_id CHAR(20) column and a regular TIMESERIES column. The TimeSeries starts at an initial element and continues as a sequence of elements, each holding one sensordata BSON document.]
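The structure in the figure can be modeled roughly as follows. This Python sketch is a conceptual model only, not Informix internals: it assumes that in a regular time series the timestamp of an element is implied by its position (origin plus index times interval), so each element needs to carry only its sensor document. The class and field names are hypothetical, and plain dictionaries stand in for BSON documents.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class RegularTimeSeries:
    origin: datetime                 # timestamp of the initial element
    interval: timedelta              # fixed spacing, e.g. 15 minutes
    elements: list = field(default_factory=list)  # one document per interval

    def append(self, sensordata):
        self.elements.append(sensordata)

    def timestamp_of(self, index):
        """The timestamp is derived from the position, not stored per element."""
        return self.origin + index * self.interval

    def rows(self):
        """Yield (timestamp, document) pairs, similar to a flattened view."""
        for i, doc in enumerate(self.elements):
            yield self.timestamp_of(i), doc


@dataclass
class FactRow:
    site_id: str                     # CHAR(20) in the slide
    series: RegularTimeSeries


series = RegularTimeSeries(datetime(2014, 1, 1), timedelta(minutes=15))
for doc in ({"gage_height": 4.12, "flow": 0.51},
            {"gage_height": 4.13, "flow": 0.52},
            {"gage_height": 4.15, "flow": 0.55}):
    series.append(doc)
row = FactRow(site_id="site-0001", series=series)
flattened = list(row.series.rows())
```

With this layout the fact table holds one row per measurement site, which matches the 847 TimeSeries mentioned earlier in the deck.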

Page 14: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Workload: Queries

• Join dimension with fact table
• Typical aggregations: AVG, MAX, MIN of measurements
• GROUP BY measurement site
• ORDER BY
• → Full fact table scan necessary

Page 15: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Workload: Query example

select d1.id, d1.name,
       min(f1.v1) as min_gage, max(f1.v1) as max_gage, avg(f1.v1) as avg_gage,
       min(f1.v2) as min_discharge, max(f1.v2) as max_discharge, avg(f1.v2) as avg_discharge
from v_tstable_j f1, site_dim d1
where f1.id = d1.id
group by d1.id, d1.name
order by d1.id
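To show the shape of the result this query returns, the Python sketch below runs the same statement against a tiny in-memory SQLite database. SQLite is only a stand-in here, not Informix or IWA. The table and column names come from the query, while the column types, the `tstamp` column, and the sample rows are assumptions.

```python
import sqlite3

QUERY = """
select d1.id, d1.name,
       min(f1.v1) as min_gage, max(f1.v1) as max_gage, avg(f1.v1) as avg_gage,
       min(f1.v2) as min_discharge, max(f1.v2) as max_discharge, avg(f1.v2) as avg_discharge
from v_tstable_j f1, site_dim d1
where f1.id = d1.id
group by d1.id, d1.name
order by d1.id
"""


def run_demo():
    """Create toy tables, run the workload query, and return the result rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: the real v_tstable_j is a view over the TimeSeries data.
    conn.execute("create table site_dim (id text primary key, name text)")
    conn.execute("create table v_tstable_j (id text, tstamp text, v1 real, v2 real)")
    conn.executemany("insert into site_dim values (?, ?)",
                     [("A", "Site A"), ("B", "Site B")])
    conn.executemany("insert into v_tstable_j values (?, ?, ?, ?)", [
        ("A", "2014-01-01 00:00", 4.0, 0.5),
        ("A", "2014-01-01 00:15", 6.0, 1.5),
        ("B", "2014-01-01 00:00", 2.0, 0.25),
    ])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = run_demo()
```

The query has no filter on the fact table, so every measurement contributes to an aggregate. This is why the previous slide notes that a full fact table scan is necessary.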

Page 16: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics


Test Results

Page 17: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Test Environment

• Intel® Xeon® E7-4890 v2 (2.8GHz, 15 cores/30 threads per CPU, 37.5MB LLC)

• 1TB DDR3 memory (1333 MHz)

• 2x Intel® 910 PCIe SSDs

• Informix 12.1

• Goals: testing scaling of workload by core count and by dataset size

Page 18: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

IWA CPU Scaling – 4GB DB

[Figure: two charts, "Query 1 - 4GB DB CPU Scaling" and "Query 2 - 4GB DB CPU Scaling"; x-axis: CPU cores (15, 30, 45, 60); y-axis: run time (s); series: HT Off, HT On]

Query 1: 15 to 60 core scaling (HT), 3.0x
Query 2: 15 to 60 core scaling (HT), 2.5x

Page 19: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

IWA CPU Scaling – 160GB DB

[Figure: two charts, "Query 1 - 160GB DB CPU Scaling" and "Query 2 - 160GB DB CPU Scaling"; x-axis: CPU cores (15, 30, 45, 60); y-axis: run time (s); series: HT Off, HT On]

Query 1: 15 to 60 core scaling (HT), 3.3x
Query 2: 15 to 60 core scaling (HT), 3.2x
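One way to read the scaling factors on the two CPU scaling slides is as parallel efficiency: the measured speedup divided by the ideal speedup for a fourfold increase in cores (15 to 60). The Python sketch below does that arithmetic with the four reported factors. The efficiency metric is an added interpretation, not something the slides state.

```python
# Speedups reported on the slides for 15 -> 60 cores with Hyper-Threading.
REPORTED_SPEEDUP = {
    ("4GB", "Query 1"): 3.0,
    ("4GB", "Query 2"): 2.5,
    ("160GB", "Query 1"): 3.3,
    ("160GB", "Query 2"): 3.2,
}


def parallel_efficiency(speedup, cores_from=15, cores_to=60):
    """Measured speedup as a fraction of the ideal (linear) speedup."""
    ideal = cores_to / cores_from
    return speedup / ideal


efficiency = {key: parallel_efficiency(s) for key, s in REPORTED_SPEEDUP.items()}

for (db, query), eff in efficiency.items():
    print(f"{db:>6} {query}: {eff:.1%} of linear scaling")
```

Under this reading the 160GB database scales closer to linear (about 80% or more) than the 4GB database, which suggests the larger dataset gives the additional cores more work to share.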

Page 20: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics


Intel® Xeon® E7 Overview

Page 21: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Tick-Tock Development Model: Sustained Microprocessor Leadership

Huge Leap in Performance and Capabilities

TOCK = new microarchitecture; TICK = new process technology

Intel® Core™ Microarchitecture
• TOCK, 65nm: Xeon® 7300
• TICK, 45nm: Xeon® 7400

Intel® Microarchitecture Codename Nehalem
• TOCK, 45nm: Xeon® 7500
• TICK, 32nm: Xeon® E7-4800 (WSM-EX)

Intel® Microarchitecture Codename Sandy Bridge
• TOCK, 32nm
• TICK, 22nm: Xeon® E7-4800 v2 (IVB-EX)

Intel® Microarchitecture Codename Haswell
• TOCK, 22nm: Haswell
• TICK, 14nm: Future

All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

Page 22: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Intel® Xeon® Processor E7-8800/4800/2800 v2 Product Families

[Figure: Intel® Xeon® Processor E7-4800 v2 Product Family block diagram]

• Integrated PCI Express* 3.0: up to 32 lanes per socket
• Up to 3 DIMMs per channel (up to 24 DDR3 1600MHz DIMMs per socket)
• Up to 4 Intel® C102/C104 Scalable Memory Buffers per socket
• Up to 37.5MB shared cache

• Up to 50% more cores and up to 25% more cache for up to 2x average top-bin performance increase¹
• New Advanced Reliability features for improved system uptime and data integrity
• Highest memory capacity for data-demanding, transaction-intensive workloads
• Improved security with Intel® Secure Key & Intel® OS Guard for additional HW embedded security

Results were derived using simulations run on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance


1 Results have been simulated and are provided for informational purposes only. Compared to previous generation.

Page 23: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Intel® Xeon® Processor E7 Family Significant Generational Improvements

Feature | Intel® Xeon® processor E7-8800/4800/2800 product families (code name Westmere EX) | Intel® Xeon® processor E7-8800/4800/2800 v2 product families (code name Ivy Bridge EX)
Process Technology | 32nm | 22nm
Cores / Threads | Up to 10 / 20 per socket | Up to 15 / 30 per socket
L3 Cache Size | Up to 30MB | Up to 37.5MB
Memory Capacity | Up to 16 DIMMs per socket; 32GB max DIMM density; up to 2TB in 4S / up to 4TB in 8S | Up to 24 DIMMs per socket; 64GB max DIMM density; up to 6TB in 4S / up to 12TB in 8S
Max Memory Speed | Up to 1066MHz | Up to 1600MHz
I/O Bandwidth | Up to 72 lanes PCIe* 2.0 (dual IOH) | Up to 32 integrated PCIe* 3.0 lanes per socket
Intel® QPI Bandwidth | Up to 4 x 6.4 GT/s per socket | Up to 3 x 8.0 GT/s per socket
RAS | Advanced | Previous Gen + eMCA Gen 1, MCA Recovery – Execution Path, MCA – IO, PCIe* LER
Platform Technologies | Intel® Turbo Boost Technology, Intel® TXT, Intel® Dynamic Power, Intel® VT-x, Intel® VT-d, Intel® I/OAT/CB3 Technology, Intel® Node Manager, TPM 1.2 and more | Previous Gen + Intel® Secure Key + Intel® OS Guard + Intel® Integrated I/O + Intel® Direct Data I/O + Node Manager 2.0 + Intel® AVX + APICv


* Other names and brands may be claimed as the property of others.

Page 24: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Intel® Xeon® Processor E7-8800/4800/2800 v2 Product Families
Scalability to Handle Any Datacenter Workload

[Figure: platform topologies for 2S, 4S, 8S, ... and XNC (>8S) configurations. Legend: Intel® Xeon® Processor E7 Family; Memory & Intel® C102/C104 Scalable Memory Buffer; Intel® C602J Chipset; Intel® QuickPath Interconnect; 3rd party (OEM) Node Controller (XNC) (non-Intel); OEM interconnect; LAN]

Page 25: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Memory Platform Overview
Breakthrough Performance & Capacity

[Figure: Intel® Xeon® Processor E7-4800 v2 Product Family with up to 37.5MB shared cache, up to 8 DDR3 channels, up to 4 Intel® C102/C104 Scalable Memory Buffers per socket, 1 to 6 DIMMs per buffer, up to 24 DIMMs per socket]

Large Memory Capacity
• 8 DDR3 channels per socket
• Up to 24 DDR3 DIMMs per socket
• Supports up to 64GB DDR3 LR-DIMM
• Up to 6TB in a 4S platform, 12TB in an 8S platform¹

POR Speeds, Memory Controller Modes
• Up to 1600MHz DDR3 speeds
• Intel® SMI Gen 2: Up to 2.66 GT/s
• Memory Controller can support 2 modes
  • Performance Mode (higher I/O, B/W)
  • Lockstep Mode (highest DDR3 speeds)

Power (Target)
• Active (Rack): Up to 9W @2.66 GT/s
• Idle: Up to 2.5W

1 Memory capacity possible by populating all (96 for 4S; 192 for 8S) DIMMs with 64GB DDR3 LR-DIMMs
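The capacity figures follow directly from the DIMM counts in the footnote. The Python sketch below reproduces the arithmetic for both generations, using the per-socket DIMM counts and DIMM densities quoted in the deck. The function name is hypothetical.

```python
def max_memory_tb(sockets, dimms_per_socket, dimm_gb):
    """Total memory in TB when every DIMM slot holds the largest supported DIMM."""
    total_dimms = sockets * dimms_per_socket
    return total_dimms * dimm_gb / 1024


# E7 v2 (Ivy Bridge EX): 24 DIMMs per socket, 64GB LR-DIMMs.
v2_4s = max_memory_tb(4, 24, 64)   # 96 DIMMs
v2_8s = max_memory_tb(8, 24, 64)   # 192 DIMMs

# Previous generation (Westmere EX): 16 DIMMs per socket, 32GB DIMMs.
v1_4s = max_memory_tb(4, 16, 32)
v1_8s = max_memory_tb(8, 16, 32)
```

This gives 6TB and 12TB for the v2 family against 2TB and 4TB for the previous generation, a threefold increase at the same socket count.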


Page 26: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Conclusion

• IWA and TimeSeries can provide fast in-memory analytics for NoSQL data

• The NoSQL workload under development shows great promise in taking advantage of the scalability provided by the Intel® Xeon® E7 platform

• Next steps:
  1. Continue workload development
  2. Test at scale factors

Page 27: In-Memory and TimeSeries Technology to Accelerate NoSQL Analytics

Questions?
