
1 Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 8

Achieving the performance benefits of Infiniband in Java

Mark Falco Oracle Coherence Development


The following is intended to outline general product use and direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Exalogic / Exabus


Exalogic - Hardware

• 24 cores
• 96GB RAM
• 30 compute nodes in a full rack
• QDR Infiniband


Infiniband

• High throughput (~32 Gb/s data rate with QDR)
• Low latency (~1 µs)
• Super jumbo frames (MTU 64KB)
• Supports the standard IP stack (UDP/TCP)
• Verbs-based API
• Remote Direct Memory Access (RDMA)
  – pre-registered memory accessible to remote machines
  – operates without involving the host CPU


Exabus - Exalogic I/O and Network Design
Eliminates cloud, cluster and network virtualization I/O bottlenecks

[Diagram: Exalogic X2-2 I/O and network layout. Labeled components: Compute Nodes, ZFS Storage, Management Switch, Spine Switch, Ethernet Gateway Switches, the Exabus (InfiniBand and I/O Backplane), the Data Center Service Network (10GbE), the Management Network (GbE), the Data Center Mgmt Network (GbE), and Exadata, Exalogic, SPARC SuperCluster, and Standard Oracle Database systems. Link labels: 10GbE, GbE, IB.]


Exabus - Optimizations
Direct Memory I/O for Java

• New Java APIs and Exalogic Elastic Cloud Software
  – Low latency Java support for Infiniband
  – Optimized implementation for Exalogic Infiniband
• Surfacing low-level advanced networking capabilities


Infiniband - Socket Direct Protocol

• Streaming sockets API, i.e. SOCK_STREAM
• Easily integrated into TCP-based applications
• Zero-copy or kernel-bypass
• Java availability
  – Proprietary in JDK6
  – Standard in JDK7
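Because SDP keeps SOCK_STREAM semantics, the Java side of the integration is ordinary socket code. The sketch below is a plain loopback echo over `java.net` sockets; the class and method names are made up for illustration. Per the JDK 7 SDP documentation for Solaris and Linux, the same unmodified code can be moved onto SDP at launch by pointing the `com.sun.sdp.conf` system property at a file of `bind`/`connect` rules. Run without that property, as here, it simply uses TCP.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Plain SOCK_STREAM code: nothing here is SDP-specific. On JDK 7 (Solaris/Linux)
// it can be switched to SDP without recompiling, e.g.
//   java -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true SdpEcho
// where sdp.conf holds rules such as (placeholder addresses):
//   bind    192.0.2.1     *
//   connect 192.0.2.0/24  1024-*
public class SdpEcho {

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoflush = true, so println() pushes the line onto the socket
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Sends one line to a one-shot loopback echo server and returns the reply. */
    public static String roundTrip(String line) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = reader(peer);
                     PrintWriter out = writer(peer)) {
                    out.println(in.readLine()); // echo exactly one line
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            echo.start();

            String reply;
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(line);
                reply = in.readLine();
            }
            echo.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```

The point of the design is that the transport choice lives in deployment configuration rather than in application code.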


Infiniband - Coherence Integration

• Initially attempted over standard UDP
  – Experimented with TCP/SDP
• Required many co-located nodes to utilize the bandwidth
  – Dozens were needed to max out the HCA
• Latencies
  – Large objects: benefit from Infiniband without protocol change
  – Small objects: on par with standard Ethernet (300-600 µs)
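The latency bullets fit a simple cost model: each message pays a fixed software overhead plus its size divided by link bandwidth. The sketch below is an assumed back-of-the-envelope model, not a Coherence measurement. It takes 300 µs (the low end of the range above) as the overhead, 1 Gb/s for standard Ethernet, and ~32 Gb/s for QDR InfiniBand.

```java
import java.util.Locale;

// Back-of-the-envelope model (an assumption, not how Coherence measures latency):
//   latency = fixed per-message software overhead + bytes * 8 / link rate
public class LatencyModel {

    /** Modeled one-way latency in microseconds. */
    public static double latencyMicros(long bytes, double overheadMicros, double linkGbps) {
        double wireMicros = bytes * 8.0 / (linkGbps * 1e9) * 1e6;
        return overheadMicros + wireMicros;
    }

    public static void main(String[] args) {
        double overhead = 300.0; // assumed: low end of the 300-600 us range
        double ethernet = 1.0;   // assumed: 1 Gb/s Ethernet
        double qdr = 32.0;       // ~32 Gb/s QDR InfiniBand data rate

        long[] sizes = {1_024L, 10L * 1_024 * 1_024}; // 1 KB and 10 MB objects
        for (long size : sizes) {
            System.out.printf(Locale.ROOT, "%,d bytes: Ethernet %.1f us, QDR IB %.1f us%n",
                    size, latencyMicros(size, overhead, ethernet), latencyMicros(size, overhead, qdr));
        }
    }
}
```

Under these assumptions a 1 KB message models at about 308 µs on Ethernet versus 300 µs on QDR InfiniBand, while a 10 MB message models at roughly 84 ms versus 2.9 ms. The faster link only pays off once transfer time, rather than per-message overhead, dominates.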


MessageBus

• Binary low-level message transport
  – Multi-point addressing
  – Reliable ordered delivery
  – Asynchronous event-based programming model

• Pluggable provider-based framework
  – SocketBus (TCP/SDP)
  – Native RDMA Exabus
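A pluggable provider framework can be pictured as a registry that picks a transport from the scheme of an endpoint URL. This is a hypothetical sketch: `TransportRegistry`, `Bus`, and the `socket`/`rdma` scheme names are invented for illustration and are not the Coherence provider API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a provider registry keyed by endpoint URL scheme.
// The scheme names used in main ("socket", "rdma") are illustrative only.
public class TransportRegistry {

    /** Stand-in for a bus created by a provider for one local address. */
    public static final class Bus {
        private final String provider;
        private final String address;

        public Bus(String provider, String address) {
            this.provider = provider;
            this.address = address;
        }

        public String provider() { return provider; }
        public String address()  { return address; }
    }

    private final Map<String, Function<String, Bus>> providers = new HashMap<>();

    /** Plugs in a provider: a factory that builds a bus from the address part of a URL. */
    public void register(String scheme, Function<String, Bus> factory) {
        providers.put(scheme, factory);
    }

    /** Splits "scheme://address" and hands the address to that scheme's provider. */
    public Bus createBus(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("missing scheme: " + url);
        }
        Function<String, Bus> factory = providers.get(url.substring(0, sep));
        if (factory == null) {
            throw new IllegalArgumentException("no provider registered for: " + url);
        }
        return factory.apply(url.substring(sep + 3));
    }

    public static void main(String[] args) {
        TransportRegistry registry = new TransportRegistry();
        registry.register("socket", addr -> new Bus("SocketBus", addr));
        registry.register("rdma", addr -> new Bus("RdmaBus", addr));

        Bus bus = registry.createBus("rdma://192.168.1.1:8000");
        System.out.println(bus.provider() + " @ " + bus.address());
    }
}
```

Callers depend only on the URL, so adding a transport means registering another factory rather than changing application code.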


Exabus - MessageBus
Next generation of Exalogic performance optimization

[Diagram: Exalogic networking stack, comparing Exalogic V1.0 with what is new for Exalogic V1.1. Labeled layers: Coherence, WebLogic, Tuxedo, and any Linux or Solaris app; IB Transport APIs, MessageBus, Native RDMA MessageBus, SDP, TCP/IP, IPoIB, EoIB; InfiniBand Core; Hardware and Firmware.]


MessageBus - API

    public interface MessageBus {
        // Registers the collector that receives this bus's events.
        void setEventCollector(Collector<Event> collector);

        void open();
        void close();

        void connect(EndPoint peer);
        void disconnect(EndPoint peer);
        void release(EndPoint peer);

        void flush();

        // The receipt object is returned in a RECEIPT event once delivery is confirmed.
        void send(EndPoint peer, BufferSequence buf, Object receipt);
    }
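To show the event-driven shape of this interface, here is a minimal in-process loopback bus that mirrors its methods. It is a sketch under stated assumptions: `Collector`, `Event`, and the event `Type` enum are simplified hypothetical stand-ins, `String` replaces `EndPoint`, and `ByteBuffer` replaces `BufferSequence`. A send produces a MESSAGE event at the peer and a RECEIPT event, carrying the caller's receipt object, at the sender.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins for the deck's types: String for EndPoint,
// ByteBuffer for BufferSequence, and a one-method Collector. Not the Coherence API.
public class LoopbackBus {

    public enum Type { OPEN, CLOSE, CONNECT, DISCONNECT, RELEASE, MESSAGE, RECEIPT }

    public interface Collector<V> { void add(V value); }

    public static final class Event {
        public final Type type;
        public final String endPoint; // the endpoint this event concerns
        public final Object content;  // message buffer or receipt object, else null

        Event(Type type, String endPoint, Object content) {
            this.type = type;
            this.endPoint = endPoint;
            this.content = content;
        }
    }

    // In-process "network": maps an endpoint name to the bus bound to it.
    private static final Map<String, LoopbackBus> DIRECTORY = new ConcurrentHashMap<>();

    private final String localEndPoint;
    private Collector<Event> collector;

    public LoopbackBus(String localEndPoint) { this.localEndPoint = localEndPoint; }

    public void setEventCollector(Collector<Event> collector) { this.collector = collector; }

    public void open() {
        DIRECTORY.put(localEndPoint, this);
        emit(Type.OPEN, localEndPoint, null);
    }

    public void close() {
        DIRECTORY.remove(localEndPoint);
        emit(Type.CLOSE, localEndPoint, null);
    }

    public void connect(String peer)    { emit(Type.CONNECT, peer, null); }
    public void disconnect(String peer) { emit(Type.DISCONNECT, peer, null); }
    public void release(String peer)    { emit(Type.RELEASE, peer, null); }
    public void flush()                 { /* nothing is buffered in this sketch */ }

    /** Delivers buf to the peer as MESSAGE, then confirms locally with RECEIPT. */
    public void send(String peer, ByteBuffer buf, Object receipt) {
        LoopbackBus target = DIRECTORY.get(peer);
        if (target == null) {
            throw new IllegalStateException("unknown peer " + peer);
        }
        target.emit(Type.MESSAGE, localEndPoint, buf.asReadOnlyBuffer());
        if (receipt != null) {
            emit(Type.RECEIPT, peer, receipt);
        }
    }

    private void emit(Type type, String endPoint, Object content) {
        if (collector != null) {
            collector.add(new Event(type, endPoint, content));
        }
    }

    public static void main(String[] args) {
        LoopbackBus a = new LoopbackBus("bus-a");
        LoopbackBus b = new LoopbackBus("bus-b");
        List<Event> aEvents = new ArrayList<>();
        List<Event> bEvents = new ArrayList<>();
        a.setEventCollector(aEvents::add);
        b.setEventCollector(bEvents::add);
        a.open();
        b.open();
        a.connect("bus-b");
        a.send("bus-b", ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)), "receipt-1");

        for (Event e : aEvents) System.out.println("a: " + e.type + " " + e.endPoint);
        for (Event e : bEvents) System.out.println("b: " + e.type + " " + e.endPoint);
    }
}
```

Running `main` reports OPEN, CONNECT, and RECEIPT for `bus-a`, and OPEN and MESSAGE for `bus-b`. Emitting RECEIPT only when a receipt object is supplied is a choice made for this sketch.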


MessageBus - Events

Event               Indicates
OPEN                Start of bus event stream
CLOSE               End of bus event stream
CONNECT             Start of per-connection event stream
DISCONNECT          End of confirmed delivery per-connection event stream
RELEASE             End of per-connection event stream
MESSAGE             Local message delivery
RECEIPT             Message delivery confirmation
BACKLOG_EXCESSIVE   Start of backlog condition
BACKLOG_NORMAL      End of backlog condition
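The event table implies a per-peer lifecycle (CONNECT, then DISCONNECT, then RELEASE) plus a backlog signal that a sender should respect. The hypothetical `BusMonitor` below tracks that state from a stream of event types. Treating the backlog as bus-wide rather than per peer is a simplifying assumption of this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical consumer of the event types in the table. Treating backlog as
// bus-wide (rather than per peer) is a simplifying assumption.
public class BusMonitor {

    public enum EventType {
        OPEN, CLOSE, CONNECT, DISCONNECT, RELEASE, MESSAGE, RECEIPT,
        BACKLOG_EXCESSIVE, BACKLOG_NORMAL
    }

    private boolean open;
    private boolean backlogged;
    private final Set<String> connected = new HashSet<>(); // after CONNECT, before DISCONNECT
    private final Set<String> draining = new HashSet<>();  // after DISCONNECT, awaiting RELEASE
    private int messages;
    private int receipts;

    public void onEvent(EventType type, String peer) {
        switch (type) {
            case OPEN:              open = true; break;
            case CLOSE:             open = false; break;
            case CONNECT:           connected.add(peer); break;
            case DISCONNECT:        connected.remove(peer); draining.add(peer); break;
            case RELEASE:           draining.remove(peer); break;
            case MESSAGE:           messages++; break;
            case RECEIPT:           receipts++; break;
            case BACKLOG_EXCESSIVE: backlogged = true; break;  // pause new sends
            case BACKLOG_NORMAL:    backlogged = false; break; // resume sending
        }
    }

    /** A sender would consult this before calling send(). */
    public boolean maySendTo(String peer) {
        return open && !backlogged && connected.contains(peer);
    }

    /** True once no further events are expected for this peer. */
    public boolean isReleased(String peer) {
        return !connected.contains(peer) && !draining.contains(peer);
    }

    public int messageCount() { return messages; }
    public int receiptCount() { return receipts; }
}
```

A sender would check `maySendTo` before each `send`, pausing between BACKLOG_EXCESSIVE and BACKLOG_NORMAL instead of queueing without bound.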


MessageBus - Native RDMA

• Zero-copy and kernel-bypass
• Optimized for sender latency
• Predictive notifications avoid costly interrupts
• Asynchronous task-based system manages the protocol
• Custom DirectByteBuffer
  – allows for zero-copy
  – reduces GC pressure
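The custom DirectByteBuffer itself is not shown in the slides. As an illustration of the two benefits listed, here is a small pool of standard `ByteBuffer.allocateDirect` buffers. Their contents live off-heap, where native I/O can use them without first copying into a temporary direct buffer (as OpenJDK does for heap buffers), and reusing them avoids creating fresh garbage per message. `DirectBufferPool` is a hypothetical name; this is a stand-in for the idea, not the Exabus implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Illustrative only: a pool of reusable, fixed-size direct buffers built on the
// standard JDK API. Not the custom DirectByteBuffer described in the deck.
public class DirectBufferPool {
    private final int bufferSize;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private int allocated; // direct buffers ever created by this pool

    public DirectBufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    /** Returns a cleared direct buffer, reusing a released one when available. */
    public synchronized ByteBuffer acquire() {
        ByteBuffer buf = free.pollFirst();
        if (buf == null) {
            // Contents live off-heap, so native I/O can read/write them directly.
            buf = ByteBuffer.allocateDirect(bufferSize);
            allocated++;
        }
        buf.clear();
        return buf;
    }

    /** Hands a buffer back for reuse instead of leaving it for the GC to reclaim. */
    public synchronized void release(ByteBuffer buf) {
        if (!buf.isDirect() || buf.capacity() != bufferSize) {
            throw new IllegalArgumentException("buffer not from this pool");
        }
        free.addFirst(buf);
    }

    public synchronized int allocatedCount() { return allocated; }
}
```

In steady state every message reuses an existing buffer, so the allocation count stops growing.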


Message Transfer - Native RDMA

[Diagram: message transfer between a Sender and a Receiver, each with a Collector and a Ring Buffer. Labeled steps: RDMA Write Header, Allocation, RDMA Read Body, Delivery, RDMA Write Receipt.]
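One plausible reading of the figure's labels: the sender RDMA-writes a small header into the receiver's ring buffer, and the receiver allocates space and RDMA-reads the body. It then delivers the message to its collector and RDMA-writes a receipt into the sender's ring buffer. The simulation below walks through that order in a single process, with array copies standing in for RDMA operations. The step order, header layout, and all names are assumptions, not the actual Exabus protocol.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulation of one plausible reading of the figure's labels (an assumption).
// "RDMA" operations are plain in-process array copies; no hardware is involved.
public class RdmaTransferSim {

    /** Fixed-capacity single-producer/single-consumer ring of header slots. */
    static final class Ring {
        private final long[] slots;
        private long head; // next slot to read
        private long tail; // next slot to write

        Ring(int capacity) { slots = new long[capacity]; }

        boolean offer(long value) {
            if (tail - head == slots.length) {
                return false; // full: the writer must wait for the reader
            }
            slots[(int) (tail++ % slots.length)] = value;
            return true;
        }

        boolean isEmpty() { return head == tail; }

        long poll() { return slots[(int) (head++ % slots.length)]; }
    }

    // A header packs (offset, length) of a message body in the sender's registered memory.
    public static long header(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }
    public static int offsetOf(long header) { return (int) (header >>> 32); }
    public static int lengthOf(long header) { return (int) header; }

    /** Moves messages sender -> receiver; returns what the receiver delivered. */
    public static List<String> transfer(List<String> messages, List<Long> senderReceipts) {
        byte[] registered = new byte[4096]; // sender memory the receiver may read
        Ring receiverRing = new Ring(16);   // headers written by the sender
        Ring senderRing = new Ring(16);     // receipts written by the receiver
        List<String> delivered = new ArrayList<>();

        // Sender: stage each body in registered memory (throws if it runs out),
        // then "RDMA write" its header into the receiver's ring.
        int offset = 0;
        for (String m : messages) {
            byte[] body = m.getBytes(StandardCharsets.UTF_8);
            System.arraycopy(body, 0, registered, offset, body.length);
            if (!receiverRing.offer(header(offset, body.length))) {
                throw new IllegalStateException("receiver ring full");
            }
            offset += body.length;
        }

        // Receiver: per header, allocate, "RDMA read" the body, deliver, write a receipt.
        while (!receiverRing.isEmpty()) {
            long h = receiverRing.poll();
            byte[] dest = new byte[lengthOf(h)];                             // Allocation
            System.arraycopy(registered, offsetOf(h), dest, 0, dest.length); // RDMA Read Body
            delivered.add(new String(dest, StandardCharsets.UTF_8));         // Delivery
            senderRing.offer(h); // RDMA Write Receipt: echo the header so the region can be reused
        }

        // Sender: drain receipts into its collector.
        while (!senderRing.isEmpty()) {
            senderReceipts.add(senderRing.poll());
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<Long> receipts = new ArrayList<>();
        System.out.println(transfer(Arrays.asList("hello", "rdma"), receipts));
        System.out.println(receipts.size() + " receipts");
    }
}
```

In this reading the sender's work per message ends after one small header write, which is consistent with a design optimized for sender latency.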


MessageBus - Coherence Integration

• Pluggable message transport
• MessageBus per service
  – Legacy system utilized a single transport for the entire JVM
• Increased parallel processing
  – Network I/O
  – Message deserialization
• Message delivery
  – Java context switches: 1 vs. 3
  – Potential for zero context switches


MessageBus - Coherence Integration

[Diagram: three cluster members, each running two PartitionedCacheService instances and one InvocationService, each service bound to its own MessageBus endpoint, interconnected over Exabus RDMA.]

Member 1
  PartitionedCacheService (Cache: D, E, F)   tmb://192.168.1.2:8000.2
  PartitionedCacheService (Cache: A, B, C)   tmb://192.168.1.1:8000.1
  InvocationService                          tmb://192.168.1.2:8000.3

Member 2
  PartitionedCacheService (Cache: D, E, F)   tmb://192.168.1.2:8001.2
  PartitionedCacheService (Cache: A, B, C)   tmb://192.168.1.1:8001.1
  InvocationService                          tmb://192.168.1.2:8001.3

Member 3
  PartitionedCacheService (Cache: D, E, F)   tmb://192.168.1.2:8002.2
  PartitionedCacheService (Cache: A, B, C)   tmb://192.168.1.1:8002.1
  InvocationService                          tmb://192.168.1.2:8002.3

Interconnect: Exabus RDMA
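The endpoints above share a `tmb://host:port.N` shape. The hypothetical parser below splits such strings into scheme, host, port, and the trailing number. Reading that number as a sub-port that lets several buses share one listening port is an interpretation, not something the slides state, and the parser handles IPv4 addresses only.

```java
// Hypothetical parser for endpoint strings shaped like "tmb://192.168.1.2:8000.2".
// Treating the ".2" suffix as a sub-port is an assumption; IPv4 hosts only.
public final class BusEndPoint {
    public final String scheme;
    public final String host;
    public final int port;
    public final int subPort; // -1 when the string has no ".N" suffix

    private BusEndPoint(String scheme, String host, int port, int subPort) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
        this.subPort = subPort;
    }

    public static BusEndPoint parse(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("missing scheme: " + url);
        }
        String rest = url.substring(sep + 3);
        int colon = rest.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing port: " + url);
        }
        String portPart = rest.substring(colon + 1);
        int dot = portPart.indexOf('.');
        int port = Integer.parseInt(dot < 0 ? portPart : portPart.substring(0, dot));
        int subPort = dot < 0 ? -1 : Integer.parseInt(portPart.substring(dot + 1));
        return new BusEndPoint(url.substring(0, sep), rest.substring(0, colon), port, subPort);
    }

    @Override
    public String toString() {
        return scheme + "://" + host + ":" + port + (subPort < 0 ? "" : "." + subPort);
    }

    public static void main(String[] args) {
        String[] figure = {
            "tmb://192.168.1.2:8000.2", "tmb://192.168.1.1:8000.1", "tmb://192.168.1.2:8000.3",
            "tmb://192.168.1.2:8001.2", "tmb://192.168.1.1:8001.1", "tmb://192.168.1.2:8001.3",
        };
        for (String s : figure) {
            BusEndPoint e = BusEndPoint.parse(s);
            System.out.println(e.host + " port " + e.port + " sub-port " + e.subPort);
        }
    }
}
```

Parsing the endpoints shows that the suffix follows the service rather than the member (`.1` for the A, B, C cache service, `.2` for D, E, F, `.3` for the InvocationService), while the port (8000, 8001, 8002) differs per member.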


MessageBus - Coherence Integration

• The network is no longer the bottleneck
• Measured improvements
  – a small number of nodes can max out the HCA
  – latencies reduced to ~100 µs with the RDMA bus, ~200 µs with SocketBus
• Future direction
  – more MessageBuses per service
  – prototyped solution drops latency down to 70 µs
  – designs to drop latency to 40 µs


Q&A