S3125 High-speed Financial XML Message Processing System



Page 1: S3125 High-speed Financial XML Message Processing System

© Hitachi Solutions, Ltd. 2013. All rights reserved.

Tetsuya Uemura

March 19, 2013

High-speed Financial XML Message Processing System Accelerated by Massively Parallel Technologies

Hitachi Solutions, Ltd.

S3125

Page 2: S3125 High-speed Financial XML Message Processing System


About us

Company Name: Hitachi Solutions, Ltd.

Founded: September 21, 1970

Number of employees: 15,724

Net Sales: $3.0 billion

Our Solutions and Business areas

Consulting, System Development, Systems Operation and Maintenance

Provision of Products and Services

Task-specific Solutions: Financial Affairs / Accounting ... ERP・CRM Workflow ...

Industry-specific Solutions: Banking Services ... Public Services Government Services ...

Other Solutions

Page 3: S3125 High-speed Financial XML Message Processing System


Today’s Topic

1. Why should we apply GPGPU to data processing?

2. GPGPU framework for business applications.

3. GPGPU can process financial XML messages more than 100 times faster than a CPU.

Page 4: S3125 High-speed Financial XML Message Processing System


Today’s Topic

1. Why should we apply GPGPU to data processing?

2. GPGPU framework for business applications.

3. GPGPU can process financial messages more than 100 times faster than a CPU.

Page 5: S3125 High-speed Financial XML Message Processing System


1.1 Why should we apply GPGPU to data processing?

Eliminating hot spots is very important in business applications such as data and transaction processing.

⇒ Server clustering is the traditional method, but it is not cost-effective.

⇒ Offloading the heavy processing (the hot spot) onto a GPU improves performance and is cost-effective.

[Figure: a conventional server (AP1, AP2, AP3 on Windows/Linux, running on the CPU, with a hot spot) next to a GPU server (AP1, AP2, AP3 on Windows/Linux, running on the CPU) that offloads the hot spot to the GPU through the GPGPU FW and a Massively Parallel Library.]

Page 6: S3125 High-speed Financial XML Message Processing System


1.2 Challenges for applying GPGPU to Data Processing

To apply GPGPU to data processing, we have to optimize not only the processing inside the GPU but also the whole system, including I/O.

[Figure: a Client PC connected to the Server by TCP/IP data transfer; inside the server, the CPU connected to the GPU by PCIe data transfer; parallel processing on the GPU cores.]

Page 7: S3125 High-speed Financial XML Message Processing System


1.3 Performance Limitations

Bandwidth limitations

Data Path      Bandwidth
Network        1 Gbps (GbE)
PCI Express    16 GB/s (Gen2 x16)
Main Memory    25 GB/s (DDR3 SDRAM)
VRAM           200 GB/s (inside GPU)

Other limitations

System initialization cost: GPU

Transaction cost: Network, PCIe, GPU
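To get a feel for how unevenly these limits bite, the bandwidth figures above can be turned into rough transfer times. The sketch below is illustrative only: it takes the nominal bandwidths at face value and ignores protocol overhead, latency, and the initialization and transaction costs. The 64 MB payload and the helper name `transfer_seconds` are assumptions made for this example.

```python
# Nominal bandwidths from the slide, converted to bytes per second.
# 1 Gbps = 10**9 bits/s = 125 * 10**6 bytes/s.
BANDWIDTH_BYTES_PER_S = {
    "Network (GbE)": 1e9 / 8,
    "PCI Express (Gen2 x16)": 16e9,
    "Main Memory (DDR3 SDRAM)": 25e9,
    "VRAM (inside GPU)": 200e9,
}


def transfer_seconds(num_bytes, path):
    """Ideal time to move num_bytes over the given data path."""
    return num_bytes / BANDWIDTH_BYTES_PER_S[path]


# Hypothetical payload size, chosen only for illustration.
payload = 64 * 1024 * 1024

for path in BANDWIDTH_BYTES_PER_S:
    millis = transfer_seconds(payload, path) * 1e3
    print(f"{path:26s} {millis:9.3f} ms")
```

On these nominal figures the network is 128 times slower than PCIe, which suggests that the network path is the first limit to look at.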

Page 8: S3125 High-speed Financial XML Message Processing System


Today’s Topic

1. Why should we apply GPGPU to data processing?

2. GPGPU framework for business applications.

3. GPGPU can process financial messages more than 100 times faster than a CPU.

Page 9: S3125 High-speed Financial XML Message Processing System


2.1 Why do we develop a GPGPU framework?

① TCP/IP bottleneck: invoking the GPU from legacy or VM-based applications, such as those written in COBOL or Java.

② PCIe bottleneck: data transfer between CPU and GPU.

The GPGPU Framework provides the interfaces and flow control mechanisms for both.

Solution to ①:

• TCP/IP interface between applications and the GPU server

• Flow control mechanism for TCP/IP

Solution to ②:

• Flow control mechanism for PCIe

Page 10: S3125 High-speed Financial XML Message Processing System


2.2 GPGPU Framework

[Figure: GPGPU Framework architecture. Labels: Client PC, Data Base, Server (CPU, GPU cores), Flow Control (at two points), Optimize Data Size.]
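The slides do not show how the framework implements flow control or data size optimization. As one possible reading, the sketch below accumulates incoming messages until a target batch size is reached, so that each transfer carries enough data to amortize its fixed cost. The class name `MessageBatcher`, its methods, and the threshold value are hypothetical and are not taken from the framework.

```python
class MessageBatcher:
    """Group small messages into batches of at least target_bytes.

    Hypothetical helper for illustration; not part of the GPGPU Framework.
    """

    def __init__(self, target_bytes):
        if target_bytes <= 0:
            raise ValueError("target_bytes must be positive")
        self.target_bytes = target_bytes
        self._pending = []
        self._pending_bytes = 0

    def add(self, message):
        """Queue one message; return a full batch when ready, else None."""
        self._pending.append(message)
        self._pending_bytes += len(message)
        if self._pending_bytes >= self.target_bytes:
            return self.flush()
        return None

    def flush(self):
        """Return whatever is queued (possibly empty) and reset the queue."""
        batch, self._pending = self._pending, []
        self._pending_bytes = 0
        return batch


# Usage: a 10-byte threshold is far too small in practice; it only keeps
# the example readable.
batcher = MessageBatcher(target_bytes=10)
first = batcher.add(b"aaaa")   # 4 bytes queued, below the threshold
second = batcher.add(b"bbbb")  # 8 bytes queued, still below
third = batcher.add(b"cc")     # 10 bytes queued, batch is released
```

A real system would also flush on a timer so that a trickle of small messages is not held back indefinitely; that part is left out here.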

Page 11: S3125 High-speed Financial XML Message Processing System


Today’s Topic

1. Why should we apply GPGPU to data processing?

2. GPGPU framework for business applications.

3. GPGPU can process financial messages more than 100 times faster than a CPU.

Page 12: S3125 High-speed Financial XML Message Processing System


3.1 Message Standards in the Financial Services Industry

ISO 20022, an XML-based message standard for financial services, is spreading across the financial industry.

Fixed-length messages, however, are still widely used in core banking systems, so conversion from XML to fixed-length messages is unavoidable.

XML messages are large, and a CPU takes a long time to process them. We accelerate the conversion with GPGPU massively parallel processing.

XML message:

<?xml version="1.0" encoding="UTF-8" ?>
<Document xmlns="urn:iso:std:iso:20022:...">
  <CstmrCdtTrfInitn>
    <GrpHdr>
      <MsgId>ABC/1234</MsgId>
      <CreDtTm>2012-09-28</CreDtTm>

Fixed-length message:

ABC/1234 2012-09-28

Compared with the fixed-length form, the data volume grows more than 10 times because of tags and indents.
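The conversion itself is easy to sketch on the CPU side. The example below is a minimal illustration using Python's standard `xml.etree.ElementTree`, not the GPU implementation described in the talk. The field widths in `LAYOUT`, the placeholder namespace, and the closing tags added to make the sample well-formed are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: (element local name, field width in characters).
LAYOUT = [("MsgId", 16), ("CreDtTm", 10)]

# Sample modeled on the slide; the namespace is a placeholder and the
# closing tags are added here so that the document parses.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<Document xmlns="urn:example:placeholder">
  <CstmrCdtTrfInitn>
    <GrpHdr>
      <MsgId>ABC/1234</MsgId>
      <CreDtTm>2012-09-28</CreDtTm>
    </GrpHdr>
  </CstmrCdtTrfInitn>
</Document>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def xml_to_fixed_length(xml_text, layout=LAYOUT):
    """Extract the fields named in layout and pad each to its width."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # If a name occurs more than once, the last occurrence wins.
    values = {local_name(el.tag): (el.text or "").strip() for el in root.iter()}
    fields = []
    for name, width in layout:
        value = values.get(name, "")
        if len(value) > width:
            raise ValueError(f"{name} exceeds {width} characters")
        fields.append(value.ljust(width))
    return "".join(fields)


record = xml_to_fixed_length(SAMPLE_XML)
print(repr(record))
print("XML bytes per record byte:", len(SAMPLE_XML.encode("utf-8")) / len(record))
```

The last line prints the size ratio between the XML text and the fixed-length record for this one sample, which gives a rough sense of the overhead that tags and indents add.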

CPU:

<?xml version="1.0"?><Document>
<Test1>0001</Test1><Test2>0002</Test2><Test3>0003</Test3>
</Document>

The CPU processes XML sequentially, so it takes a long time.

GPU:

<?xml version="1.0"?><Document>
<Test1>0001</Test1><Test2>0002</Test2><Test3>0003</Test3>
</Document>

Many GPU cores process XML in parallel, which accelerates the processing.

Page 13: S3125 High-speed Financial XML Message Processing System


3.2 Experimental environment

CPU: AMD Phenom(tm) II X6 1090T Processor

Cores               6
Basic Clocks (GHz)  3.2
Memory (GB)         8

GPU: GeForce GTX 580

Cores               512
Basic Clocks (MHz)  772
Memory (GB)         1.5
B/W (GB/sec)        192.4

CPU-GPU bus: PCIe Gen2

Page 14: S3125 High-speed Financial XML Message Processing System


3.3 Processing flow of GPGPU XML Processing

Multi-level flow control, at both the TCP/IP and PCIe levels, improves GPGPU XML processing performance.

Step1         Step2         Step3           Step4            Step5        Step6            Step7
Transfer XML  Transfer CSV  Initialize GPU  Transfer to GPU  Process XML  Transfer to CPU  Create CSV

[Figure: processing flow across the Client PC, the Server CPU, and the GPGPU cores, annotated with Step 1 to Step 7, two flow control points, and data size optimization.]

Page 15: S3125 High-speed Financial XML Message Processing System


3.4 Processing Acceleration Ratio

[Figure: Processing Acceleration Ratio. X-axis: data size. Y-axis: acceleration ratio, 0 to 140.]

The larger the data size, the better the acceleration ratio. GPGPU XML processing is faster than the CPU at every data size.

Even at small data sizes the result is not bad: it is still faster than the CPU. The CPU cases process XML with the Xerces2 Java Parser.

The ratio saturates at large data sizes, because processing time becomes proportional to the data size.

Acceleration ratio = (CPU case time / GPU case time)
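The saturation can be reproduced with a very simple cost model. The sketch below assumes that CPU time is purely proportional to data size and that GPU time is a fixed overhead plus a proportional part. Both the model and every constant in it are assumptions made for illustration; none of the numbers are measurements from the talk.

```python
def acceleration_ratio(size_mb, cpu_s_per_mb, gpu_fixed_s, gpu_s_per_mb):
    """CPU case time divided by GPU case time under a linear cost model."""
    cpu_time = cpu_s_per_mb * size_mb
    gpu_time = gpu_fixed_s + gpu_s_per_mb * size_mb
    return cpu_time / gpu_time


# Hypothetical constants, chosen only to show the shape of the curve.
CPU_S_PER_MB = 1.0
GPU_FIXED_S = 0.001
GPU_S_PER_MB = 0.01

SIZES_MB = (0.004, 0.064, 1, 16, 64)
ratios = [
    acceleration_ratio(size, CPU_S_PER_MB, GPU_FIXED_S, GPU_S_PER_MB)
    for size in SIZES_MB
]

for size, ratio in zip(SIZES_MB, ratios):
    print(f"{size:>7} MB  ratio = {ratio:6.1f}")
```

Under this model the ratio rises with data size and approaches `CPU_S_PER_MB / GPU_S_PER_MB` from below: once the fixed GPU overhead is negligible, both times grow in proportion to the data, so their quotient stops changing.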

Page 16: S3125 High-speed Financial XML Message Processing System


3.5 Throughput

[Figure: Throughput. X-axis: data size. Y-axis: throughput in MB/s, 0 to 180. Reference lines: GbE bandwidth; TCP/IP bandwidth limit (loopback I/F).]

The larger the data size, the better the throughput, until it saturates at the network bandwidth.

The GeForce GTX 580 has enough power to use up the bandwidth of GbE.

Page 17: S3125 High-speed Financial XML Message Processing System


3.6 Processing Time Breakdown

[Figure: Processing Time Breakdown. Normalized time (s/MB), 0 to 0.35, for data sizes 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB, and 64MB, broken down by Step1 to Step7. Callouts mark GPU initialization and GPU XML processing.]

Step1         Step2         Step3           Step4            Step5        Step6            Step7
Transfer XML  Transfer CSV  Initialize GPU  Transfer to GPU  Process XML  Transfer to CPU  Create CSV

At small data sizes, GPU initialization and GPGPU XML processing take up so large a share of the time that the data size must be large enough to dilute them.

At large data sizes, the costs of GPU initialization and processing are diluted.
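The dilution effect follows directly from normalizing by data size. In the sketch below, a fixed cost that does not depend on the data size is spread over more and more megabytes, while a proportional cost stays constant per megabyte. The two cost constants are hypothetical and are not read off the chart; only the list of data sizes comes from the slide.

```python
# Data sizes from the chart, in KB (4KB up to 64MB).
SIZES_KB = [4, 16, 64, 256, 1024, 4096, 16384, 65536]


def normalized_time_s_per_mb(size_kb, fixed_s, proportional_s_per_mb):
    """Time per MB when a fixed cost is spread over size_kb of data."""
    size_mb = size_kb / 1024
    return fixed_s / size_mb + proportional_s_per_mb


# Hypothetical costs, for illustration only.
FIXED_S = 0.001               # e.g. a one-off setup cost per request
PROPORTIONAL_S_PER_MB = 0.01  # cost that scales with the data

normalized = [
    normalized_time_s_per_mb(size, FIXED_S, PROPORTIONAL_S_PER_MB)
    for size in SIZES_KB
]

for size, value in zip(SIZES_KB, normalized):
    print(f"{size:>6} KB  {value:.4f} s/MB")
```

With these made-up constants the fixed part dominates at 4KB and is almost invisible at 64MB, where the normalized time is close to the proportional cost alone.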

Page 18: S3125 High-speed Financial XML Message Processing System


Summary

GPGPU is a good solution for data processing.

Data size optimization and flow control are key to better performance in GPGPU data processing.

Optimizing the whole system, not just the GPU code, is necessary to accelerate business applications with GPGPU.

Future Work

Continuous evaluation and optimization are needed, because the most efficient data size will change as hardware evolves.

PCIe bandwidth: Gen2 ⇒ Gen3

Network bandwidth: GbE ⇒ 10GbE or InfiniBand

Number of GPU cores: 500 ⇒ 2500

Page 19: S3125 High-speed Financial XML Message Processing System


Thanks


Contact

[email protected]

4-12-7 Higashishinagawa, Shinagawa-ku, Tokyo, Japan

http://www.hitachi-solutions.com/

Tetsuya Uemura

My colleagues are waiting for you at the poster session: P0233: High-speed Financial XML Message Processing System Accelerated by Massively Parallel Technologies