
Page 1: Comparing Performance of Distributed Computing Platforms in

Comparing Performance of Distributed Computing Platforms in Backtesting FINRA's Limit Up/Down Rules

An InformationWeek Financial Services Webcast

Sponsored by

Page 2: Comparing Performance of Distributed Computing Platforms in

Webcast Logistics

Page 3: Comparing Performance of Distributed Computing Platforms in

Today’s Presenters

Michael Kane, Associate Research Scientist, Yale Center for Analytical Sciences

Casey King, Executive Director, Yale Center for Analytical Sciences

Page 4: Comparing Performance of Distributed Computing Platforms in


IBM Netezza Analytics Appliance + Revolution R Enterprise vs. the Cloud and R: Comparing Performance of Distributed Computing Platforms using Applications in Backtesting FINRA's Limit Up/Down Rules

Page 5: Comparing Performance of Distributed Computing Platforms in


“One of the Most Terrifying Moments in Wall Street History. . .”

“A bad day in the stock market turned into one of the most terrifying moments in Wall Street history. . .It lasted just 16 minutes but left Wall Street experts and ordinary investors alike struggling to come to grips with what had happened -- and fearful of where the markets might go from here.” (New York Times, May 7, 2010)

Page 6: Comparing Performance of Distributed Computing Platforms in


Flash Crash May 6, 2010

Trader Steven Rickard reacts in the S&P 500 futures pit at the CME Group in Chicago near the close of trading on Thursday, May 6, 2010. The stock market that day had one of its most turbulent sessions ever, with the Dow Jones Industrial Average plunging nearly 1,000 points in a half-hour before recovering two-thirds of its losses. (AP)


Page 7: Comparing Performance of Distributed Computing Platforms in


Agenda

• Flash Crash
  • What was the Flash Crash?
  • What was the reaction from the market and policy makers?
  • What is the best way to evaluate SEC policy?

• Backtesting
  • How do you go about backtesting?
  • What are the challenges?
  • What is the model for backtesting SEC policy?
  • Three technological approaches for evaluating SEC policy:
    • Workstation + R
    • Cloud + R
    • IBM Netezza + Revolution R Enterprise

• Wrap-up
  • Conclusions
  • Resources
  • Q&A



Page 9: Comparing Performance of Distributed Computing Platforms in


Volatility Spurs Market Fear → Governing Board Assesses → SEC Institutes Circuit Breaker Rules


Page 10: Comparing Performance of Distributed Computing Platforms in


SEC Approves New Stock-by-Stock Circuit Breaker Rules

FOR IMMEDIATE RELEASE 2010-98

Washington, D.C., June 10, 2010 — The Securities and Exchange Commission today approved rules that will require the exchanges and FINRA to pause trading in certain individual stocks if the price moves 10 percent or more in a five-minute period.


Page 11: Comparing Performance of Distributed Computing Platforms in


Rationale Behind Circuit Breakers

Goal: To control volatility during extreme trading conditions

Halt: Stop trading in the event of extreme swings in stock price

Intervention: Gives human traders time to intervene


Page 12: Comparing Performance of Distributed Computing Platforms in


Market Response to SEC Policy


“What the S.E.C. has recommended is working. Had they done this two months ago, there never would have been a Flash Crash.”

– Patrick J. Healy, Issuer Advisory Group, advisor to public companies on how and where to list their shares for trading

Page 13: Comparing Performance of Distributed Computing Platforms in


Evaluating Volatility Rules

The Circuit Breaker rules were created based on the opinion and experience of experts

These rules are evaluated through pilot programs

Can disrupt “normal” market behavior

May not be tested in extreme volatility

Should we be evaluating these rules with live markets?


Page 14: Comparing Performance of Distributed Computing Platforms in


A Call to Action

Anecdotal Evidence and Policy Makers’ “Opinions” Should Be Considered Insufficient to Determine SEC Policy

Policy decisions must be data driven.


Page 15: Comparing Performance of Distributed Computing Platforms in


Alternative: Utilize Market Data to Make Data-Driven Policy Decisions

Trade data has been collected for decades now

Records of trillions of individual stock trades

Provides insight into market behavior over a wide variety of conditions

Used by hedge funds and banks to evaluate and calibrate trading strategies

“Standard practice” in the financial services industry


Page 16: Comparing Performance of Distributed Computing Platforms in


Proposal: Backtest Rules for Controlling Volatility

Evaluate the rules based on historical data

Ensures bad rules don't negatively affect market behavior

Provides a quantitative approach for evaluating market policy

More efficient mechanism for evaluating and refining rules


Page 17: Comparing Performance of Distributed Computing Platforms in


Polling Question 1

• Does your organization use backtesting as part of its standard operating procedure for testing analytical models?

• A. Yes

• B. No

• C. I don’t know


Page 18: Comparing Performance of Distributed Computing Platforms in


Circuit Breakers “Exposed”: The illusion of safety is often more dangerous than the surety of risk

Conclusion: The rules are not effective in stopping catastrophic events like the Flash Crash

We also showed that circuit breakers tend to trigger during normal market conditions

Was this more a symbolic than a substantive regulatory measure in the face of intense political pressure?

Should circuit breaker rules be modified to address broader market volatility?


Page 19: Comparing Performance of Distributed Computing Platforms in


SEC Announces Filing of Limit Up-Limit Down Proposal to Address Extraordinary Market Volatility

FOR IMMEDIATE RELEASE 2011-84

Washington, D.C., April 5, 2011 – The Securities and Exchange Commission today announced that national securities exchanges and the Financial Industry Regulatory Authority (FINRA) today filed a proposal to establish a new “limit up-limit down” mechanism to address extraordinary market volatility in U.S. equity markets.


Page 20: Comparing Performance of Distributed Computing Platforms in


Limit Up/Down

• What is Limit Up/Down and how will the SEC evaluate whether it’s a proposal worth making policy?

• First Challenge: Getting a clear articulation of exactly what the rules are.


Page 21: Comparing Performance of Distributed Computing Platforms in


Limit Up/Down Requirements Per the SEC Website

• The proposed “Limit Up-Limit Down” mechanism would prevent trades in listed equity securities from occurring outside of a specified price band, which would be set at a percentage level above and below the average price of the security over the immediately preceding five-minute period. For stocks currently subject to the circuit breaker pilot, the percentage would be 5 percent, and for those not subject to the pilot, the percentage would be 10 percent (see the sketch below).

• The percentage bands would be doubled during the opening and closing periods, and broader price bands would apply to stocks priced below $1.00. To accommodate more fundamental price moves, there would be a five-minute trading pause – similar to the pause triggered by the current circuit breakers – if trading is unable to occur within the price band for more than 15 seconds.
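To make the band arithmetic concrete, here is a minimal R sketch of a price band built around a trailing five-minute average. The function name (priceBand), the use of a simple unweighted average, and the handling of pilot versus non-pilot stocks are illustrative assumptions only; as a later slide notes, the SEC description leaves several of these details open.

# Minimal sketch (assumed details): a limit up/down price band around
# the average trade price over the preceding five minutes.
priceBand <- function(prices, pilotStock = TRUE, openOrClose = FALSE) {
  avg <- mean(prices)                      # simple average; a VWAP is another possible reading of the rule
  pct <- if (pilotStock) 0.05 else 0.10    # 5% for pilot stocks, 10% otherwise
  if (openOrClose) pct <- 2 * pct          # bands are doubled near the open and close
  c(lower = avg * (1 - pct), upper = avg * (1 + pct))
}

# Example: trades averaging $240 with the 5% band
priceBand(c(239.5, 240.0, 240.5))          # lower = 228.0, upper = 252.0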


Page 22: Comparing Performance of Distributed Computing Platforms in


Questions Unanswered By SEC Website

1. What is the average price? Is it evaluated by the bid, the ask, or the trades executed?

2. Is the 5 minute window a sliding window, or is it contiguous blocks?

3. If the window slides, by what increments does it slide? (millisecond, second, minute?)

4. Is the average a volume weighted average, or a simple average?

It’s challenging to build a model to evaluate the rules given this much uncertainty and fluidity. This requires a flexible analytic platform.


Page 23: Comparing Performance of Distributed Computing Platforms in


The SEC Response:

Dear Dr. King:

Thank you for your message. In consultation with the Division experts who have been working on this matter, we would like to refer you to certain releases that may provide specific information for you. In particular, you may wish to review FINRA and other SRO rules concerning circuit breakers, including the rules of the NYSE and Nasdaq, among others. The rule books of the exchanges and FINRA are available on the respective SRO websites, and most have a search mechanism, which can be helpful. In addition, there are publicly available SEC orders relating to approved SRO rule filings on this matter. The link to the SRO rule-filing page on the SEC’s website is at http://www.sec.gov/rules/sro.shtml. The individual SROs may also have parallel postings on their pages, as well as their SEC filing history and submissions relating to their rule proposals. The initial SEC approval order for this matter was issued in June 2010 and can be found at http://www.sec.gov/rules/sro/bats/2010/34-62252.pdf. This order may provide useful background information and further references for your research. In addition, an order expanding the list of securities covered by the pilot was issued in September 2010 and can be found at http://www.sec.gov/rules/sro/bats/2010/34-62884.pdf. We hope you find this helpful. Please let us know if you have additional questions.

Sincerely,

Marie Ito

Senior Special Counsel

Division of Trading and Markets


Page 24: Comparing Performance of Distributed Computing Platforms in


Limit Up, Limit Down

Policy Goal: To mitigate market volatility.

Scope: All stocks and ETFs traded on the US equity markets. Does not apply to the first and last 15 minutes of trading.

Method (In General Terms): Setting an acceptable range on both the upside and downside, or “price band,” within a specific time.


Page 25: Comparing Performance of Distributed Computing Platforms in


Limit Up, Limit Down

• Q: What about stocks that are not in the S&P 500, the Russell 1000, or ETFs?

• A1: Stocks not in the S&P 500, the Russell 1000, or ETFs will have a plus/minus band of 10%, provided that they do not trade for less than $1.

• A2: Stocks that trade for less than $1 are subject to a 75% plus/minus band (based on the previous day’s close).


Page 26: Comparing Performance of Distributed Computing Platforms in


Limit Up, Limit Down

If price bands are exceeded, the stock may or may not stop trading (a sketch of this check follows below).

1. Pause: This is like a 15 second “probation.” If an additional trade is executed within the bands, the stock continues to trade without a stop. But if trades continue to exceed the bands, or if no trade brings the stock back within the bands, then a stop is issued.

• Rationale: Markets don’t want stocks to stop trading because of “fat fingers” or some anomaly.

2. Halt: If no trade is executed that brings the stock back within the acceptable bands within 15 seconds, trading on the stock stops for 5 minutes.
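The pause/halt decision above can be sketched as a small check in R. This is an illustrative sketch, not the model used in the study: the function name checkLimitUpDown, the reuse of the priceBand sketch from the earlier slide, the scan for only the first violation, and the seconds-since-midnight timestamps are all assumptions.

# Illustrative sketch (assumed names and details): given one symbol's trades for a day,
# flag the first trade outside the band and decide between a pause and a halt.
checkLimitUpDown <- function(trades, pilotStock = TRUE) {
  # trades: data frame with columns TIME (seconds since midnight) and PRICE, ordered by time
  for (i in which(trades$TIME >= trades$TIME[1] + 300)) {
    window <- trades[trades$TIME >= trades$TIME[i] - 300 & trades$TIME < trades$TIME[i], ]
    if (nrow(window) == 0) next
    band <- priceBand(window$PRICE, pilotStock)   # sketch from the earlier slide
    if (trades$PRICE[i] < band["lower"] || trades$PRICE[i] > band["upper"]) {
      # Pause: look ahead 15 seconds for a trade back inside the band
      lookAhead <- trades[trades$TIME > trades$TIME[i] & trades$TIME <= trades$TIME[i] + 15, ]
      inside <- lookAhead$PRICE >= band["lower"] & lookAhead$PRICE <= band["upper"]
      return(if (any(inside)) "pause only" else "halt: 5 minute trading stop")
    }
  }
  "no limit up/down event"
}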


Page 27: Comparing Performance of Distributed Computing Platforms in


Example: May 6, 2010 Apple (AAPL)

• At 2:41 P.M. EDT on the day of the “flash crash,” Apple was trading at $239.96.

• At 2:45:37 P.M. EDT, a trade on AAPL was executed at $225.10. If limit up, limit down had been in place, this would have triggered a pause.

• At 2:45:37, another trade was executed at $229.50. Therefore, there would be no “halt” and trading would have continued.

• At 2:45:38, a trade on AAPL is executed at $225.00.

• At 2:45:38, a trade on AAPL is executed at $227.48.

• At 2:45:39, AAPL trades at $225. No trade is executed within 5% of the average of the last 5 minutes within the following 15 seconds. Therefore a halt would have been issued.
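As a toy check of the band arithmetic, the trailing five-minute average below is assumed purely for illustration and is not taken from the actual May 6, 2010 tape:

avgPrice  <- 237.00            # hypothetical trailing five-minute average price
lowerBand <- avgPrice * 0.95   # 5% lower band for a pilot stock: 225.15
225.10 < lowerBand             # TRUE: a print at $225.10 would fall outside the band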


Page 28: Comparing Performance of Distributed Computing Platforms in


AAPL continues to decline in value. The lowest trade of the day is at $199.25. It opened that day at $253.83, a loss in value of 21.5 percent.

[Chart: AAPL stock price ($200–$250) versus time, 14:37:00–14:47:00 on 05/06/10.]


Page 29: Comparing Performance of Distributed Computing Platforms in


BACKTESTING


Page 30: Comparing Performance of Distributed Computing Platforms in


Our Research

• Run the FINRA rules on historic stock data (backtest)

• Determine if the circuit breaker rules are effective for controlling volatility during catastrophic events like the Flash Crash

• Perform backtest calculations in a timely manner


Page 31: Comparing Performance of Distributed Computing Platforms in


What are the Challenges?

• Inability to course correct
• Time consuming
• High total cost
• Inefficient processing

Page 32: Comparing Performance of Distributed Computing Platforms in


Three Technological Approaches

Workstation:
• Created model using open source R and Revolution Analytics’ parallel packages
• Circuit breaker calculations only

Cloud:
• Added 3rd year
• Added calculations for limit up/down
• Had to move data to computation

IBM Netezza:
• Revolution R Enterprise
• Moved computation to data
• No security issues
• Ability to manage data within Netezza
• Scale to multi-terabyte range and thousands of variables
• Quickly adapt to evolving market conditions


Page 33: Comparing Performance of Distributed Computing Platforms in


Polling Question #2

Which of the following best completes the sentence for your organization?

When building a new model for backtesting purposes:

A. most of the development time is spent implementing the business logic

B. most of the development time is spent optimizing the model to improve performance


Page 34: Comparing Performance of Distributed Computing Platforms in


Processing 24 Billion Transactions

Breaking up the data

754 Files

Approximately 7800 symbols for each day

Approximately 3800 trades per symbol

An embarrassingly parallel problem!

Days can be processed independently

For a given day, symbols can be processed independently


Page 35: Comparing Performance of Distributed Computing Platforms in


The Approach

• Retrieve data for a single day

• Within that day, retrieve all transactions for a given symbol

• Return 5 minute windows of trade data

• Windows are passed to a function that detects limit up/down and halt conditions (a sketch of such a window iterator follows below)
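One possible way to produce those windows is a custom iterator written in the style of the iterators package. This is a sketch under assumptions: the name timeWindowIter, the one-second step, and the seconds-since-midnight timestamps are illustrative, and the study's own iterator (time.window.iter, shown on a later slide) may differ.

library(iterators)

# Sketch of a trailing five-minute window iterator over one symbol's trades.
# 'trades' has a TIME column in seconds since midnight; the window advances
# in one-second steps (an assumption -- the SEC text leaves this open).
timeWindowIter <- function(trades, width = 300, step = 1) {
  t <- min(trades$TIME) + width
  nextEl <- function() {
    if (t > max(trades$TIME)) stop("StopIteration", call. = FALSE)
    w <- trades[trades$TIME > t - width & trades$TIME <= t, ]
    t <<- t + step
    w
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c("abstractiter", "iter")
  obj
}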


Page 36: Comparing Performance of Distributed Computing Platforms in


The Data

SYMBOL,DATE,TIME,PRICE,SIZE

A,20101029,9:30:00,34.88,37

A,20101029,9:30:11,34.86,100

A,20101029,9:30:11,34.82,200

A,20101029,9:30:24,34.82,200

A,20101029,9:30:24,34.82,100

A,20101029,9:30:24,34.82,100

A,20101029,9:30:24,34.82,100

A,20101029,9:30:27,34.8496,209

A,20101029,9:30:27,34.82,1700
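A minimal sketch of reading this format into R; the file name, column classes, and the conversion of TIME to seconds since midnight are illustrative choices rather than the study's actual loading code.

# Read one day of TAQ trades (hypothetical file name)
taqData <- read.csv("taq_20101029.csv",
                    colClasses = c("character", "character", "character",
                                   "numeric", "integer"))

# Convert "9:30:11" style timestamps to seconds since midnight for windowing
hms <- strsplit(taqData$TIME, ":")
taqData$TIME <- sapply(hms, function(x) sum(as.numeric(x) * c(3600, 60, 1)))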


Page 37: Comparing Performance of Distributed Computing Platforms in


The Tools: foreach and iterators

The “iterators” package (Steve Weston, Revolution Analytics) allows a programmer to define how a program traverses through a data set

Separates the data extraction from the data source

Easily add new sources to a given analysis

The “foreach” package (Steve Weston, Revolution Analytics) provides a platform independent method for defining embarrassingly parallel loops

Single process, multiple cores on a single machine, or distributed across a cluster

Packages are provided to exploit different parallel mechanisms
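A small self-contained example of the two packages working together; the toy data frame and the use of isplit to traverse per-symbol groups are for illustration only.

library(foreach)
library(iterators)

# Toy trade data: two symbols, a few prices each
trades <- data.frame(SYMBOL = c("A", "A", "B", "B", "B"),
                     PRICE  = c(34.88, 34.86, 12.10, 12.15, 12.05))

# isplit() yields one group of rows per symbol; foreach() loops over the groups.
# Switching %do% to %dopar% (with a registered backend) parallelizes the loop.
ranges <- foreach(g = isplit(trades, trades$SYMBOL), .combine = rbind) %do% {
  data.frame(symbol = g$key[[1]], priceRange = diff(range(g$value$PRICE)))
}
ranges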


Page 38: Comparing Performance of Distributed Computing Platforms in


The Implementation

library(foreach)
library(iterators)

# Outer loop: one iteration per daily TAQ file (days are independent)
foreach(file = taqFiles) %dopar% {

  taqData <- read.csv(file)

  # Group row indices by stock symbol so each symbol can be processed independently
  symbolIndexList <- split(1:nrow(taqData), taqData$SYMBOL)

  # Inner loop: one iteration per symbol within the day
  foreach(inds = symbolIndexList) %dopar% {
    findLimitUpDown(taqData[inds, ])
  }
}

# Scan one symbol's trades for a day, window by window
findLimitUpDown <- function(taqSymbolDayData) {
  foreach(w = time.window.iter(taqSymbolDayData)) %do% {
    if (limitUpDownInWindow(w))
      writeLimitUpDownInfo(w)
  }
}


Page 39: Comparing Performance of Distributed Computing Platforms in


Workstation

• Where most development is done

• Used the “doMC” package (Steve Weston, Revolution Analytics), which provides a link between “foreach” and a parallel programming backend -- in this case, the “multicore” package (from Simon Urbanek); see the example below

• foreach and iterators packages minimize the code changes required to move to a distributed environment

• After the analysis is tested, it can be moved to the cloud or IBM Netezza for better performance

• Most policy reviews only allow 21 days to assess -- not possible on a workstation. Time is of the essence! Need scale and performance.
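For example, registering the multicore backend on a workstation might look like the following (the core count is illustrative); the foreach loops themselves are unchanged.

library(doMC)
registerDoMC(cores = 4)   # use 4 cores on the local workstation (illustrative)
getDoParWorkers()         # confirm how many workers %dopar% will use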


Page 40: Comparing Performance of Distributed Computing Platforms in


Polling Question # 3

In your day-to-day operation, what is the typical time required to backtest a model or strategy?

A. intra-day

B. overnight

C. weekly

D. quarterly

E. time isn’t a consideration


Page 41: Comparing Performance of Distributed Computing Platforms in


The Cloud

• Used the doRedis package (Bryan Lewis), which uses the Redis key-value store to provide distributed computing capabilities (see the example below)

• Calculations can be made to run faster simply by adding more machines to a cluster

• Machines can be added as a calculation is being performed (dynamic scalability)
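Registering the Redis backend follows the same pattern. The queue name and worker count below are illustrative, and a running Redis server reachable by the workers is assumed.

library(doRedis)
registerDoRedis("backtest")                    # "backtest" is an illustrative work-queue name
startLocalWorkers(n = 2, queue = "backtest")   # workers can also be started on other machines
# ... run the same foreach/%dopar% loops as on the workstation ...
removeQueue("backtest")                        # clean up the queue when finished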


Page 42: Comparing Performance of Distributed Computing Platforms in


IBM Netezza

• Objective: Compare backtesting performance between IBM Netezza and the Cloud

• Model Migration: Minimize refactoring (less than one hour of work)

• Data Ingestion: two hours to load three years of TAQ trade data

• Porting: Replaced reading of compressed files with data streaming that reads partitioned data by stock symbol and trade date


Page 43: Comparing Performance of Distributed Computing Platforms in


The Results

• IBM Netezza + Revolution Analytics performed 43 percent faster than the cloud – with no tuning of the analytic model

• Very quick to model and load the data in the IBM Netezza architecture

• Moving the analytics next to the data saves significant time

• Business logic of the R code remained intact

• Speed at which data can be interrogated allows users to play with many models

• IBM Netezza is much easier to set up than a cloud infrastructure
  • Plug IBM Netezza in, connect to the network, load data and run queries

Metric                    | Cloud             | IBM Netezza TwinFin-24
Nodes                     | 60 CPU / 240 core | 48 CPU / 184 core
Memory                    | 900 GB            | 384 GB
Observations (rows)       | 24.9 billion      | 24.9 billion
Variables (columns)       | 6                 | 6
Time to ingest data       | 24 hours          | 2 hours
Model execution time      | 108 hours         | 96 hours
Normalized time (by core) | 108 hours         | 73.6 hours
Elapsed time              | 132 hours         | 75.6 hours


Page 44: Comparing Performance of Distributed Computing Platforms in


IBM Netezza Data Warehouse Appliance: the true data warehousing appliance

• Purpose-built analytics engine
• Integrated database, server and storage
• Standard interfaces
• Low total cost of ownership
• Speed: 10-100x faster than traditional systems
• Simplicity: Minimal administration and tuning
• Scalability: Peta-scale user data capacity
• Smart: High-performance advanced analytics


Page 45: Comparing Performance of Distributed Computing Platforms in


Exploiting In-Database Analytics with Revolution R Enterprise

[Diagram: an R client submits analytics to the IBM Netezza data warehouse appliance; the host and S-Blades™ with their disk enclosures crunch the large data set in place and return results to the client.]


Page 47: Comparing Performance of Distributed Computing Platforms in


[Diagram: IBM Netezza Analytics architecture. Clients (an Eclipse plug-in, IBM SPSS Modeler, Revolution Analytics, SAS, and 3rd-party packages) connect to the IBM Netezza appliance. On the IBM Netezza AMPP™ platform, IBM Netezza Analytics provides in-database analytics (Revolution Analytics, Spatial, Hadoop MR, Matrix), IBM SPSS in-database analytics (data prep, predictive analytics, data mining), 3rd-party in-database analytics (SAS, Fuzzy Logix), a Software Development Kit with user-defined extensions (UDF, UDA, UDTF, UDAP), language support and adaptors (R, Hadoop MR, Java, C, C++, Python, Fortran), a development environment, and SQL pushdown.]

Page 48: Comparing Performance of Distributed Computing Platforms in


CONCLUSIONS


Page 49: Comparing Performance of Distributed Computing Platforms in


Summary

• IBM Netezza + Revolution Analytics Advantage
  • Performance
  • Value
  • Simplicity

• Future Optimizations
  • Refactor business logic to allow data partitioning for previous/forward aggregation inside the database
  • Minimize memory management in R


Page 50: Comparing Performance of Distributed Computing Platforms in


Conclusions

• Although Limit Up/Down rules are an improvement over circuit breaker rules to mitigate market volatility, these rules require further refinement.

• Policy should be data-driven rather than opinion-driven.

• The illusion of safety is often more dangerous than the surety of risk.

• Additional research required.


Page 51: Comparing Performance of Distributed Computing Platforms in


QUESTIONS?


Page 52: Comparing Performance of Distributed Computing Platforms in


IBM Netezza & Revolution Analytics

www.netezza.com/testdrive