

1

Machine Learning for Stock Selection

Robert J. Yan
Charles X. Ling

University of Western Ontario, Canada
{jyan, cling}@csd.uwo.ca

2

Outline

Introduction
The stock selection task
The Prototype Ranking method
Experimental results
Conclusions

3

Introduction

Objective:
– Use machine learning to select a small number of “good” stocks to form a portfolio

Research questions:
– Learning in the noisy dataset
– Learning in the imbalanced dataset

Our solution: Prototype Ranking
– A specially designed machine learning method

5

Stock Selection Task

Given information prior to week t, predict the performance of stocks in week t
– Training set

            Predictor 1          Predictor 2          Predictor 3               Goal
Stock ID    Return of week t-1   Return of week t-2   Volume ratio of t-2/t-1   Return of week t
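One training row of this form could be assembled roughly as follows. This is a minimal sketch, not the authors' code: the names `Row` and `make_row`, the list-based weekly series, and the reading of "volume ratio of t-2/t-1" as volume in week t-2 divided by volume in week t-1 are all assumptions.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    stock_id: str
    ret_t1: float     # Predictor 1: return of week t-1
    ret_t2: float     # Predictor 2: return of week t-2
    vol_ratio: float  # Predictor 3: volume(t-2) / volume(t-1), an assumed reading
    goal: float       # Goal: return of week t


def make_row(stock_id: str, returns: List[float], volumes: List[float], t: int) -> Row:
    """Build one training row for week t from weekly series indexed by week."""
    if t < 2 or t >= len(returns):
        raise ValueError("week t needs two prior weeks and must be inside the series")
    return Row(
        stock_id=stock_id,
        ret_t1=returns[t - 1],
        ret_t2=returns[t - 2],
        vol_ratio=volumes[t - 2] / volumes[t - 1],
        goal=returns[t],
    )


# Hypothetical weekly data for one stock (weeks 0..3).
returns = [0.01, -0.02, 0.03, 0.00]
volumes = [100.0, 200.0, 400.0, 300.0]
row = make_row("ABC", returns, volumes, t=3)
print(row)
```

Only information from weeks before t enters the predictors, which matches the task statement; the goal column is the only field that uses week t.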

Learn a ranking function to rank the test data
– Select the n highest-ranked stocks to buy and the n lowest-ranked stocks to short-sell
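The selection rule can be sketched in a few lines. The function names, the dictionary inputs, and the equal-weighted long-minus-short return are assumptions made for illustration; the slides do not say how the portfolio return is computed.

```python
from typing import Dict, List, Tuple


def select_long_short(scores: Dict[str, float], n: int) -> Tuple[List[str], List[str]]:
    """Return (stocks to buy, stocks to short-sell) from predicted scores."""
    if n < 1 or 2 * n > len(scores):
        raise ValueError("n must be at least 1 and at most half the number of stocks")
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    return ranked[:n], ranked[-n:]


def long_short_return(longs: List[str], shorts: List[str],
                      realized: Dict[str, float]) -> float:
    """Equal-weighted long return minus equal-weighted short return (assumed convention)."""
    long_ret = sum(realized[s] for s in longs) / len(longs)
    short_ret = sum(realized[s] for s in shorts) / len(shorts)
    return long_ret - short_ret


# Hypothetical predicted scores and realized returns for week t.
scores = {"A": 0.9, "B": 0.1, "C": -0.4, "D": 0.5}
realized = {"A": 0.02, "B": 0.00, "C": -0.01, "D": 0.01}

longs, shorts = select_long_short(scores, n=1)
week_return = long_short_return(longs, shorts, realized)
print(longs, shorts, week_return)
```

Only the ordering of the scores matters here, which is why the task is framed as ranking rather than as predicting exact returns.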

7

Prototype Ranking

Prototype Ranking (PR): a machine learning method designed for noisy and imbalanced stock data

The PR System
Step 1. Find good “prototypes” in the training data
Step 2. Use k-NN on the prototypes to rank the test data

8

Step 1: Finding Prototypes

Prototypes: representative points
– Goal: discover the underlying density/clusters of the training samples by distributing prototypes in the sample space
– Reduce data size

[Figure: samples, prototypes, and a prototype neighborhood]

10

Finding prototypes using competitive learning

General competitive learning
Step 1: Randomly initialize a set of prototypes
Step 2: For each sample, search for the nearest prototype
Step 3: Adjust the prototypes
Step 4: Output the prototypes

The hidden density of the training data is reflected in the prototypes
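The four general steps can be sketched as a plain winner-take-all loop. This illustrates generic competitive learning only, not the modified PR procedure; the initialization from random samples, the learning rate `lr`, the number of epochs, and all names are assumptions.

```python
import random
from typing import List

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_learning(samples: List[Point], n_prototypes: int,
                         epochs: int = 20, lr: float = 0.1, seed: int = 0) -> List[Point]:
    rng = random.Random(seed)
    # Step 1: randomly initialize the prototypes (here: copies of random samples).
    prototypes = [list(p) for p in rng.sample(samples, n_prototypes)]
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            # Step 2: search for the nearest prototype (the "winner").
            w = min(range(n_prototypes), key=lambda i: sq_dist(prototypes[i], x))
            # Step 3: adjust the winner by moving it a fraction lr toward the sample.
            prototypes[w] = [p + lr * (xi - p) for p, xi in zip(prototypes[w], x)]
    # Step 4: output the prototypes.
    return prototypes


# Hypothetical 2-D samples forming two well-separated groups.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
protos = competitive_learning(samples, n_prototypes=2)
print(protos)
```

Because each update pulls a prototype toward the samples it wins, regions with more samples attract prototypes more strongly, which is the sense in which the prototypes reflect the density of the training data.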

11

Modifications for Stock Data

In step 1: initial prototypes are organized in a tree structure
– Fast nearest-prototype search

In step 2: search for prototypes in the predictor space
– Better learning effect for prediction tasks

In step 3: adjust prototypes in the goal attribute space
– Better learning effect on imbalanced stock data

In step 4: prune the prototype tree
– Prune child prototypes if they are similar to the parent
– Combine leaf prototypes to form the final prototypes

12

Step 2: Predicting Test Data

Prediction: the weighted average of the k nearest prototypes
Online update: the model is updated as new data arrive
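A prediction step of this kind might look like the sketch below. The slides give neither the weighting scheme nor the update rule, so the inverse-distance weights, the winner-take-all online update, the learning rate, and all names here are assumptions rather than the PR system's actual choices.

```python
import math
from typing import List, Tuple

# A prototype is (predictor vector, goal value).
Prototype = Tuple[List[float], float]


def predict(prototypes: List[Prototype], x: List[float],
            k: int = 3, eps: float = 1e-9) -> float:
    """Inverse-distance weighted average of the goals of the k nearest prototypes."""
    if not 1 <= k <= len(prototypes):
        raise ValueError("k must be between 1 and the number of prototypes")
    nearest = sorted((math.dist(p, x), goal) for p, goal in prototypes)[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]  # eps avoids division by zero
    return sum(w * goal for w, (_, goal) in zip(weights, nearest)) / sum(weights)


def online_update(prototypes: List[Prototype], x: List[float], y: float,
                  lr: float = 0.1) -> None:
    """Move the nearest prototype toward a newly observed (x, y) pair, in place."""
    i = min(range(len(prototypes)), key=lambda j: math.dist(prototypes[j][0], x))
    p, goal = prototypes[i]
    new_p = [a + lr * (b - a) for a, b in zip(p, x)]
    prototypes[i] = (new_p, goal + lr * (y - goal))


# Hypothetical prototypes in a 2-D predictor space.
prototypes = [([0.0, 0.0], 1.0), ([1.0, 0.0], 3.0), ([10.0, 10.0], -5.0)]

score = predict(prototypes, [0.5, 0.0], k=2)  # equidistant from the first two prototypes
online_update(prototypes, [10.0, 10.0], -4.0, lr=0.5)
print(score, prototypes[2])
```

The predicted scores for all test stocks in a week would then be sorted to produce the ranking used for selection.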

14

Data

CRSP daily stock database
– 300 NYSE and AMEX stocks, largest market cap
– From 1962 to 2004

15

Testing PR

Experiment 1: Larger portfolio, lower average return, lower risk – diversification

Experiment 2: Is PR better than Cooper’s method?

16

Results of Experiment 1

[Figure: Average Return (1978-2004). Weekly Average Return (%) vs. Stock Number in Portfolio]

[Figure: Risk (std) (1978-2004). Weekly Std. (%) vs. Stock Number in Portfolio]

17

Experiment 2: Comparison to Cooper’s method

Cooper’s method (CP): A traditional non-ML method for stock selection…

Compare PR and CP in 10-stock portfolios

18

Results of Experiment 2

Measures:
– Average Return (Ret.)
– Sharpe Ratio (SR): a risk-adjusted return, SR = Ret. / Std.
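As a small worked example of the measure, the sketch below computes SR = Ret. / Std. from a series of weekly portfolio returns. The function name, the use of the sample standard deviation, and the input numbers are assumptions; the formula follows the definition on this slide, which does not subtract a risk-free rate.

```python
import statistics
from typing import Sequence


def sharpe_ratio(weekly_returns: Sequence[float]) -> float:
    """SR = average return / standard deviation of returns."""
    ret = statistics.mean(weekly_returns)
    std = statistics.stdev(weekly_returns)  # sample standard deviation (an assumption)
    if std == 0:
        raise ValueError("standard deviation is zero; SR is undefined")
    return ret / std


# Hypothetical weekly portfolio returns, in percent.
weekly = [1.0, 2.0, 3.0]
sr = sharpe_ratio(weekly)  # mean 2.0, sample std 1.0
print(sr)
```

Dividing by the standard deviation lets a portfolio with a lower average return still score higher if its returns are steadier.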

[Figure: Ret. (%) and SR for the PR 10-stock portfolio and the CP 10-stock portfolio]

21

Conclusions

PR: modified competitive learning and k-NN for noisy and imbalanced stock data

PR does well in stock selection
– Larger portfolio, lower return, lower risk
– PR outperforms the non-ML method CP

Future work: use it to invest and make money!