Source: cs229.stanford.edu/proj2019spr/poster/27.pdf
Making Trading Great Again: Trump-based Stock Predictions via doc2vec Embeddings
Rifath Rashid, Stanford University, Department of Computer Science
Background
• One of the first seminal papers using Twitter to predict stock movements focused on relatively coarse categories of sentiment [1].
• More recent advances in NLP and NLU have given rise to distributed representations (word2vec, doc2vec) [2, 3].

Problem
• On January 20, 2017, Donald Trump was inaugurated as President.
• President Trump uses Twitter as his primary platform, and his tweets and policy announcements have been known to shake financial markets.
• Can contextually rich document representations of Trump's tweets successfully predict stock price changes?
Data
Trump Twitter Statistics (2010–2019):
• Total posts: 38,471
• Max posts on one day: 160
• Business days with at least one post: 2,139

Frequent words: 'great' (2,891), 'Great' (1,204), 'America' (765), 'big' (771), 'Hilary' (657), 'GREAT' (537), 'Fake' (372)

Stock Price Data
• End-of-day prices obtained from Quandl
• Parameter tuning performed on the Russell 3000 index (the 3,000 largest American companies)
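As a sketch of how the dataset can be assembled (all dates, tweets, and prices below are invented for illustration), each business day's tweets can be grouped into one document and labeled with the direction of the next trading day's closing price:

```python
from datetime import date

# Hypothetical end-of-day closing prices (e.g., as pulled from Quandl).
closes = {
    date(2019, 1, 2): 100.0,
    date(2019, 1, 3): 98.5,
    date(2019, 1, 4): 101.2,
    date(2019, 1, 7): 102.0,
}

# Tweets grouped by the business day on which they were posted.
tweets_by_day = {
    date(2019, 1, 2): ["Great meeting today!", "FAKE NEWS"],
    date(2019, 1, 3): ["Big announcement coming."],
    date(2019, 1, 4): ["America is winning."],
}

def make_examples(tweets_by_day, closes):
    """Label each tweet day 1 if the close rises on the next trading day, else 0."""
    days = sorted(closes)
    examples = []
    for today, nxt in zip(days, days[1:]):
        if today in tweets_by_day:
            doc = " ".join(tweets_by_day[today])  # one document per day
            label = 1 if closes[nxt] > closes[today] else 0
            examples.append((doc, label))
    return examples

examples = make_examples(tweets_by_day, closes)
```

Each `(document, label)` pair can then be fed to doc2vec training and the downstream classifier; the exact labeling rule (next close vs. same-day close) is an assumption here, not stated on the poster.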
Model
doc2vec embedding → dense layer (ReLU activation, L2 regularization) → dropout (0.2) → dense layer (ReLU activation) → stock price change
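A minimal pure-Python sketch of the forward pass described above (doc2vec vector → dense + ReLU with L2 regularization → dropout(0.2) → dense + ReLU → predicted price change). The weights, dimensions, and regularization strength are illustrative, not the poster's actual parameters:

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    # weights: one row of input weights per output unit
    return [sum(w_i * x_i for w_i, x_i in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def dropout(v, rate, rng, training=True):
    # Inverted dropout: zero units with probability `rate`, rescale survivors.
    if not training:
        return v
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]

def l2_penalty(weights, lam=1e-3):
    # L2 regularization term, added to the training loss.
    return lam * sum(w * w for row in weights for w in row)

def forward(doc_vec, W1, b1, W2, b2, rng, training=False):
    h = relu(dense(doc_vec, W1, b1))    # dense + ReLU (L2-regularized)
    h = dropout(h, 0.2, rng, training)  # dropout(0.2)
    h = relu(dense(h, W2, b2))          # dense + ReLU
    return sum(h)                       # scalar: predicted price change

rng = random.Random(0)
doc_vec = [0.5, -1.0, 0.25, 2.0]        # stand-in for a doc2vec embedding
W1 = [[0.1, 0.2, -0.1, 0.05], [0.0, -0.3, 0.4, 0.1], [0.2, 0.1, 0.0, -0.2]]
b1 = [0.0, 0.1, 0.0]
W2 = [[0.5, -0.2, 0.3]]
b2 = [0.2]
pred = forward(doc_vec, W1, b1, W2, b2, rng)
```

In practice this network would be built with a deep-learning framework and trained by gradient descent; the sketch only makes the layer sequence concrete.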
Results
Trading Simulations
[Figure: doc2vec architecture, original diagram from the doc2vec paper by Le and Mikolov [3]]
Future Directions
• Members of the theory community have claimed that weighted word2vec embeddings might be a stronger baseline than doc2vec; the financial research community has shown empirical support for doc2vec.
• Explore second- or millisecond-resolution stock price data, as Twitter effects may only be relevant within a short time window.
• Explore the effects of the model on more complex trading strategies.
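The weighted-word2vec baseline mentioned above could be sketched as an inverse-frequency-weighted average of word vectors, so rare words contribute more than frequent ones. The toy vectors and counts here are invented for illustration:

```python
def weighted_doc_vector(tokens, word_vecs, counts, total):
    """Average word vectors weighted by inverse relative frequency.
    (SIF-style a/(a + p(w)) weights are a common alternative.)"""
    dim = len(next(iter(word_vecs.values())))
    acc, weight_sum = [0.0] * dim, 0.0
    for tok in tokens:
        if tok not in word_vecs:
            continue
        w = total / counts[tok]  # rare words get larger weights
        weight_sum += w
        for i, x in enumerate(word_vecs[tok]):
            acc[i] += w * x
    return [a / weight_sum for a in acc] if weight_sum else acc

# Toy vocabulary: 'great' is frequent in the corpus, 'tariffs' is rare.
word_vecs = {"great": [1.0, 0.0], "tariffs": [0.0, 1.0]}
counts = {"great": 2891, "tariffs": 10}
total = 38471

vec = weighted_doc_vector(["great", "tariffs"], word_vecs, counts, total)
# The rare word dominates the document vector.
```

The word vectors themselves would come from a pretrained word2vec model; only the weighting scheme is sketched here.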
References
[1] Bollen, J., Mao, H. & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8.
[2] Lutz, B., Pröllochs, N. & Neumann, D. (2019). Sentence-level sentiment analysis of financial news using distributed text representations and multi-instance learning. In Proceedings of the 52nd Hawaii International Conference on System Sciences (HICSS). doi:10.24251/HICSS.2019.137.
[3] Le, Q. & Mikolov, T. (2014). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML), pp. 1188–1196.
[Plots: loss vs. epoch for varying embedding vector sizes; loss vs. epoch for varying doc2vec training epochs]
Compared to: 38.4% growth of the DJIA
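The trading simulations can be approximated by a simple rule: go long only on days the model predicts "up", hold cash otherwise, and compare against buy-and-hold. The predictions and daily returns below are fabricated for illustration, not the poster's results:

```python
def simulate(preds, daily_returns):
    """Cumulative return of a long-only strategy that is in the market
    only on days the model predicts 'up' (1), versus buy-and-hold."""
    strat, hold = 1.0, 1.0
    for p, r in zip(preds, daily_returns):
        hold *= 1.0 + r          # buy-and-hold compounds every day
        if p == 1:
            strat *= 1.0 + r     # strategy compounds only on predicted-up days
    return strat - 1.0, hold - 1.0

preds = [1, 0, 1, 1, 0]
daily_returns = [0.01, -0.02, 0.015, -0.005, 0.02]
strategy_ret, buy_hold_ret = simulate(preds, daily_returns)
```

Real simulations would also account for transaction costs and position sizing, which this sketch ignores.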
Analysis/Conclusion
• Smaller embedding vector sizes result in lower loss. Since we consider only one person's lexicon, larger dimensions likely add noise.
• Up/down accuracy is generally a little over 50%. The test data is evenly split, and predictions are also roughly evenly split (e.g., Google up/down predictions follow a 55%:45% distribution).
• Companies show varying responses to tweet events (e.g., 1/1/19: the new year and heavy discussion of the border wall).