mining dynamic social networks from public news articles for company value prediction

Mining dynamic social networks from public news articles for company value prediction. - PRATIK, MICHEL, KAI & MINGHAO

Upload: pratik-doshi

Post on 12-Apr-2017



Objectives and Key Notes: What We Discovered

1. Study, analyze and understand impactful relations that exist between companies.

2. Transform the discovered relations into intercompany networks, revealing features and metrics about the company.

3. Generate models that integrate network-feature metrics and company financial valuations in order to project or predict a company's future value or profit over time, e.g. using metrics such as the number of companies a company relates with (a network-feature metric) and the company's profit (a financial metric).

Concepts and Techniques Utilized

1. Network analysis: graph theory, ranking.

2. Machine learning algorithms: regression (linear regression and support vector regression).

3. Statistical methods: correlation (Spearman's rank correlation), mean squared error.

4. Algebraic equations, e.g. the one used for the relation score.

Choice of Research Domain: Document-Level and Sentence-Level Co-occurrence

The more often two companies co-appear, or are described together, in important news articles and/or sentences, the stronger their mutual relationship.

NB: The study doesn't extract specific relation types separately; rather, it generalizes all co-occurrences as impact relations, i.e. how much impact a company receives from others, by considering positive/negative structural impacts from the networks.

Research Coverage: For a Target Company

1. Generation of inter-company networks covering local and global relations, historical relations and the change (delta) in the impact of relations over time.

2. Borrowing the page-ranking idea used in information retrieval systems: companies are ranked by each network feature and by company valuations (e.g. profit).

3. Use of machine learning algorithms such as linear regression and SVM regression to combine the features of the longitudinal network with a company's financial information to predict the company's value.

Extracting Data: New York Times

Social Network Data: drawn from the large-scale public data about companies available in the news and on the web (mainly news articles). The data are dated from 1981 to 2009 and organized year by year.

For example, IBM appeared in about 300 New York Times news articles in 2009 (277 articles as IBM and 84 articles as International Business Machines). Interviews, questionnaires and observations. Financial data: company valuations were obtained from the Fortune 500 list (1955 to 2009).
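The IBM example shows why a company has to be matched under every alias before its articles are counted. A minimal sketch, assuming a hand-curated alias table (the `ALIASES` dict and all three functions are hypothetical, not from the paper), that counts distinct articles so an article naming both "IBM" and "International Business Machines" is counted once:

```python
import re

# Hypothetical alias table; a real system would need a curated list per company.
ALIASES = {
    "IBM": ["IBM", "International Business Machines"],
    "Microsoft": ["Microsoft"],
}


def _contains(text: str, name: str) -> bool:
    """True if `name` appears in `text` as a whole word or phrase."""
    return re.search(r"\b" + re.escape(name) + r"\b", text) is not None


def alias_counts(articles: list[str], company: str) -> dict[str, int]:
    """Number of articles mentioning each alias separately (these may overlap)."""
    return {alias: sum(_contains(t, alias) for t in articles)
            for alias in ALIASES[company]}


def count_articles(articles: list[str], company: str) -> int:
    """Number of distinct articles mentioning the company under any alias."""
    return sum(any(_contains(t, a) for a in ALIASES[company]) for t in articles)
```

Because one article can use several aliases, per-alias counts can add up to more than the number of distinct articles.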

Pre-processing the Data: For a Target Company

For a target company x, let y be a candidate company (one that impacts x during a time period t). The sets of documents D and sentences S in which x and y co-occur during t are collected.
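The collection of D and S for a pair (x, y) and period t could look like the sketch below. It assumes each article carries its publication year, matches company names by plain substring search, and splits sentences with a naive regular expression; the `Article` class and function names are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    year: int   # publication year, used as the time period t
    text: str


def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def co_occurrence_sets(articles: list[Article], x: str, y: str, t: int):
    """Return (D, S) for target x and candidate y in period t.

    D: indices of articles from year t that mention both companies.
    S: sentences from those articles that mention both companies.
    """
    D, S = [], []
    for i, art in enumerate(articles):
        if art.year == t and x in art.text and y in art.text:
            D.append(i)
            S.extend(s for s in split_sentences(art.text) if x in s and y in s)
    return D, S
```

A production pipeline would swap in a proper sentence tokenizer and alias-aware matching.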

Longitudinal networks (directed or undirected, valued or unvalued) are generated over a period of years for a set of companies.

From these networks, a structural feature vector is generated whose entries indicate the network effects for target company x.
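A sketch of how one network per year might be built from pairwise impact strengths, covering the directed/undirected and valued/unvalued variants. It assumes strengths arrive as a dict keyed by (source, target), read as "source impacts target"; summing both directions for valued undirected edges is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def build_network(strengths, directed=True, valued=True):
    """Turn one period's impact strengths into an adjacency dict.

    strengths: {(source, target): strength}, read as "source impacts target".
    directed=False mirrors every arc; valued=False replaces weights with 1.
    Pairs with non-positive strength are dropped.
    """
    adj = defaultdict(dict)
    for (u, v), w in strengths.items():
        if w <= 0:
            continue
        arcs = [(u, v)] if directed else [(u, v), (v, u)]
        for a, b in arcs:
            # Valued undirected edges accumulate the strengths of both directions.
            adj[a][b] = (adj[a].get(b, 0) + w) if valued else 1
    return dict(adj)


def longitudinal_networks(yearly_strengths, directed=True, valued=True):
    """One network per year, ordered by year: {year: adjacency}."""
    return {year: build_network(s, directed, valued)
            for year, s in sorted(yearly_strengths.items(), key=lambda kv: kv[0])}
```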

Evolution of Networks

Calculating Impact Relation Strength: Algorithm

The impact relation strength of candidate company y on target company x combines, through a coefficient a, two weights computed over the documents and the sentences in which the target company and the candidate company co-occur.

For example, for IBM in 2009, Microsoft had the greatest impact on IBM: they co-occurred in 55 articles and were described together in 264 sentences. From these sentences, we can infer that they are direct competitors.

Sometimes the impact isn't obvious. SPSS and IBM are not competitors and co-occurred in only 1 article and 3 sentences, yet their relation is important because they co-appeared in a high-weight document (an article that describes only the acquisition relation between SPSS and IBM).
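The slide's exact formula for relation strength is not recoverable from the transcript, so the sketch below is only one plausible reading: a convex combination a·w_D + (1 − a)·w_S, where each co-occurring document or sentence is weighted by how few companies it mentions. Both that weighting rule (`focus_weight`, hypothetical) and the default a = 0.5 are assumptions chosen to reproduce the SPSS intuition, not the paper's published definition.

```python
def focus_weight(companies: set[str]) -> float:
    """Hypothetical weight of one document or sentence: 1 when it names only
    the pair (x, y), shrinking as more companies share the text."""
    return 1.0 / max(len(companies) - 1, 1)


def relation_strength(doc_companies, sent_companies, a=0.5):
    """Assumed reading of the formula: a * w_D + (1 - a) * w_S.

    doc_companies:  for each document in D, the set of companies it mentions.
    sent_companies: for each sentence in S, the set of companies it mentions.
    """
    w_d = sum(focus_weight(c) for c in doc_companies)
    w_s = sum(focus_weight(c) for c in sent_companies)
    return a * w_d + (1 - a) * w_s
```

Under this weighting, a single article devoted to SPSS and IBM (strength 2.0 with its 3 sentences) outweighs five crowded round-up articles that each name 11 companies (strength 0.25).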

Mining Longitudinal Network: Network Effects

Six types of network effects are considered.

1. The number of connections that the target company x has.

2. Distance between x and its related nodes.

3. The number of connections that the companies related to the target company have.

4. Number of connections among x's related nodes.

5. Distance between the target company's related nodes.

6. Number of node pairs having x on their shortest path.
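The six effects listed above can be computed with breadth-first search on an unweighted, undirected graph. A minimal sketch, assuming "related nodes" means every node that reaches x directly or indirectly and that distances are hop counts averaged over nodes or pairs; these are raw, unstandardized values, and `network_effects` is a hypothetical name.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, src):
    """Hop distances from src to every node reachable in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_effects(adj, x):
    """Six raw network effects for target x; adj maps node -> set of neighbors."""
    neigh = set(adj.get(x, ()))
    dist_x = bfs_distances(adj, x)
    related = sorted(n for n in dist_x if n != x)  # directly or indirectly linked
    dists = {n: bfs_distances(adj, n) for n in related}
    pairs = list(combinations(related, 2))
    pair_d = [dists[a][b] for a, b in pairs]  # all lie in x's component
    mean_to_x = sum(dist_x[n] for n in related) / len(related) if related else 0.0
    return {
        "degree": len(neigh),                                          # 1
        "mean_dist_to_related": mean_to_x,                             # 2
        "neighbor_degree": sum(len(adj.get(n, ())) for n in neigh),    # 3
        "ties_among_neighbors": sum(                                   # 4
            1 for a, b in combinations(sorted(neigh), 2) if b in adj.get(a, ())),
        "mean_dist_among_related":                                     # 5
            sum(pair_d) / len(pair_d) if pair_d else 0.0,
        "pairs_via_x": sum(                                            # 6
            1 for (a, b), d in zip(pairs, pair_d) if dist_x[a] + dist_x[b] == d),
    }
```

Effect 6 counts pairs for which x lies on at least one shortest path; the usual betweenness centrality instead sums the fraction of shortest paths through x.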

Mining Longitudinal Network: 1. Network Effects Generation

A set of nodes that directly or indirectly impact the focal company x is generated, and three different types of node pairs are defined over it. Measures of degree connectivity, eccentricity and betweenness are computed and then standardized by the network size.
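Standardizing "by the network size" is not spelled out on the slide. A minimal sketch, assuming each measure is divided by its largest possible value in an n-node connected network (n − 1 for degree and eccentricity, C(n − 1, 2) for the count of node pairs routed through x); `standardized_measures` is a hypothetical name.

```python
from collections import deque
from math import comb


def distances_from(adj, src):
    """Breadth-first hop distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def standardized_measures(adj, x):
    """Degree, eccentricity and a betweenness count for x, each divided by
    its maximum possible value in an n-node network (assumed scheme)."""
    n = len(adj)
    if n < 3:
        raise ValueError("need at least 3 nodes to standardize betweenness")
    dist_x = distances_from(adj, x)
    others = [v for v in dist_x if v != x]
    between = 0
    for i, a in enumerate(others):
        dist_a = distances_from(adj, a)
        for b in others[i + 1:]:
            if dist_x[a] + dist_x[b] == dist_a[b]:  # x lies on a shortest a-b path
                between += 1
    return {
        "degree": len(adj[x]) / (n - 1),
        "eccentricity": max(dist_x.values()) / (n - 1),
        "betweenness": between / comb(n - 1, 2),
    }
```

For a star network the hub scores 1.0 on degree and betweenness and 1/(n − 1) on eccentricity.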

Further Analysis of the Networks

Traversing the valued directed network for more patterns revealing possible impact relations.

1. Two new sub-networks are incorporated: neighboring node sets that are considered to exert an impact on x through their direct connections. NB: this shows the degree to which companies are directly, rather than indirectly, related to x.

2. Only arcs (directed edges) are retained, to reveal who is impacting whom.

3. Step 1 (network effects generation, previous slide) is repeated to obtain historical network effects.
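Reading the valued directed network for "who is impacting whom" can be sketched as follows. It assumes the network is an adjacency dict `source -> {target: weight}`; the direct-share ratio is one hypothetical way to quantify how directly, rather than indirectly, companies relate to x, and both function names are illustrative.

```python
from collections import defaultdict


def impacters(adj, x):
    """Arcs into x as (source, weight), strongest first (ties broken by name)."""
    arcs = [(u, nbrs[x]) for u, nbrs in adj.items() if x in nbrs and u != x]
    return sorted(arcs, key=lambda arc: (-arc[1], arc[0]))


def direct_share(adj, x):
    """Fraction of companies that reach x (directly or via others) that have
    a direct arc into x; 1.0 means every influence on x is direct."""
    reverse = defaultdict(set)
    for u, nbrs in adj.items():
        for v in nbrs:
            reverse[v].add(u)
    seen, stack = {x}, [x]
    while stack:  # walk arcs backwards from x
        v = stack.pop()
        for u in reverse[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    upstream = seen - {x}
    direct = reverse[x] & upstream
    return len(direct) / len(upstream) if upstream else 0.0
```

In a toy network where Oracle impacts Microsoft, which impacts IBM, Oracle counts toward IBM's upstream set but not toward its direct impacters.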

Network Feature Selection: Filtering Out Companies with Maximum Impact

Individual feature selection: companies are ranked by each network feature and by their valuations (e.g. profit), giving one rank vector per network feature and a rank vector Y of companies ranked by valuation. Spearman's rank correlation is then calculated between each network-feature rank vector and Y. The salient finding is that when the ratio of a company's number of connections to the number of connections its neighbors have increases, its profit tends to increase as well.
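Spearman's rank correlation is the Pearson correlation of two rank vectors. A dependency-free sketch that gives tied values their average rank; `scipy.stats.spearmanr` computes the same statistic if SciPy is available. The feature and profit values in the usage check are made up for illustration.

```python
def ranks(values):
    """Rank positions (1 = largest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("a constant ranking has no defined correlation")
    return cov / (sx * sy)
```

A network feature whose ranking matches the profit ranking exactly yields rho = 1; a fully reversed ranking yields rho = −1.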

Prediction Model: Network Effects + Company Valuations

Longitudinal network effects and the valuations of each target company x are integrated into two models:

1. Linear regression model (LRM): predicting a company's current or future financial value.

2. Support vector regression model (SVR): to learn the parameters.
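The linear regression variant can be sketched as ordinary least squares over a combined feature vector. The feature layout is a hypothetical assumption: each row of X concatenates a company's network effects with a financial value such as last year's profit. For the SVR variant, scikit-learn's `sklearn.svm.SVR` with `fit(X, y)` and `predict(X)` would be a drop-in alternative; the sketch below avoids third-party dependencies.

```python
def fit_ols(X, y):
    """Least-squares weights [intercept, w1, ..., wk] for rows of features X.

    Solves the normal equations (A^T A) w = A^T y, where A is X with a
    leading column of ones, by Gaussian elimination with partial pivoting.
    """
    A = [[1.0, *row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (M[i][k] - sum(M[i][j] * w[j] for j in range(i + 1, k))) / M[i][i]
    return w


def predict(w, features):
    """Linear prediction w0 + sum(wi * xi)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))
```

Usage: with rows `[network_effect, last_profit]` and targets `y`, `w = fit_ols(X, y)` returns the intercept and one weight per feature, and `predict(w, row)` scores a new company-year.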

Experimental results: 20 Fortune companies are selected as a sample. Their valuation records (i.e. profits) are captured and their networks are generated.

First, they calculate the mean profit of the companies. They then train the model on records and networks spanning five years, use it to predict the profits of the following five years, and compare the predictions with the actual profits.

The same procedure is repeated for individual companies (e.g. IBM and Intel).
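The train-on-five-years, predict-the-next-five protocol can be sketched as a rolling evaluation scored by mean squared error. Advancing the window one year at a time (`step=1`) is an assumption, and `mean_baseline` is a hypothetical placeholder model used only to show the protocol; in the study it would be replaced by the regression models.

```python
def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def rolling_evaluation(series, fit, window=5, horizon=5, step=1):
    """Train on `window` consecutive years, predict the next `horizon` years.

    series: [(year, profit), ...] sorted by year.
    fit:    callable taking the training profits and returning a predictor
            f(h) for h = 1..horizon years ahead.
    Returns [(first predicted year, MSE), ...], one entry per window.
    """
    results = []
    for start in range(0, len(series) - window - horizon + 1, step):
        train = [p for _, p in series[start:start + window]]
        test = series[start + window:start + window + horizon]
        predictor = fit(train)
        preds = [predictor(h) for h in range(1, horizon + 1)]
        results.append((test[0][0], mse(preds, [p for _, p in test])))
    return results


def mean_baseline(train):
    """Hypothetical naive model: always predicts the training-window mean."""
    m = sum(train) / len(train)
    return lambda h: m
```

With profits 0, 1, ..., 9 for 2000 to 2009, the baseline trained on 2000 to 2004 predicts 2.0 for 2005 to 2009 and scores an MSE of 27.0.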

Performance Evaluation: Prediction of the Mean Profits of 20 Companies

Discovered: network features do not seem to contribute to revenue prediction but do contribute to predicting company profit. Company profit prediction by joint network and financial analysis outperforms network-only prediction by 150% and financial-only prediction by 34%.

Performance Evaluation: Prediction of the Mean Profits of IBM and Intel

Aspects of Network Science in the Paper

Graph theory: measures such as degree of connectivity, diameter and shortest path, used to calculate network effects.

Developing models to understand the network: extracting data from the NYT (the problem-statement part of the paper).

Building models to anticipate the evolution of the networks: network effects, company valuations.

Constructing models to optimise the outcomes of networks: experimental results and improvements.

What Else Can Be Done: Improvements

1. A company's value (or performance) may encompass several factors depending on the context in which it is defined, such as market performance, employee satisfaction and responsibility. Analysing these areas could potentially improve the model's performance.

2. More social network data sources can be used, e.g. social media such as Twitter (especially) or Facebook, whose analysis could provide longitudinal social network data.

3. Relations can be categorized as positive or negative using sentiment analysis, so that positive-impact relation networks and negative-impact relation networks are handled separately.
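The proposed split into positive and negative impact networks could start from something as simple as a word-list sentiment score per co-occurrence sentence. A toy sketch: the `POSITIVE`/`NEGATIVE` lexicons are hypothetical and far too small for real use, where a trained sentiment model would replace `sentence_polarity`.

```python
import re

# Hypothetical toy lexicons; a real system would use a sentiment model.
POSITIVE = {"partner", "partnership", "alliance", "agreement", "joint"}
NEGATIVE = {"lawsuit", "sued", "rival", "dispute"}


def sentence_polarity(sentence: str) -> int:
    """Count of positive words minus count of negative words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def split_by_sentiment(pair_sentences):
    """{(source, target): [sentences]} -> (positive_edges, negative_edges).

    Each edge weight is the magnitude of the pair's net polarity;
    pairs with a net score of zero go into neither network.
    """
    positive, negative = {}, {}
    for pair, sentences in pair_sentences.items():
        score = sum(sentence_polarity(s) for s in sentences)
        if score > 0:
            positive[pair] = score
        elif score < 0:
            negative[pair] = -score
    return positive, negative
```

The two resulting edge dicts can then be analysed as separate networks.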