Final Poster for Engineering Showcase


Upload: tucker-truesdale

Post on 23-Jan-2018


Social Media Influence and Credibility

By Students and Professors of Texas A&M University

Acknowledgements

Our team would like to thank Drs. Yates and Hempelmann for their guidance and contributions. In addition, members of the Bush School and the State Department helped in good faith to analyze past research, areas for further examination, and diverse methodologies for working with large data sets.

Citations

*Authors listed in alphabetical order.

Future Considerations

While our team focused on an unusual, non-traditional social media application, the progress made in developing user metrics has great potential in other areas. The combination of OST with statistical and network concepts is particularly distinctive and holds significant promise as electronic communication continues to increase. This exploratory research suggests many directions that future research and applications might take. For instance:

• Public Policy: Twitter has been used recently during tumultuous political periods to coordinate protests or even violent acts. Predicting and preparing for such events can increase awareness and safety.

• Business Tools: The ability to classify social media users is an invaluable tool for businesses, as it offers an accurate, cost-effective way of reaching their target consumers.

• Psychology: Determine the meaning, and even the credibility, of text content. Large digital data sets can help to understand and refine perceptions and trust. Social behavior can be mapped on a scale never before imagined.

• Real Time Traffic: Accident detection, crime detection, and geographic information can be tracked in real time.

Abstract

Background

Twitter Data Extraction

Our Computer Science team extracted various kinds of data from Twitter using a Python library called Tweepy. Four major kinds of data were extracted:

Searching Twitter for “mentions” of a particular user
• Target: What the network is saying about a user
• Application: Rumors and news about player injuries

Pulling all of the tweets issued by a given set of Twitter authorities
• Target: What the user is saying
• Application: What a player, team, or interested party is saying regarding injuries

Searching Twitter for combinations of keywords and returning the resulting tweets
• Target: Identify noteworthy word combinations and usages
• Application: Eliminate false positives and refine the word ontology

Pulling the entire follower data for a given set of users to help form a follower network
• Target: Identify communities of users to augment user influence and credibility scoring
• Application: Identify communities and find pivotal users who bridge groups

This methodology has been coded and documented so that each kind of extraction can be started quickly from the command line.
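A command-line entry point for the four extraction modes might look like the sketch below. It is only an illustration: the subcommand names (`mentions`, `timelines`, `search`, `followers`), the `build_parser` and `describe_job` helpers, and the argument layout are assumptions rather than the team's actual interface, and the Tweepy calls themselves are left out so that the example runs without credentials.

```python
import argparse


def build_parser():
    """Build a parser with one hypothetical subcommand per extraction mode."""
    parser = argparse.ArgumentParser(
        prog="extract", description="Queue a Twitter data extraction job."
    )
    modes = parser.add_subparsers(dest="mode", required=True)

    mentions = modes.add_parser("mentions", help="tweets that mention one user")
    mentions.add_argument("user")

    timelines = modes.add_parser("timelines", help="all tweets by given users")
    timelines.add_argument("users", nargs="+")

    search = modes.add_parser("search", help="tweets matching keyword combinations")
    search.add_argument("keywords", nargs="+")

    followers = modes.add_parser("followers", help="follower lists for given users")
    followers.add_argument("users", nargs="+")
    return parser


def describe_job(args):
    """Return a one-line summary of the job; a real tool would call Tweepy here."""
    if args.mode == "mentions":
        return f"mentions of @{args.user}"
    if args.mode == "search":
        return "search for " + " AND ".join(args.keywords)
    # 'timelines' and 'followers' both take a list of user names.
    return f"{args.mode} for " + ", ".join("@" + user for user in args.users)


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["search", "ankle", "sprain"])
print(describe_job(args))
```

Keeping one subcommand per data type mirrors the four extraction kinds listed above and lets each mode grow its own options independently.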

Statistical Analysis


Semantic Parser

To analyze the content of tweets, and so gauge a user’s influence and especially credibility more fully, a semantic parser was developed to quickly and accurately track the most likely meanings of words in relation to the studied area. The methodology is based on Ontological Semantic Technology (OST) and can establish the relevance of a tweet’s content. It works by following the computational steps below (Figure 5):

OST Knowledge Resources: OnSe Lexicon, Proper Name DB, OnSe InfoStore

Figure 1: Past Growth of Social Media

Scoring

This assessment will help to identify verified users and highly active or effective users. The elements above were used to formulate an initial linear score to measure a user’s influence (Figure 3). This figure shows that among 109 authoritative users selected by an informed user, a small few were vastly more influential than the majority.
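A normalized linear score of this kind can be sketched as follows. The poster does not list the features or weights that were actually used, so the feature names (`followers`, `retweets`, `verified`), the weights, and the sample users here are purely hypothetical; the sketch only shows the general technique of min-max scaling each feature, taking a weighted sum, and rescaling so the top user scores 1.

```python
def min_max(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All users share the same value, so the feature carries no signal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def influence_scores(users, weights):
    """Weighted sum of min-max scaled features, rescaled so the top user is 1."""
    if not users:
        return {}
    names = list(users)
    raw = {name: 0.0 for name in names}
    for feature, weight in weights.items():
        scaled = min_max([users[name][feature] for name in names])
        for name, value in zip(names, scaled):
            raw[name] += weight * value
    top = max(raw.values())
    if top == 0:
        return raw
    return {name: score / top for name, score in raw.items()}


# Hypothetical non-content features for three users.
USERS = {
    "a": {"followers": 1000, "retweets": 50, "verified": 1},
    "b": {"followers": 100, "retweets": 5, "verified": 0},
    "c": {"followers": 550, "retweets": 5, "verified": 0},
}
# Hypothetical weights; the real ones are not given in the poster.
WEIGHTS = {"followers": 0.5, "retweets": 0.3, "verified": 0.2}

for name, score in sorted(influence_scores(USERS, WEIGHTS).items()):
    print(name, round(score, 3))
```

Scaling each feature before weighting keeps a large-valued feature such as follower count from dominating the sum simply because of its units.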

This score will be augmented with network analysis that will help to identify pivotal users who bridge different communities (Figure 4). This figure shows a user who merits further investigation.
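One simple way to flag users who bridge communities is to look for nodes whose removal splits the follower network into more pieces. The sketch below does exactly that on a toy graph. It is an assumption about how "pivotal" could be operationalized, not the team's method: follower links are treated as undirected, and the user names, edges, and helper functions are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (user, user) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def count_components(graph, skip=None):
    """Count connected components, optionally ignoring one user."""
    remaining = set(graph) - {skip}
    components = 0
    while remaining:
        components += 1
        queue = deque([remaining.pop()])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
    return components


def pivotal_users(graph):
    """Users whose removal increases the number of components."""
    baseline = count_components(graph)
    return sorted(
        user for user in graph if count_components(graph, skip=user) > baseline
    )


# Two small communities (a, b) and (d, e) joined only through "hub".
EDGES = [("a", "b"), ("a", "hub"), ("b", "hub"),
         ("hub", "d"), ("hub", "e"), ("d", "e")]
print(pivotal_users(build_graph(EDGES)))
```

This brute-force check recomputes the components once per user, which is fine for a network of about a hundred accounts; a larger network would call for a dedicated articulation-point or betweenness algorithm.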

Validation and verification are still to come. The scoring will be considered successful if it predicts the verified, pivotal, or influential users identified by third-party statistics, and the outcome will shape future considerations. Influence is a difficult quality to quantify, but several third-party measures are available to us, such as Klout, Kred, and Google Trends. In addition, injuries were tracked for four basketball teams during the end of the 2014 season: the Chicago Bulls, the Dallas Mavericks, the Utah Jazz, and the New Orleans Pelicans. This ground truth will be compared with tweet content to supplement both the content and the non-content credibility verification.

Figure 3: Normalized Non-Content Related Influence Score

Figure 4: Pivotal Network Interconnections

Accessing the meaning of text in terms of underlying concepts
• Goes beyond simple keywords on the surface

For each content word in a sentence, it:
• Grabs the underlying concepts for all senses of the word
• Sets up a matrix for all sense combinations
• Finds the shortest path in the ontology graph for each sense combination
• Declares the overall shortest path the correct meaning

If the correct meaning uses a concept from the sports injury subdomain of the ontology, the tweet is declared a true positive and can be evaluated for further credibility.

Double-pass corpus-driven acquisition (desired and undesired senses of all relevant vocabulary); the ontology of basketball and injuries is in place

Language-independent ontology capturing world knowledge
• For quick ramp-up, the lexicon is integrated into the ontology

Figure 5: Flow Representation of the Steps for our Ontological Semantic Technology Parser
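The disambiguation steps listed above can be illustrated with a toy example. Everything concrete in this sketch is hypothetical: the concepts, the ontology links, the three-word lexicon, and the choice to score a sense combination by the sum of pairwise shortest-path lengths are assumptions made for illustration, not the actual OST resources or scoring rule.

```python
from collections import deque
from itertools import combinations, product

# Toy ontology links (hypothetical), treated as undirected.
LINKS = [
    ("SPRAIN", "INJURY"), ("INJURY", "ATHLETE"), ("SPRAIN", "BODY-PART"),
    ("BODY-PART", "ATHLETE"), ("ATHLETE", "HUMAN"), ("HUMAN", "LIVING-THING"),
    ("FLOWER", "PLANT"), ("PLANT", "LIVING-THING"), ("GARDEN", "PLANT"),
]
ONTOLOGY = {}
for a, b in LINKS:
    ONTOLOGY.setdefault(a, set()).add(b)
    ONTOLOGY.setdefault(b, set()).add(a)

# Toy lexicon: each content word maps to the concepts of all its senses.
LEXICON = {
    "rose": ["FLOWER", "ATHLETE"],  # the flower, or a player named Rose
    "sprained": ["SPRAIN"],
    "ankle": ["BODY-PART"],
    "garden": ["GARDEN"],
}
INJURY_SUBDOMAIN = {"INJURY", "SPRAIN"}


def path_length(start, goal):
    """Breadth-first search for the shortest path length between concepts."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in ONTOLOGY.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")


def disambiguate(words):
    """Try every sense combination and keep the one with the shortest paths."""
    sense_lists = [LEXICON[w] for w in words if w in LEXICON]
    best, best_cost = None, float("inf")
    for combo in product(*sense_lists):
        cost = sum(path_length(a, b) for a, b in combinations(combo, 2))
        if cost < best_cost:
            best, best_cost = combo, cost
    return best, best_cost


def is_injury_tweet(words):
    """True positive if the chosen meaning uses an injury-subdomain concept."""
    best, _ = disambiguate(words)
    return best is not None and any(c in INJURY_SUBDOMAIN for c in best)


print(disambiguate(["rose", "sprained", "ankle"]))
print(is_injury_tweet(["rose", "garden"]))
```

In the first call the athlete sense of "rose" sits much closer to the sprain and body-part concepts than the flower sense does, so it is selected; in the second, the flower sense wins and no injury concept is involved.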

Figure 2: Parallels Between the NBA and States

Industrial and Systems Engineering
Thomas Gray [email protected]
Tucker Truesdale [email protected]
Dr. Justin Yates (ISEN) [email protected]

Bush School
Alex Bitter [email protected]
Cheryl Landry [email protected]
John Mellusi [email protected]

Computer Science
Zachary Habersang [email protected]
Nishaanth Narayanan [email protected]

Commerce
Dr. Chris Hempelmann [email protected]
Todd Morris [email protected]