big data mining in twitter using python clients
TRANSCRIPT
![Page 1: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/1.jpg)
University of New Orleans University of New Orleans
ScholarWorks@UNO ScholarWorks@UNO
Innovate UNO InnovateUNO Fall 2017
Big Data Mining in Twitter using Python Clients Big Data Mining in Twitter using Python Clients
Sanjiv Pradhanang University of New Orleans
Follow this and additional works at: https://scholarworks.uno.edu/innovate
Pradhanang, Sanjiv, "Big Data Mining in Twitter using Python Clients" (2017). Innovate UNO. 5. https://scholarworks.uno.edu/innovate/Fall2017/oral/5
This Oral Presentation is brought to you for free and open access by the Undergraduate Showcase at ScholarWorks@UNO. It has been accepted for inclusion in Innovate UNO by an authorized administrator of ScholarWorks@UNO. For more information, please contact [email protected].
![Page 2: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/2.jpg)
Big Data Mining in Twitter using
Python clientsSANJIV PRADHANANG
MENTOR: DR. SHAIKH ARIFUZZAMAN
![Page 3: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/3.jpg)
GOALS:
Assist in making better decisions
Discover patterns in data sets
Evaluation by natural language processing
Image courtesy of CountryLiving
![Page 4: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/4.jpg)
METHODOLOGY
ExtractionClassificationComparison & Inference
![Page 5: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/5.jpg)
![Page 6: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/6.jpg)
EXTRACTION
Data first extracted in JSON format
![Page 7: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/7.jpg)
EXTRACTION
Streaming APIs for perpetual data
![Page 8: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/8.jpg)
EXTRACTION
Appropriate filters as requiredStaying within the limits of API
![Page 9: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/9.jpg)
![Page 10: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/10.jpg)
METHODOLOGY
ExtractionClassificationComparison & Inference
![Page 11: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/11.jpg)
CLASSIFICATION
Labeling of tweets:Web services:
Amazon Mechanical Turk (AMT)
Set specific conditions
![Page 12: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/12.jpg)
CLASSIFICATION
Classifiers:Naïve Bayes Support Vector Machine (SVM)Maximum Entropy Classifier
![Page 13: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/13.jpg)
NAÏVE BAYES CLASSIFIER
![Page 14: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/14.jpg)
SUPPORT VECTOR MACHINE (SVM)
Trains from a set of labeled dataAn SVM builds an algorithm from trainingUses it to categorize new test casesWorks best as a binary classifier
![Page 15: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/15.jpg)
MAXIMUM ENTROPY CLASSIFIER
Features of data sets are weightedUniform probability distribution favoredBetter when prior data is unknownModel that converts the contextual
info into class prediction
![Page 16: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/16.jpg)
METHODOLOGY
ExtractionClassificationComparison & Inference
![Page 17: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/17.jpg)
COMPARISON & INFERENCE
Precision, recall, accuracy, F1 score, interrater agreement
Sentiment scores determine what factors are considerable
![Page 18: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/18.jpg)
Analysis of tweets towards colleges during Hurricane Harvey
![Page 19: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/19.jpg)
EXTRACTION
Tweets collected between August 27 and August 31
Tweets containing the twitter handle/id(s) of colleges in Louisiana, from users all over the world
Majority of tweets coming from UNO, Tulane, Loyola, UL at Lafayette, UL at Monroe & LSU
Storage of data stream provided by Louisiana Optical Network Initiative(LONI)
![Page 20: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/20.jpg)
CLASSIFICATION
Categorized into three categories:1. Hurricane Harvey2. School Experience3. Irrelevant 4105 tweets collected, 3656 tweets classified 1160 for Harvey, 2048 for school experiences and
448 for irrelevant Manually inspected and labelled
![Page 21: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/21.jpg)
![Page 22: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/22.jpg)
REPRESENTATIONAL TWEETS
Harvey: “Yo @oursoutheastern you want to cancel school please.”
School Experience: “RT @HipHopPrez: .@du1869 in top 20% for national liberal arts”
Irrelavant: “@lsu i just finished gossip girl and i think i'm in tears #bestshowever”
![Page 23: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/23.jpg)
COMPARISON OF CLASSIFIERS
Classifier Task Categories Precision Recall F1 score Accuracy
Naïve Bayes
1Relevant 0.905 0.798 0.847
0.749Irrelevant 0.215 0.397 0.279
2Harvey 0.688 0.803 0.741
0.797School 0.877 0.794 0.833
SVM (rbf)1
Relevant 0.891 0.987 0.9360.882
Irrelevant 0.587 0.136 0.221
2Harvey 0.497 0.9 0.641
0.635School 0.895 0.484 0.629
Maximum Entropy
1Relevant 0.767 0.38 0.509
0.355Irrelevant 0.038 0.174 0.062
2Harvey 0.359 0.916 0.517
0.38School 0.617 0.077 0.137
![Page 24: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/24.jpg)
SENTIMENTAL ANALYSIS (AS OF 10.05.2017):
MetricSentiment Score
Proportion of collegesAverage Range
Population>8000 0.130 -1,1 0.74
< 0.153 -1,1 0.26
Rank (in LA)Inside top 10 0.129 -1,1 0.80
Outside 0.165 -0.8,1 0.20
RegionWestern 0.107 -0.75,1 0.17
Eastern 0.142 -1,1 0.83
Followers on Twitter
>10000 0.109 -1,1 0.65
< 0.185 -1,1 0.35
![Page 25: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/25.jpg)
OTHER STATS:
1. UNO 9. Nicholls State U.2. LSU 10. Loyola U.3. UL at Lafayette 11. Tulane U.4. UL at Monroe5. Dillard U.6. Southeastern U. LA7. Southern U. A&M8. Louisiana College
Series 1: Number of tweets in thousandsSeries 2: Average sentiment score
![Page 26: Big Data Mining in Twitter using Python Clients](https://reader030.vdocuments.net/reader030/viewer/2022032610/62391fcbd3a96576d22bd817/html5/thumbnails/26.jpg)
SUMMARY
Tweets before a storm are usually negativeResponsiveness from school officials
important for positive feedbackEquitable features is important for Max.
Ent. ClassifiersGet an overall comparison of colleges