Tweets Classification

Upload: varun-gupta

Post on 07-Nov-2014

Page 1: Tweets Classification

Tweets Classification

Supervisor - Dr. Vikas Saxena
Name - Shubhangi Agarwal, Varun Ajay Gupta
Enrolment No. - 10104768, 10104730

Page 2: Tweets Classification

04/08/2023Footer Text 2

Introduction

• We live in an era of social networking, which is why our project focuses on Twitter. In this project we extract tweets and classify them into different categories. Each extracted tweet carries a large amount of additional information with it.
• Using tweet classification we can identify current trends, such as the most popular language on Twitter, the most talked-about people, hot topics and much more.

Page 3: Tweets Classification

Problem Statement

• Extracting tweets.
• Converting unstructured data into structured data.
• Pre-processing the data.
• Finding the most popular language on Twitter.
• Choosing features for classification.
• Classifying the tweets into different categories.
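The structuring and pre-processing steps could look roughly like the sketch below. It is only an illustration: the slides do not show the raw data, so the JSON field names ("text", "lang", "user", "screen_name"), the sample tweets and the helper names `to_record` and `preprocess` are all assumptions.

```python
import json
import re

# Hypothetical raw input: one JSON object per line. The field names are
# assumed, not taken from the slides.
RAW_LINES = [
    '{"text": "Great goal by the striker! #football http://t.co/x", "lang": "en", "user": {"screen_name": "fan1"}}',
    '{"text": "RT @news: Election results 2014 announced", "lang": "en", "user": {"screen_name": "voter"}}',
]

def to_record(raw_line):
    """Convert one raw JSON tweet into a flat (structured) dictionary."""
    tweet = json.loads(raw_line)
    return {
        "user": tweet.get("user", {}).get("screen_name", ""),
        "lang": tweet.get("lang", "und"),
        "text": tweet.get("text", ""),
    }

def preprocess(text):
    """Strip URLs, @mentions and the RT marker, then lower-case the text."""
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+:?", " ", text)          # remove @mentions
    text = re.sub(r"\bRT\b", " ", text)          # remove the retweet marker
    return " ".join(text.lower().split())        # collapse whitespace

records = [to_record(line) for line in RAW_LINES]
for record in records:
    record["clean_text"] = preprocess(record["text"])
    print(record)
```

Hashtags are kept in the cleaned text here because the number of hashtags is used later as a feature.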

Page 4: Tweets Classification

Algorithm

• SVMs (support vector machines) are supervised learning models with associated learning algorithms that analyse data and recognize patterns; they are used for classification and regression analysis.
• Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.
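To make the two-category idea concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularised hinge loss. It is not the implementation used in the project (the slides do not say which library or solver was used); the function names, hyperparameters and toy data are assumptions chosen for illustration.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Train a linear SVM by sub-gradient descent on the regularised hinge loss.

    samples: list of feature vectors; labels: +1 or -1 for each sample.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            if margin < 1:
                # Inside the margin or misclassified: the hinge loss is active.
                weights = [w - lr * (reg * w - y * xi) for w, xi in zip(weights, x)]
                bias += lr * y
            else:
                # Only the regularisation term contributes to the gradient.
                weights = [w - lr * reg * w for w in weights]
    return weights, bias

def predict(weights, bias, x):
    """Assign a new example to one of the two categories (+1 or -1)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

# Toy, hypothetical data: [sports-word count, politics-word count].
X = [[3, 0], [2, 1], [4, 0], [0, 3], [1, 2], [0, 4]]
y = [1, 1, 1, -1, -1, -1]   # +1 = sports, -1 = politics

w, b = train_linear_svm(X, y)
print(predict(w, b, [5, 0]), predict(w, b, [0, 5]))
```

A plain SVM separates two categories. Handling three categories, as in this project, needs an extension such as training one classifier per category against the rest; the slides do not state which scheme was used.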

Page 5: Tweets Classification

Why SVM?

• Widely used in text classification.
• High accuracy in comparison to other algorithms.
• With well-chosen features, an SVM can be robust even when the training sample has some bias.

Page 6: Tweets Classification

Technology Used

• Operating System: Ubuntu 12.04
• Language: Python
• Tools: gedit
• Debugger: Python debugger

Page 7: Tweets Classification

Page 8: Tweets Classification

Unstructured Tweets

Page 9: Tweets Classification

Structured Tweets

Page 10: Tweets Classification

Calculating the most popular language on Twitter
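One simple way to compute this from structured tweets is to count language codes. The sketch below assumes each structured tweet carries a "lang" field; the sample records are hypothetical and are not the project's data.

```python
from collections import Counter

# Hypothetical structured tweets; "lang" holds a language code such as "en".
tweets = [
    {"text": "Good morning", "lang": "en"},
    {"text": "Buenos dias", "lang": "es"},
    {"text": "What a match!", "lang": "en"},
    {"text": "Bonjour", "lang": "fr"},
    {"text": "Vote today", "lang": "en"},
]

# Count how many tweets were written in each language.
language_counts = Counter(tweet["lang"] for tweet in tweets)

# Pick the language with the highest count and its share of all tweets.
most_common_language, count = language_counts.most_common(1)[0]
share = 100.0 * count / len(tweets)
print(most_common_language, count, "%.1f%%" % share)
```

The same `language_counts` mapping can feed a bar or pie chart of language popularity.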

Page 11: Tweets Classification

Pictorial view of the popularity of languages

Page 12: Tweets Classification

Features chosen

• Number of sports words.
• Number of politics words.
• Number of entertainment words.
• Lexical complexity.
• Number of hashtags.
• Number of digits.
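A feature extractor for these six values might be sketched as follows. The keyword lists are hypothetical, since the slides do not give the actual vocabularies, and lexical complexity is assumed here to mean the ratio of unique words to total words, which the slides do not define.

```python
import re

# Hypothetical keyword lists; the slides do not give the actual vocabularies.
SPORTS_WORDS = {"match", "goal", "cricket", "football", "team"}
POLITICS_WORDS = {"election", "vote", "minister", "parliament", "party"}
ENTERTAINMENT_WORDS = {"movie", "song", "actor", "album", "film"}

def extract_features(text):
    """Return the six feature values for one tweet as a list of numbers."""
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: lexical complexity = unique words / total words.
    lexical_complexity = len(set(words)) / len(words) if words else 0.0
    return [
        sum(word in SPORTS_WORDS for word in words),
        sum(word in POLITICS_WORDS for word in words),
        sum(word in ENTERTAINMENT_WORDS for word in words),
        lexical_complexity,
        len(re.findall(r"#\w+", text)),      # number of hashtags
        sum(ch.isdigit() for ch in text),    # number of digits
    ]

print(extract_features("What a goal! Team wins the match 2-1 #football"))
```

Each tweet becomes a fixed-length numeric vector, which is the form of input an SVM needs.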

Page 13: Tweets Classification

Feature values of the training data set

Page 14: Tweets Classification

Feature values of the testing data set before application of the SVM

Page 15: Tweets Classification

Result of classification of tweets

Page 16: Tweets Classification

Graph of SVM and accuracy

Page 17: Tweets Classification

Conclusion

Applied to the testing data set, the SVM classifies the tweets into the sports, entertainment and politics categories with an accuracy of 97.5%.
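Accuracy here is taken to mean the percentage of test tweets whose predicted category matches the true one. A minimal sketch of that calculation follows; the labels are made up for illustration and are not the project's test set or results.

```python
def accuracy(true_labels, predicted_labels):
    """Return the percentage of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

# Hypothetical labels for illustration only; not the authors' test set.
truth = ["sports", "politics", "entertainment", "sports"]
predicted = ["sports", "politics", "sports", "sports"]
print(accuracy(truth, predicted))
```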

Page 18: Tweets Classification

Future Work

• So far we have used the SVM to classify tweets into general categories such as sports, politics and entertainment. We will try to extend it to classify the data into more specific categories, so that the marketing and PR teams of different organizations can use it when choosing their strategies.

Page 19: Tweets Classification

Thank You