Text Analytics
![Page 1: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/1.jpg)
Presented by: Ajay Ram K P
![Page 2: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/2.jpg)
What is Text Analytics??
Text analytics is the process of analyzing unstructured text, extracting relevant information and transforming it into useful business intelligence.
Text analytics processes can be performed manually, but the amount of text-based data available to companies today makes it increasingly important to use intelligent, automated solutions.
![Page 3: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/3.jpg)
Why is Text Analytics important??
Emails, online reviews, tweets, call center agent notes, and the vast array of other written feedback all hold insight into customer wants and needs, but only if you can unlock it.
Text analytics is the way to extract meaning from this unstructured text, and to uncover patterns and themes.
![Page 4: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/4.jpg)
Text Analytics in R
Text analytics in R is carried out with the help of the tm package.
It is a framework for text mining applications within R.
It contains functions for actions such as content transformation, word removal, finding frequent terms and a lot more.
![Page 5: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/5.jpg)
The Case Study Data
The data used is a collection of game reviews in an Excel sheet.
Game reviews from 1000 gamers are recorded in the data set.
The objective is to analyze these reviews, treating all of them as one text, and find the most frequent words.
![Page 6: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/6.jpg)
Part 1
The reviews are read into a variable docs using the functions VectorSource() and Corpus().
VectorSource() treats each element of a character vector as a document. Corpus() builds a corpus, the collection of documents that tm works on, from that source.
Reading the Data
review.txt
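The reading step can be sketched as follows. This is a minimal sketch, not the presenter's actual code: it assumes the reviews have been exported to a plain-text file with one review per line, and it writes a small stand-in file with made-up reviews so that it runs on its own. The names review_file, reviews and text are illustrative.

```r
library(tm)

# Hypothetical stand-in for review.txt: one made-up review per line.
review_file <- tempfile(fileext = ".txt")
writeLines(c("Great game, loved the graphics!",
             "Too many bugs. Gameplay is slow.",
             "Good story and great gameplay."), review_file)

reviews <- readLines(review_file)

# Treat all reviews as one text, as the case study does.
text <- paste(reviews, collapse = " ")

# VectorSource() wraps the character vector as a source;
# Corpus() builds the corpus from that source.
docs <- Corpus(VectorSource(text))

length(docs)  # number of documents in the corpus
```

Because the reviews are pasted into a single string first, the corpus holds one document. Passing reviews directly to VectorSource() would instead give one document per review.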
![Page 7: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/7.jpg)
Data cleansing is required, as most of the reviews contain punctuation, numbers, stop words, etc. that we don’t require for analysis.
Depending on what you are trying to achieve with your analysis, you may want to do the data cleaning step differently.
Data cleansing is done using the tm_map() function in R.
Cleaning the Data
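A typical cleaning pass with tm_map() might look like the sketch below. The slides do not list the exact transformations used, so the choice and order of steps here are assumptions, and the sample sentence is made up.

```r
library(tm)

# Made-up sample text standing in for the review corpus.
docs <- Corpus(VectorSource(
  "Great GAME!!! Loved the 3 new levels, and the graphics are great."))

# Each tm_map() call applies one transformation to every document.
docs <- tm_map(docs, content_transformer(tolower))      # lower-case everything
docs <- tm_map(docs, removePunctuation)                 # drop punctuation
docs <- tm_map(docs, removeNumbers)                     # drop digits
docs <- tm_map(docs, removeWords, stopwords("english")) # drop common stop words
docs <- tm_map(docs, stripWhitespace)                   # collapse leftover spaces

cleaned <- content(docs[[1]])
cleaned
```

Lower-casing comes before stop word removal because the stop word list is lower-case. Steps such as stemming could be added or removed depending on the goal of the analysis.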
![Page 8: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/8.jpg)
Converting the document into a Document Term Matrix
A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. A term-document matrix, which is what the code here builds, is the transpose: rows correspond to terms and columns to documents.
The tm package stores document-term matrices as sparse matrices for efficiency. Since we only have 1000 reviews and one document, we can just convert our term-document matrix into a normal matrix, which is easier to work with.
Code:
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
We then take the row sums of this matrix (one row per term), which will give us a named vector.
And now we can sort this vector to see the most frequently used words.
Code:
v <- sort(rowSums(m), decreasing=TRUE)
head(v)
Finding the frequent terms and their frequency
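The orientation of the matrix decides whether row sums or column sums give the term totals. The sketch below runs the same steps on a tiny made-up text and then repeats the count with DocumentTermMatrix() for comparison. The sample text and the names tdm, freq and v2 are illustrative, not from the slides.

```r
library(tm)

# Made-up, already-cleaned text standing in for the combined reviews.
docs <- Corpus(VectorSource(
  "great game great graphics slow gameplay great story"))

# Term-document matrix: rows = terms, columns = documents.
tdm <- TermDocumentMatrix(docs)
m <- as.matrix(tdm)

# Row sums give the total count of each term across documents.
v <- sort(rowSums(m), decreasing = TRUE)
freq <- data.frame(word = names(v), freq = v, row.names = NULL)
head(freq)

# Document-term matrix: rows = documents, columns = terms,
# so the same totals come from column sums instead.
dtm <- DocumentTermMatrix(docs)
v2 <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
```

Converting to a dense matrix with as.matrix() is fine at this scale. For a large corpus with many documents it can use a lot of memory, which is why tm keeps the sparse form by default.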
![Page 9: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/9.jpg)
![Page 10: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/10.jpg)
For plotting the word cloud, we use the wordcloud package.
Plotting the Word Cloud
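A word cloud can be drawn from a named frequency vector roughly as follows. The frequencies below are made up and stand in for the sorted vector v. The parameter values (min.freq, max.words, rot.per and the color palette) are illustrative choices, not the ones used in the slides.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical term frequencies standing in for the sorted vector v.
v <- c(game = 40, great = 25, graphics = 18, story = 12, gameplay = 9, bugs = 5)

set.seed(1234)  # the layout is random; a fixed seed makes the plot repeatable
wordcloud(words = names(v), freq = v,
          min.freq = 1,          # keep every term in this small example
          max.words = 100,       # upper limit on the number of words drawn
          random.order = FALSE,  # place the most frequent words first, near the center
          rot.per = 0.35,        # proportion of words rotated by 90 degrees
          colors = brewer.pal(8, "Dark2"))
```

Word size scales with frequency, so the most frequent term appears largest.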
![Page 11: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/11.jpg)
And Voila!!!
![Page 12: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/12.jpg)
Part 2
Creating the Network
For network creation, we take the help of the packages igraph, sna and network.
![Page 13: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/13.jpg)
Finding the associations. The findAssocs() function is used.
Creating the Network
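findAssocs() reports terms whose counts correlate with a given term across documents, so it needs more than one document to work with. The sketch below therefore keeps each review as its own document, which is an assumption, since the slides do not show how the corpus was set up for this step. The reviews, the term "great" and the 0.5 limit are all made up for illustration.

```r
library(tm)

# Made-up reviews, one document each.
reviews <- c("great graphics great story",
             "great graphics slow gameplay",
             "slow gameplay many bugs",
             "many bugs poor story")
docs <- Corpus(VectorSource(reviews))
tdm <- TermDocumentMatrix(docs)

# Terms whose correlation with "great" is at least corlimit.
# The result is a named list with one entry per requested term.
assoc <- findAssocs(tdm, terms = "great", corlimit = 0.5)
assoc
```

Each list entry is a named numeric vector of correlations, which can be reshaped into an edge list (term, associated term, correlation) for plotting as a network.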
![Page 14: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/14.jpg)
Plotting the graph, using the igraph package and its graph.data.frame() function.
Creating the Network
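Building a graph from a table of term pairs can be sketched as below. The edge list is made up, and the column names from, to and weight are illustrative. The sketch calls graph_from_data_frame(), the current igraph name for graph.data.frame(), because newer igraph releases mark the older name as deprecated.

```r
library(igraph)

# Hypothetical edge list: pairs of associated terms with a strength value.
edges <- data.frame(from   = c("great", "great", "slow", "many"),
                    to     = c("graphics", "story", "gameplay", "bugs"),
                    weight = c(0.9, 0.3, 1.0, 1.0))

# The first two columns define the edges; further columns become edge attributes.
g <- graph_from_data_frame(edges, directed = FALSE)

# Draw thicker lines for stronger associations.
plot(g, edge.width = E(g)$weight * 3, vertex.label.cex = 0.9)
```

Vertices are created automatically from the distinct names in the first two columns.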
![Page 15: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/15.jpg)
And there it is!!!
![Page 16: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/16.jpg)
Another Graph…
A graph where the frequent terms are the nodes and their frequencies give the interaction/strength.
![Page 17: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/17.jpg)
In case of large networks
Say the network has more than 10K nodes. Such networks will be complicated. To quantify such networks, we look at statistical aspects of the network. Random network, scale-free network or hierarchical network models would be a good fit in such cases.
Random Network Scale-free Network
Hierarchical Network
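One statistical aspect that separates these models is the degree distribution. The sketch below uses two igraph generators, sample_gnp() for a random (Erdos-Renyi) network and sample_pa() for a scale-free (preferential attachment) network, and compares their degrees. The network size and the parameters p and m are arbitrary choices for illustration. A hierarchical model is left out because igraph has no single standard generator for it.

```r
library(igraph)

set.seed(42)
n <- 10000  # network size, matching the "more than 10K nodes" scale

# Random network: every pair of nodes is linked with the same probability p.
random_net <- sample_gnp(n, p = 4 / n)

# Scale-free network: each new node attaches m edges,
# preferring nodes that already have many links.
scale_free_net <- sample_pa(n, m = 2, directed = FALSE)

# Summary statistics of the degree distributions.
summary(degree(random_net))
summary(degree(scale_free_net))

# The scale-free network has a few very highly connected hubs.
max(degree(random_net))
max(degree(scale_free_net))
```

Comparing the degree distribution of an observed network against such generated networks is one way to judge which model describes it better.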
![Page 18: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/18.jpg)
Where else can network approaches be powerful??
Biological science
Economics
Computer science
![Page 19: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/19.jpg)
![Page 20: Text Analytics](https://reader037.vdocuments.net/reader037/viewer/2022103010/58a2a6ae1a28ab0d0a8b62bd/html5/thumbnails/20.jpg)
THANK YOU!!!