ugf2861 tierney sentimentanalysis
-
What Are They Thinking? With Oracle Application Express and Oracle Data Miner
Brendan Tierney, Roel Hartman
Agenda
- Who are we
- The Scenario
- Oracle Data Miner & DBA tasks
- APEX: the Poor/Smart Man's BI tool
- Live Demo
Brendan Tierney
Currently: Lecturer, DBA, Data Mining Consultant, BI & Data Architect, Trainer
Working with Oracle products since 1992/1993
- Oracle version 5 up to 11g
- Oracle Reports (RPT), ReportWriter I, RPT, Forms 2.3
- Oracle Data Miner since 2005
Data Warehousing since 1997, Data Mining since 1998, Analytics since 1993
Available in the OOW Book Store
Available in eBook & Print formats
Book Signing Wed @1pm
-
The Scenario
But is there an alternative?
The Scenario
- We have a number of products
- We know the opinions from some sectors
- Can we use Data Mining to predict opinions?
- Can we build interactive dashboards in the DB?
Data Mining & Interactive Dashboards with APEX, all inside the Database
Text Mining in Oracle
Natural language processing (NLP)
It deals with the actual text element and transforms it into a format that the machine can use.
Artificial intelligence / Machine Learning
It takes the information produced by the NLP step and uses a lot of maths to determine whether something is negative or positive.
All done in Oracle Data Miner (using Oracle Text)
- Allows Data Analysts to do this
- Isolated from the underlying complexity
Oracle Text
Oracle Data Mining
How is it done with Oracle Text & Oracle Advanced Analytics?
Product Review
Human Labelling, Tokenization, Stop Word, Punctuation
Text Ready for DM
New Product Reviews
Machine Learning Algorithms
Evaluation Model
Sentiment Score
Visualisation / Presentation
Actionable Insights
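The flow above (labelled reviews, a machine learning algorithm, then a sentiment score for new reviews) can be sketched in a few lines. This is only an illustration under stated assumptions: it is a hand-written multinomial Naive Bayes in Python, not Oracle Data Miner's implementation, and the training reviews, function names and labels are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical human-labelled reviews (stand-ins for real product reviews).
TRAINING = [
    ("great product works really well", "positive"),
    ("love it excellent quality", "positive"),
    ("terrible product broke quickly", "negative"),
    ("poor quality would not recommend", "negative"),
]

def train(examples):
    """Count token frequencies per label for a multinomial Naive Bayes model."""
    token_counts = {}         # label -> Counter of tokens seen with that label
    label_counts = Counter()  # label -> number of reviews with that label
    vocabulary = set()
    for text, label in examples:
        tokens = text.lower().split()
        token_counts.setdefault(label, Counter()).update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return token_counts, label_counts, vocabulary

def score(model, text):
    """Return P(label | text) for each label, using add-one smoothing."""
    token_counts, label_counts, vocabulary = model
    total_reviews = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        log_p = math.log(label_counts[label] / total_reviews)
        denom = sum(token_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            if token in vocabulary:  # ignore tokens never seen in training
                log_p += math.log((token_counts[label][token] + 1) / denom)
        log_scores[label] = log_p
    # Normalise the log scores into probabilities that sum to 1.
    max_log = max(log_scores.values())
    exp = {label: math.exp(v - max_log) for label, v in log_scores.items()}
    norm = sum(exp.values())
    return {label: v / norm for label, v in exp.items()}

model = train(TRAINING)
sentiment = score(model, "excellent product love the quality")
predicted = max(sentiment, key=sentiment.get)
```

Here `sentiment` plays the role of the sentiment score and `predicted` the label that a dashboard or workflow rule could act on.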
-
Let us have a closer look at what Oracle Text does
Tokenization
Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens.
The list of tokens becomes input for further processing such as parsing or text mining.
Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters.
Punctuation and whitespace may or may not be included in the resulting list of tokens.
Input: Today 28 Sept we are at OUF Sunday.
Tokens: Today | 28 | Sept | we | are | at | OUF | Sunday | .
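As a rough illustration of the tokenization rules described above, here is a minimal Python sketch. It is not how Oracle Text's lexer is implemented; the `tokenize` function and its `keep_punctuation` flag are hypothetical names chosen for this example.

```python
import re

def tokenize(text, keep_punctuation=True):
    """Split text into tokens, using whitespace and punctuation as separators."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches a single punctuation character.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if not keep_punctuation:
        # Keep only the word-like tokens.
        tokens = [t for t in tokens if re.fullmatch(r"\w+", t)]
    return tokens

tokens = tokenize("Today 28 Sept we are at OUF Sunday.")
words_only = tokenize("Today 28 Sept we are at OUF Sunday.", keep_punctuation=False)
```

The flag mirrors the point that punctuation may or may not be included in the resulting list of tokens.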
Stop Words
Today 28 Sept we are at OUF Sunday .
For analyzing Twitter we can include hashtags, e.g. #OOW14
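Stop word removal and hashtag handling can be sketched as follows. This is an illustrative Python example only: the stop list here is a tiny assumed one (a real stop list is much longer and configurable), and the function names are hypothetical.

```python
import re

# Assumed stop list, just large enough for the example sentence.
STOP_WORDS = {"we", "are", "at"}

def tokenize(text):
    """Tokenize text, keeping a hashtag such as #OOW14 as a single token."""
    # "#\w+" is tried first so the "#" stays attached to the tag.
    return re.findall(r"#\w+|\w+|[^\w\s]", text)

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop list."""
    return [t for t in tokens if t.lower() not in stop_words]

remaining = remove_stop_words(tokenize("Today 28 Sept we are at OUF Sunday."))
tweet_tokens = remove_stop_words(tokenize("We are at #OOW14"))
```

After this step the example sentence is reduced to the tokens Today, 28, Sept, OUF, Sunday and the full stop.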
-
Punctuation
Characters that are defined as punctuation are removed from a token before text indexing:
. , : ; @ ~ # { } [ ] + = - _ ( ) * & ^ % $ ! ` \ | / ?
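The punctuation step can be sketched like this, using the character list from the slide. It is a simplified illustration in Python, assuming that a punctuation character is removed wherever it occurs in a token. `strip_punctuation` is a hypothetical helper, not an Oracle Text API.

```python
# The punctuation characters listed on the slide.
PUNCTUATION = set(".,:;@~#{}[]+=-_()*&^%$!`\\|/?")

def strip_punctuation(tokens, punctuation=PUNCTUATION):
    """Remove punctuation characters from each token; drop tokens left empty."""
    cleaned = []
    for token in tokens:
        stripped = "".join(ch for ch in token if ch not in punctuation)
        if stripped:
            cleaned.append(stripped)
    return cleaned

cleaned = strip_punctuation(["Today", "28", "Sept", "OUF", "Sunday", "."])
```

Because "#" is in the list, a hashtag would lose its marker at this step, so it has to be taken out of the punctuation set when hashtags should be kept for Twitter analysis.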
Product Review
Human Labelling, Tokenization, Stop Word, Punctuation
Text Ready for DM
Today 28 Sept OUF Sunday .
Oralytics
Oracle Advanced Analytics Option
Technique | Algorithms | Applicability
Classification | Logistic Regression (GLM), Decision Trees, Naive Bayes, Support Vector Machine | Classical Statistical Technique; Popular / Rules / Transparency; Embedded; Wide / Narrow Data / Text
Regression | Multiple Regression, Support Vector Machine | Classical Statistical Technique; Wide / Narrow Data / Text
Anomaly Detection | One Class SVM | Lack Examples
Attribute Importance | Minimum Description Length | Attribute Reduction; Identify Useful Data; Reduce Data Noise
Association Rules | Apriori | Market Basket Analysis; Link Analysis
Clustering | Enhanced K-Means, O-Cluster, Expectation Maximization | Product Grouping; Text Mining; Gene and Protein Analysis
Feature Extraction | Non-Negative Matrix Factorization, Principal Components Analysis, Singular Value Decomposition | Text Analysis; Feature Reduction
-
Oracle Data Mining
- PL/SQL Package
  - DBMS_DATA_MINING
  - DBMS_DATA_MINING_TRANSFORM
  - DBMS_PREDICTIVE_ANALYTICS
- SQL Functions
  - PREDICTION, PREDICTION_PROBABILITY, PREDICTION_BOUNDS, PREDICTION_COST, PREDICTION_DETAILS, PREDICTION_SET
  - CLUSTER_ID, CLUSTER_DETAILS, CLUSTER_DISTANCE, CLUSTER_PROBABILITY, CLUSTER_SET
  - FEATURE_ID, FEATURE_DETAILS, FEATURE_SET, FEATURE_VALUE
- 12c Predictive Queries
  - aka Dynamic Queries
  - Transitive dynamic Data Mining models
  - Can scale to many 100+ models all in one statement
Statistical Functions in Oracle
All of these are FREE with the Database
These are often forgotten
-
The models are first-class objects in the DB
- Just like calling any other function
- They are fast
Built a model on 550,000 records in 2 minutes
Scored 1.2M records in 52 seconds (on a mid-spec development server)
>80M records per hour without using the Parallel Option (1.2M records in 52 seconds is roughly 23,000 records per second, or about 83M per hour)
APEX - POOR MAN'S BI TOOL
+ any JavaScript charting engine you like
-
And then there are Interactive Reports
DEMO
- Create a visualisation of your model
- Dashboard
- Use your model for workflow decisions
APEX - POOR/SMART MAN'S BI TOOL
All inside the Database
Brendan Tierney @brendantierney
Roel Hartman @roelh