ugf2861 tierney sentimentanalysis


  • What Are They Thinking? With Oracle Application Express and Oracle Data Miner

    Brendan Tierney, Roel Hartman

    Agenda
    - Who are we
    - The Scenario
    - Oracle Data Miner & DBA tasks
    - APEX: the Poor/Smart Man's BI tool
    - Live Demo

    Currently: Lecturer, DBA, Data Mining Consultant, BI & Data Architect, Trainer

    Working with Oracle products since 1992/1993
    - Oracle version 5 up to 11g
    - Oracle Reports (RPT), ReportWriter I, RPT, Forms 2.3
    - Oracle Data Miner since 2005

    Data Warehousing since 1997
    Data Mining since 1998
    Analytics since 1993

    Brendan Tierney

    Available in the OOW Book Store

    Available in eBook & Print formats

    Book Signing Wed @1pm

  • The Scenario

    But... is there an alternative?

    The Scenario
    - We have a number of products
    - We know the opinions from some sectors
    - Can we use Data Mining to predict opinions?
    - Can we build interactive dashboards in the DB?

    Data Mining & Interactive Dashboards with APEX, all inside the Database

    Text Mining in Oracle

    Natural language processing

    It deals with the actual text element. It transforms it into a format that the machine can use.

    Artificial intelligence / Machine Learning

    It uses the information given by the NLP and uses a lot of maths to determine whether something is negative or positive.
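To give a feel for the kind of maths involved, here is a minimal sketch of a Naive Bayes sentiment classifier with add-one smoothing. It is an illustration only, not how Oracle Data Miner implements its algorithms; the function names and the four training reviews are made up for this example.

```python
import math
from collections import Counter

def train(labelled_docs):
    """Collect per-label document and word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log-probability for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior of the label
        for token in tokens:
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical, human-labelled reviews (already tokenized).
reviews = [
    (["great", "product", "love", "it"], "positive"),
    (["works", "great"], "positive"),
    (["terrible", "product", "broke"], "negative"),
    (["hate", "it", "broke"], "negative"),
]
model = train(reviews)
print(classify(["love", "great"], model))      # positive
print(classify(["terrible", "broke"], model))  # negative
```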

    All done in Oracle Data Miner (using Oracle Text)
    - Allows Data Analysts to do this
    - Isolated from the underlying complexity

    Oracle Text

    Oracle Data Mining

    How is it done with Oracle Text & Oracle Advanced Analytics

    Product Review -> Human Labelling -> Tokenization -> Stop Word -> Punctuation -> Text Ready for DM

    New Product Reviews -> Machine Learning Algorithms -> Evaluation Model -> Sentiment Score -> Visualisation / Presentation -> Actionable Insights

  • Let us have a closer look at what Oracle Text does

    Tokenization

    Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens.

    The list of tokens becomes input for further processing such as parsing or text mining

    Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters.

    Punctuation and whitespace may or may not be included in the resulting list of tokens.

    Today 28 Sept we are at OUF Sunday.

    Today 28 Sept we are at OUF Sunday .
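The splitting shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not Oracle Text's lexer; the `tokenize` function and its `keep_punctuation` flag are names invented for this example.

```python
import re

def tokenize(text, keep_punctuation=True):
    """Split text into word tokens; optionally keep punctuation as separate tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation character.
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    return re.findall(pattern, text)

sentence = "Today 28 Sept we are at OUF Sunday."
print(tokenize(sentence))
# ['Today', '28', 'Sept', 'we', 'are', 'at', 'OUF', 'Sunday', '.']
print(tokenize(sentence, keep_punctuation=False))
# ['Today', '28', 'Sept', 'we', 'are', 'at', 'OUF', 'Sunday']
```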

    Stop Words

    Today 28 Sept we are at OUF Sunday .

    For analyzing Twitter we can include hash tags, e.g. #OOW14
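A minimal sketch of stop-word removal on the example sentence, assuming a tiny hand-made stop list. Oracle Text supplies its own stoplists; the `STOP_WORDS` set and the `remove_stop_words` function here are hypothetical names used only for illustration.

```python
# Hypothetical stop list for this example only.
STOP_WORDS = {"we", "are", "at"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop list; hash tags pass through."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["Today", "28", "Sept", "we", "are", "at", "OUF", "Sunday", ".", "#OOW14"]
print(remove_stop_words(tokens))
# ['Today', '28', 'Sept', 'OUF', 'Sunday', '.', '#OOW14']
```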

  • Punctuations

    Characters that are defined as punctuation are removed from a token before text indexing:

    . , : ; @ ~ # { } [ ] + = - _ ( ) * & ^ % $ ! ` \ | / ?


    Today 28 Sept OUF Sunday .
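The punctuation step can be sketched as follows, using exactly the characters listed on the slide. This is an illustrative Python sketch and not Oracle Text's implementation; `PUNCTUATION` and `strip_punctuation` are names invented here.

```python
# The punctuation characters listed on the slide.
PUNCTUATION = set(".,:;@~#{}[]+=-_()*&^%$!`\\|/?")

def strip_punctuation(tokens, punctuation=PUNCTUATION):
    """Remove punctuation characters from each token and drop tokens left empty."""
    cleaned = ("".join(ch for ch in t if ch not in punctuation) for t in tokens)
    return [t for t in cleaned if t]

tokens = ["Today", "28", "Sept", "OUF", "Sunday", "."]
print(strip_punctuation(tokens))
# ['Today', '28', 'Sept', 'OUF', 'Sunday']
```

Note that with this character list a hash tag such as #OOW14 loses its leading #, so keeping hash tags would mean leaving # out of the set.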

    Oralytics

    Oracle Advanced Analytics Option

    Technique | Algorithms | Applicability
    Classification | Logistic Regression (GLM), Decision Trees, Naïve Bayes, Support Vector Machine | Classical Statistical Technique, Popular / Rules / Transparency, Embedded, Wide / Narrow Data / Text
    Regression | Multiple Regression, Support Vector Machine | Classical Statistical Technique, Wide / Narrow Data / Text
    Anomaly Detection | One Class SVM | Lack Examples
    Attribute Importance | Minimum Description Length | Attribute Reduction, Identify Useful Data, Reduce Data Noise
    Association Rules | Apriori | Market Basket Analysis, Link Analysis
    Clustering | Enhanced K-Means, O-Cluster, Expectation Maximization | Product Grouping, Text Mining, Gene and Protein Analysis
    Feature Extraction | Non-Negative Matrix Factorization, Principal Components Analysis, Singular Value Decomposition | Text Analysis, Feature Reduction

  • Oracle Data Mining

    PL/SQL Package
    - DBMS_DATA_MINING
    - DBMS_DATA_MINING_TRANSFORM
    - DBMS_PREDICTIVE_ANALYTICS

    SQL Functions
    - PREDICTION, PREDICTION_PROBABILITY, PREDICTION_BOUNDS, PREDICTION_COST, PREDICTION_DETAILS, PREDICTION_SET
    - CLUSTER_ID, CLUSTER_DETAILS, CLUSTER_DISTANCE, CLUSTER_PROBABILITY, CLUSTER_SET
    - FEATURE_ID, FEATURE_DETAILS, FEATURE_SET, FEATURE_VALUE

    12c Predictive Queries
    - aka Dynamic Queries
    - Transitive dynamic Data Mining models
    - Can scale to many 100+ models all in one statement

    Statistical Functions in Oracle

    All of these are FREE with the Database

    These are often forgo

  • The models are first-class objects in the DB

    Just like calling any other function

    They are fast

    Built a model on 550,000 records in 2 minutes

    Scored 1.2M records in 52 seconds (on a mid-spec development server)
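As a rough check of what the scoring figure means per hour, the arithmetic can be worked through directly. The variable names are illustrative; the two input numbers are the ones quoted above.

```python
records_scored = 1_200_000   # 1.2M records, as quoted
seconds_taken = 52           # scoring time, as quoted

records_per_second = records_scored / seconds_taken
records_per_hour = records_per_second * 3600

print(f"{records_per_second:,.0f} records/second")    # 23,077 records/second
print(f"{records_per_hour / 1e6:.1f}M records/hour")  # 83.1M records/hour
```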

    >80M records per hour without using the Parallel Option

    APEX - POOR MAN'S BI TOOL


    + any JavaScript charting engine you like

  • And then there are Interactive Reports

    DEMO

    - Create a visualisation of your model
    - Dashboard
    - Use your model for workflow decisions

    APEX - POOR/SMART MAN'S BI TOOL

    All inside the Database

    Brendan Tierney @brendantierney

    Roel Hartman @roelh