BigML Fall 2015 Release

Post on 13-Feb-2017

  • Introducing Association Discovery

    BigML 2015 Fall Release

  • BigML Inc, Fall 2015 Release: Today's Webinar

    Speaker: Poul Petersen, CIO

    Moderator: Atakan Cetinsoy, VP Predictive Applications

    Enter questions into the chat box; we'll answer some via text and others at the end of the session.

    Email: info@bigml.com Twitter: @bigmlcom

  • Association Discovery

    Algorithm: Magnum Opus, from Geoff Webb

    Unsupervised learning: unlabelled data

    Learning task: find interesting relations between variables.

  • BigML Workflow

    [Diagram: SOURCE → DATASET → MODEL, CLUSTER, ANOMALY, ASSOCIATION]

    MODEL: Decision Trees, Bagging, Decision Forest

    CLUSTER: K-Means, G-Means

    ANOMALY: Isolation Forest

    ASSOCIATION: Magnum Opus

  • Unsupervised Learning

    date  customer  account  auth  class    zip    amount
    Mon   Bob       3421     pin   clothes  46140  135
    Tue   Bob       3421     sign  food     46140  401
    Tue   Alice     2456     pin   food     12222  234
    Wed   Sally     6788     pin   gas      26339  94
    Wed   Bob       3421     pin   tech     21350  2459
    Wed   Bob       3421     pin   gas      46140  83
    Thu   Sally     6788     sign  food     26339  51

    Clustering: find similar instances

    Anomaly Detection: find unusual instances

  • Association Rules

    (Same transaction data as on the previous slide.)

    Rules:

    {customer = Bob, account = 3421} → zip = 46140

    {class = gas} → amount > 80

  • Association Rules: Antecedent and Consequent

    In each rule, the left-hand side is the antecedent and the right-hand side is the consequent:

    Antecedent → Consequent

    {customer = Bob, account = 3421} → zip = 46140

    {class = gas} → amount > 80
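
    As a minimal sketch (not BigML's implementation), the two rules on this slide can be checked against the seven transaction rows in plain Python. The names `ROWS` and `rule_counts` are illustrative assumptions, and the date column is omitted because neither rule uses it.

```python
# Transaction rows from the slide (date column omitted; neither rule uses it).
COLUMNS = ("customer", "account", "auth", "class", "zip", "amount")
RAW = [
    ("Bob", 3421, "pin", "clothes", 46140, 135),
    ("Bob", 3421, "sign", "food", 46140, 401),
    ("Alice", 2456, "pin", "food", 12222, 234),
    ("Sally", 6788, "pin", "gas", 26339, 94),
    ("Bob", 3421, "pin", "tech", 21350, 2459),
    ("Bob", 3421, "pin", "gas", 46140, 83),
    ("Sally", 6788, "sign", "food", 26339, 51),
]
ROWS = [dict(zip(COLUMNS, r)) for r in RAW]


def rule_counts(rows, antecedent, consequent):
    """Count rows matching the antecedent, the consequent, and both.

    `antecedent` and `consequent` are predicates that take one row (a dict).
    """
    n_a = sum(1 for r in rows if antecedent(r))
    n_c = sum(1 for r in rows if consequent(r))
    n_ac = sum(1 for r in rows if antecedent(r) and consequent(r))
    return n_a, n_c, n_ac


# {customer = Bob, account = 3421} -> zip = 46140
bob_rule = rule_counts(
    ROWS,
    lambda r: r["customer"] == "Bob" and r["account"] == 3421,
    lambda r: r["zip"] == 46140,
)

# {class = gas} -> amount > 80
gas_rule = rule_counts(
    ROWS,
    lambda r: r["class"] == "gas",
    lambda r: r["amount"] > 80,
)

print(bob_rule)  # (antecedent matches, consequent matches, both)
print(gas_rule)
```

    In this table the gas rule holds for every row that matches its antecedent, while the Bob rule has one exception: the tech purchase with zip 21350.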

  • Use Cases

    Market basket analysis, web usage patterns, intrusion detection, fraud detection, bioinformatics, medical risk factors

  • Market Basket Analysis

    Dataset of 9,834 grocery cart transactions. Each row is a list of all items in a cart at checkout.

    GOAL: Discover interesting rules about what store items are typically purchased together.
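
    To make the goal concrete, here is a small sketch that counts how often pairs of items share a cart. It is brute-force pair counting, not the Magnum Opus algorithm, and `BASKETS` is hypothetical illustration data, not the 9,834-transaction dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets for illustration only.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "chips"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(BASKETS)
for pair, n in counts.most_common(3):
    print(pair, n)
```

    Pairs that co-occur often are candidates for rules; the metrics on the following slides are what separate interesting pairs from merely frequent ones.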

  • Association Metrics: Coverage

    Percentage of instances which match antecedent A.

    [Figure: Venn diagram of all Instances with sets A and C]

  • Association Metrics: Support

    Percentage of instances which match antecedent A and consequent C.

    [Figure: Venn diagram of all Instances with sets A and C]

  • Association Metrics: Confidence

    Percentage of instances in the antecedent which also contain the consequent.

    Confidence = Support / Coverage

    [Figure: Venn diagram of all Instances with sets A and C]
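
    The three metrics defined so far can be sketched from raw counts. The function names are illustrative assumptions; the example counts are those of the rule {customer = Bob, account = 3421} → zip = 46140 in the seven-row transaction table (4 rows match A, 3 match both A and C).

```python
from fractions import Fraction


def coverage(n_a, n):
    """Share of all n instances that match the antecedent A."""
    return Fraction(n_a, n)


def support(n_ac, n):
    """Share of all n instances that match both A and C."""
    return Fraction(n_ac, n)


def confidence(n_ac, n_a):
    """Share of the A-instances that also match C (support / coverage)."""
    return Fraction(n_ac, n_a)


n, n_a, n_ac = 7, 4, 3  # counts for the Bob rule in the slide's table
print(coverage(n_a, n), support(n_ac, n), confidence(n_ac, n_a))
```

    Using `Fraction` keeps the identity Confidence = Support / Coverage exact rather than subject to floating-point rounding.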

  • Association Metrics: Confidence Scale

    [Figure: Venn diagrams of Instances with sets A and C, placed along a confidence axis from 0% to 100%]

    Confidence = 0%: A never implies C

    Confidence between 0% and 100%: A sometimes implies C

    Confidence = 100%: A always implies C

  • Association Metrics: Lift

    Ratio of observed support to support if A and C were statistically independent.

    Lift = Support / [p(A) * p(C)] = Confidence / p(C)

    [Figure: Venn diagrams of A and C, comparing the independent case with the observed case]

  • Association Metrics: Lift Scale

    [Figure: Venn diagrams comparing the independent and observed overlap of A and C along a lift axis]

    Lift < 1: negative correlation

    Lift = 1: no association

    Lift > 1: positive correlation
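
    A short sketch of the lift formula, computed from counts. The function name and argument order are assumptions made for illustration; the first example uses the counts of the rule {customer = Bob, account = 3421} → zip = 46140 in the seven-row transaction table.

```python
from fractions import Fraction


def lift(n, n_a, n_c, n_ac):
    """Observed support divided by the support expected under independence.

    n: all instances, n_a: matching A, n_c: matching C, n_ac: matching both.
    """
    observed = Fraction(n_ac, n)
    expected = Fraction(n_a, n) * Fraction(n_c, n)  # p(A) * p(C)
    return observed / expected


print(lift(7, 4, 3, 3))  # Bob rule: above 1, positive correlation
print(lift(8, 4, 4, 2))  # made-up counts where A and C are independent
```

    The same value results from dividing confidence by p(C), which is the second form of the formula.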

  • Association Metrics: Leverage

    Difference of observed support and support if A and C were statistically independent.

    Leverage = Support - [p(A) * p(C)]

    [Figure: Venn diagrams of A and C, comparing the independent case with the observed case]

  • Association Metrics: Leverage Scale

    [Figure: Venn diagrams comparing the independent and observed overlap of A and C along a leverage axis from -1 to 1]

    Leverage < 0: negative correlation

    Leverage = 0: no association

    Leverage > 0: positive correlation
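
    Leverage can be sketched the same way, as a difference instead of a ratio. The function name and argument order are assumptions made for illustration; the first example uses the counts of the rule {customer = Bob, account = 3421} → zip = 46140 in the seven-row transaction table.

```python
from fractions import Fraction


def leverage(n, n_a, n_c, n_ac):
    """Observed support minus the support expected under independence.

    n: all instances, n_a: matching A, n_c: matching C, n_ac: matching both.
    """
    observed = Fraction(n_ac, n)
    expected = Fraction(n_a, n) * Fraction(n_c, n)  # p(A) * p(C)
    return observed - expected


print(leverage(7, 4, 3, 3))  # Bob rule: above 0, positive correlation
print(leverage(8, 4, 4, 2))  # made-up counts where A and C are independent
```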

  • Medical Risk

    Dataset of diagnostic measurements of 768 patients.

    Each patient is labelled True/False for diabetes.

    GOAL: Find general rules that indicate diabetes.

  • Medical Risk: Association Rule vs. Decision Tree

    Association Rule:

    If plasma glucose > 146 then diabetes = TRUE

    Decision Tree:

    If plasma glucose > 155 and bmi > 29.32 and diabetes pedigree > 0.32 and insulin

  • Partial Dependence Plots

    Visualize Ensembles

  • Flatline Editor

    https://github.com/bigmlcom/flatline

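  • BigML Workflow

    [Diagram: SOURCE → DATASET → MODEL, CLUSTER, ANOMALY, ASSOCIATION; DATASET → DATASET via Flatline]

    MODEL: Decision Trees, Bagging, Decision Forest

    CLUSTER: K-Means, G-Means

    ANOMALY: Isolation Forest

    ASSOCIATION: Magnum Opus

    DATASET (derived from an existing dataset): Flatline, Flatline Editor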

  • Logistic Regression

    [Diagram: DATASET → LOGISTIC REGRESSION]

    Classification algorithm

    Categorical: one-hot encoded

    Text: mapped to token frequency

    Bindings support local model

    L1/L2 regularization

    Currently API only

    https://bigml.com/developers/logisticregressions
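
    The two encodings named on this slide can be illustrated with a short sketch. This is a generic version of one-hot encoding and token counting, not BigML's implementation; the function names, category list, and vocabulary are assumptions made for the example.

```python
from collections import Counter


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    return [1 if value == c else 0 for c in categories]


def token_frequencies(text, vocabulary):
    """Map text to per-token counts over a fixed vocabulary.

    Tokens are lowercased and split on whitespace; tokens outside the
    vocabulary are ignored.
    """
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocabulary]


categories = ["clothes", "food", "gas", "tech"]
print(one_hot("gas", categories))

vocabulary = ["free", "shirt", "webinar"]
print(token_frequencies("Free shirt free webinar", vocabulary))
```

    Both functions turn a non-numeric field into a fixed-length numeric vector, which is the form a logistic regression needs as input.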

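  • BigML Workflow

    [Diagram: SOURCE → DATASET → MODEL, CLUSTER, ANOMALY, ASSOCIATION; DATASET → DATASET via Flatline]

    MODEL: Decision Trees, Bagging, Decision Forest, Logistic Regression

    CLUSTER: K-Means, G-Means

    ANOMALY: Isolation Forest

    ASSOCIATION: Magnum Opus

    DATASET (derived from an existing dataset): Flatline, Flatline Editor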

  • BigML Classifiers

    Classifier           Advantages                                  Disadvantages
    Single Tree          easy to interpret, robust to missing data   overfitting
    Ensemble             top performer, robust to missing data       hard to interpret
    Logistic Regression  robust to noise, outputs probability        no missing data, hard to interpret

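  • BigML Workflow

    [Diagram: SOURCE → DATASET → MODEL, CLUSTER, ANOMALY, ASSOCIATION, STATS; DATASET → DATASET via Flatline]

    MODEL: Decision Trees, Bagging, Decision Forest, Logistic Regression

    CLUSTER: K-Means, G-Means

    ANOMALY: Isolation Forest

    ASSOCIATION: Magnum Opus

    STATS: Statistical Tests, Correlations

    DATASET (derived from an existing dataset): Flatline, Flatline Editor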

  • Correlations

    [Diagram: DATASET → CORRELATION]

    Pearson coefficient, Spearman coefficient, Chi-square, Cramér's V, Tschuprow's T, One-way ANOVA

    https://bigml.com/developers/correlations
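
    As an illustration of the first measure in this list, here is a plain-Python sketch of the Pearson coefficient. It is the textbook formula rather than BigML's API, and the function name and sample values are assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson([1, 2, 3], [3, 2, 1]))        # perfectly linear, decreasing
```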

  • Statistical Tests

    [Diagram: DATASET → STATISTICAL TESTS]

    Benford's Law, Anderson-Darling, Jarque-Bera, Z-score, Grubbs

    https://bigml.com/developers/statisticaltests
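
    To show what the first test in this list compares, the sketch below computes the leading-digit distribution expected under Benford's law and tallies the leading digits of a sample. It is a generic illustration, not BigML's API; the function names are assumptions, and the sample reuses the seven amounts from the transaction table earlier in the deck, which is far too small for a meaningful test.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(value):
    """Leading digit of a non-zero integer."""
    return int(str(abs(int(value)))[0])


amounts = [135, 401, 234, 94, 2459, 83, 51]
observed = Counter(leading_digit(a) for a in amounts)

for d in range(1, 10):
    share = observed[d] / len(amounts)
    print(d, round(benford_expected(d), 3), round(share, 3))
```

    A real test would compare the two columns with a goodness-of-fit statistic over many more values.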

  • Q&A

    Ask questions and get a free BigML T-shirt!

    All demonstrated features are immediately available to all users, including all subscription plans, Virtual Private Cloud (VPC) customers, and on-premise implementations.

    Documentation: https://bigml.com/releases

  • Feedback

    Twitter: @bigmlcom

    Email: info@bigml.com

    Get Started Today!

    Resources: Join us for future webinars & hangouts

    Office hours: every Wednesday, 9:30am Pacific Time