advances in bi

Upload: amoljaju

Post on 01-Jun-2018


  • 8/9/2019 Advances in BI

    1/55

    1 Dr. Lakshmi Mohan

    Advances in BI

    1. Why Data Mining?

    2. Expert Systems: A Tool for Sifting Through Mountains of Data
    - Case Example: Ocean Spray Cranberries

    3. Data Mining Models:
    - Association, Sequential Patterns, Classification, Clustering and Predictive Models

    4. Data Mining Techniques:
    - Decision Trees, Rule Induction, Regression, Neural Networks

    5. Text Mining for Unstructured Data

    6. Business Activity Monitoring: A Priority Today

    Why Data Mining?

    Data volumes are TOO BIG for traditional DSS query/reporting and OLAP tools

    Organizations have to get value from the huge investments of time and money made in building data warehouses

    "Now that we have gathered so much data, what do we do with it?"

    "The datasets are of little direct value themselves. What is of value is the knowledge that can be inferred from the data and put to use."

    Discover the Diamonds in Your Data Warehouse

    "Maximize your ROI on data warehousing and data marts by enabling your decision makers to exploit your customer data for competitive advantage."

    "This web-enabled, point-and-click approach lets you employ OLAP, neural networks, churn analysis, and many other visualizations and analytical techniques to improve customer retention, target key prospects, profile market segments, detect fraud, analyze customer response, and much more."

    Source: Ads of BI vendors

    Without BI, your DW is ... well, a warehouse full of data

    Why is Data Mining a Hot Topic Today?

    1. The implementation of ERP, CRM and SCM systems has resulted in vast stores of operational data.

    2. The emergence of global competition has put pressure on companies to be "data-driven", i.e., to make informed decisions based on facts and not hunches.

    3. The speed of change in the marketplace demands that the pearls of actionable information be found faster in the ocean of data, for companies to be one step ahead of the competition.

    4. The hardware needed to store and process a "ton of data" was prohibitively expensive until recently: "You would have had to have NASA at your disposal". Today, the technology makes it feasible to apply complex models to ferret out patterns previously left to rot in "data jails".

    The Payoff from Data Mining - Two Examples

    1. Farmer's Insurance
    - Based on traditional data analysis, drivers of sports cars were determined to be at higher risk for collisions than drivers of "safe" cars such as Volvos, and hence were charged more for car insurance
    - Data mining discovered a pattern that changed the pricing policy: as long as the sports car was not the only car in the household, the driver fit the profile of the "safe" family car driver, not the risky sports car driver

    In the past, success of promotional offers such as

    What are Expert Systems?

    A technology that enables expertise to be distributed throughout a firm without the presence of the human expert

    Rule-Based System: If "This", Then "That"

    Rules are determined from expert knowledge and programmed in the software

    An HR Application

    Screening a large number of resumes for relatively low-level positions with well-defined and precise skill requirements, e.g., Call Center Agents

    Expert System can weed out applicants who do not meet the requirements
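The resume-screening idea can be sketched as a small rule base. This is only an illustration: the `Applicant` fields, the rule descriptions and the thresholds are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    typing_wpm: int
    years_phone_support: float
    speaks_english: bool


# Each rule pairs a description with an IF-test. Names and thresholds are hypothetical.
RULES = [
    ("types at least 40 wpm", lambda a: a.typing_wpm >= 40),
    ("has 1+ years of phone support", lambda a: a.years_phone_support >= 1),
    ("speaks English", lambda a: a.speaks_english),
]


def screen(applicant):
    """Return the requirements the applicant fails (an empty list means a pass)."""
    return [desc for desc, rule in RULES if not rule(applicant)]


applicants = [
    Applicant("A", 55, 2.0, True),
    Applicant("B", 30, 0.5, True),
]
# Keep only applicants who fail no rule.
shortlist = [a.name for a in applicants if not screen(a)]
```

Returning the failed requirements, rather than a bare yes/no, keeps the expert's reasoning visible to whoever reviews the rejected resumes.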

    Applying Expert Systems - To Extract News from Scanner Data

    The Promise: Better Data for Tracking Market Shares Compared to Retail Store Audits
    - Frequency: Weekly vs. Bimonthly
    - Level of Detail: UPCs vs. Brands
    - Scope: Top 50 Markets vs. Regions

    The Problem: Too Much Data
    - At least 100 times more data

    The Result: Impossible to Use the Quality Data

    "CoverStory" - An Expert System Replaced the Human Analyst

    Before . . . Companies circulated top-line reports, including tables and charts from the retail store audit data. An analyst prepared the cover memo highlighting important news in the data.

    Now . . . Not feasible to have an army of analysts to sift through the mountain of scanner data. Instead, "CoverStory" automatically writes this memo!
    - a model-embedded expert system extracts the news
    - includes a built-in thesaurus to eliminate repetitious wording

    Case Example - Ocean Spray Cranberries

    A $1 billion grower-owned agricultural cooperative

    Lean IS staff

    Only one marketing professional for analyzing the tracking data

    Scanner data for juices is imposing
    - 400 M numbers covering up to 100 data measures, 10,000 products, 125 weeks and 50 geographic markets
    - Grows by 10 million new numbers every four weeks

    Impact of CoverStory

    Enables a department of one to alert all Ocean Spray marketing and sales managers to key problems and opportunities and provide problem-solving information

    Being done across 4 business units handling scores of company products in dozens of markets representing hundreds of millions of dollars of sales

    System is totally integrated into business operations because it delivers information of competitive value in running the business

    Tools to Get Value from Data Warehouses

    Business Intelligence Tools
    To enable users without programming skills to analyze the raw data in the data warehouse.
    - Ad Hoc Query & Reporting
    - OLAP Tools to "slice" and "dice" data.

    Data Mining Tools
    - Automate the detection of patterns in the data warehouse
    - Build models to predict behavior through statistical and machine-learning techniques.

    Data Mining Not Limited to Discovery ...

    i.e., finding an existing "nugget of gold" in the "mountain" of data

    Data Mining used for Prediction also
    - Telling you not just where the gold is "today", but where the gold might be "tomorrow"
    - Predict what is going to happen next based on what we have found.

    "From the moment I signed up for my Total Rewards card in the casino lobby and filled in my name, address, date of birth and driver's license number, Harrah's had a pretty good hunch that my long term potential was already low. I was a 32-year old man from the distant state of Montana. I did not fit the profile of a high-value customer."

    Age, gender and distance from the casino were identified through data mining as critical predictors of frequency of visiting casinos

    Knowledge Discovery in Databases - Steps in KDD Process

    Data Warehouse -> (Selection) -> Target Data -> (Cleaning) -> Pre-processed Data -> (Data reduction) -> Transformed Data -> (DATA MINING) -> Patterns -> (Evaluation / Interpretation) -> Knowledge

    Source: Communications of the ACM, 1996

    Data Mining is One Step in the KDD Process

    Determine patterns from observed data to solve a business problem.

    Step 1: Identify the Business Problem
    - e.g., Who are "good" customers? Which customers are likely to leave?

    Step 2: Choose Model or Goal for Data Mining
    - Some models are better for predictions while others are better for describing behavior

    Step 3: Choose Technology to Build Model

    Step 4: Apply the Algorithm (Computation process) to Data. Review the results and refine the Model

    Step 5: Validate the Model on New Data (the "hold-out" dataset)

    Data Mining Models

    1. Association
    - If customer buys spaghetti, also buys red wine in [?]0% of cases

    2. Sequential Patterns - time or event based
    - A customer orders new sheets and pillow cases followed by drapes in [?]5% of the cases

    3. Classification
    - Opera ticket buyers are usually young urban professionals with high income while country music concert ticket purchasers are typically blue collar workers

    4. Clustering
    - Discovers different groups in the data whose members are very similar

    5. Predictive Models
    - Relate behavior of customers ("dependent" variable) to predictors ("independent" variables felt to be "responsible" for the dependent one)

    Association Models for Market-Basket Analysis

    Model finds items that occur together in a given event or record

    Discovers rules of the form:

    If item A is part of an event, then X% of the time (the confidence factor), item B is part of the event

    Used to discover patterns of items bought together from the "mountain" of scanner data

    Example:

    If a customer buys corn chips, then 65% of the time, also buys cola

    Unless there is a promotion, in which case buys cola 85% of the time
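A minimal sketch of how the confidence factor of such a rule might be computed from basket data. The five baskets are made up for illustration, and `support` and `confidence` are names chosen here, not taken from any particular product.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    antecedent = set(antecedent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = antecedent | set(consequent)
    return sum(both <= t for t in with_antecedent) / len(with_antecedent)


# Hypothetical baskets, each a set of purchased items.
baskets = [
    {"corn chips", "cola"},
    {"corn chips", "cola", "salsa"},
    {"corn chips", "salsa"},
    {"cola"},
    {"bread", "milk"},
]
conf = confidence(baskets, ["corn chips"], ["cola"])  # 2 of the 3 corn-chip baskets
supp = support(baskets, ["corn chips", "cola"])       # 2 of all 5 baskets
```

Running the same calculation separately on promotion and non-promotion baskets would give the two different percentages quoted in the example.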

    Sequential Patterns

    Similar to Association Models, except that the relationships among items are spread over time.

    Sequences are associations in which events are linked by time

    Require data on the identity of the transactors in addition to details of each transaction.

    Example:

    If surgical procedure X is performed, then 45% of the time infection Y occurs within 5 days

    But after 5 days, the likelihood of infection Y drops to 4%
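The sketch below shows one way such a time-linked rule could be measured, and why the identity of the transactor matters: events are grouped per subject before they are compared. The patient histories and the helper name `followed_within` are hypothetical.

```python
def followed_within(histories, first, then, max_days):
    """Among subjects with event `first`, the fraction that also have event
    `then` between 0 and `max_days` days after the earliest `first`."""
    had_first = 0
    had_both = 0
    for events in histories.values():
        first_days = [day for day, name in events if name == first]
        if not first_days:
            continue
        had_first += 1
        start = min(first_days)
        if any(name == then and 0 <= day - start <= max_days
               for day, name in events):
            had_both += 1
    return had_both / had_first if had_first else 0.0


# Hypothetical histories: subject id -> list of (day number, event name).
histories = {
    "p1": [(0, "procedure X"), (3, "infection Y")],
    "p2": [(0, "procedure X"), (9, "infection Y")],
    "p3": [(2, "procedure X")],
    "p4": [(1, "checkup")],
}
rate = followed_within(histories, "procedure X", "infection Y", 5)
```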

    Classification Models - Most Common Data Mining Model

    Describe the group that a member belongs to by examining existing cases that already have been classified, and inferring a set of rules

    These IF-THEN rules are often depicted in a tree-like structure

    Examples:
    - What are the characteristics of customers who are likely to switch to a rival telecom service provider?, etc.

    It determines which categorical predictor is furthest from "independence" with the prediction values of churners and non-churners.

    Problem: Continuous variables such as age have to be coerced into a categorical form - how many categories? where should the splits be?

    Decision Tree for Segmenting Customers - Who Responded to a Marketing Campaign

    Overall: 7% of Customers Responded

    Segment of Customers Who Rent with High Family Income and No Savings A/C: [?]% response

    Target this segment for Future Direct Marketing Campaign

    Data Mining Techniques - Rule Induction

    Most common form of knowledge discovery in unsupervised learning systems

    Rule - "IF this and this and this, THEN that"
    - Accuracy or Confidence: How often is this rule correct?
    - Coverage: How many records does this rule apply to?

    Right Side of Rule (after THEN) - Consequent (Only ONE Condition)

    Rule Coverage vs Accuracy

    Coverage High, Accuracy Low: Rule is rarely correct BUT can be used often
    Coverage High, Accuracy High: Rule is often correct AND can be used often
    Coverage Low, Accuracy Low: Rule is rarely correct AND can only rarely be used
    Coverage Low, Accuracy High: Rule is often correct BUT can only rarely be used

    Total number of baskets in database = [?]
    with eggs = [?]
    with milk = [?]
    with both eggs and milk = [?]
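As a rough illustration of the two measures for a rule such as "IF eggs THEN milk", the sketch below uses hypothetical counts (1,000 baskets, 200 with eggs, 150 with both eggs and milk). These are not the figures from the slide.

```python
def rule_stats(total, with_antecedent, with_both):
    """Coverage and accuracy of the rule 'IF antecedent THEN consequent'."""
    coverage = with_antecedent / total      # how often the rule applies
    accuracy = with_both / with_antecedent  # how often it is right when it applies
    return coverage, accuracy


# Hypothetical counts for the rule IF eggs THEN milk.
coverage, accuracy = rule_stats(total=1000, with_antecedent=200, with_both=150)
```

With these counts the rule applies to a fifth of all baskets and is right three times out of four when it applies.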

    What To Do With A Rule?

    1. Target the Antecedent:
    - All rules with a certain value for the antecedent, e.g., "nails, bolts and screws", are presented to a retailer
    - Would discontinuing the sale of these low-margin items have any effect on sales of higher margin products, e.g., expensive hammers?
    - Example: A British supermarket was about to discontinue a line of expensive French cheeses which were not selling well. But data mining showed that the few people who were buying the cheeses were among the supermarket's most profitable customers, so it was worth keeping the cheese to retain them

    Rule Induction vs. Decision Trees

    Decision Trees: One AND ONLY One Rule for a Record
    - All records in the training data set will fall into mutually exclusive (non-overlapping) segments
    - Supervised learning where the outcome is known for each record in the training data set, e.g., Was the person a good risk or a bad risk?
    - Process trains the algorithm to recognize key variables and values that will be used for predictions with new data

    Rule Induction: May be Many Rules for a Record
    - Not guaranteed that a rule will exist for every possible record in the training data set
    - Will not partition the data into mutually exclusive segments: a particular record may match any number of rules, including no rules at all
    - More commonly used for knowledge discovery in unsupervised learning than prediction
    - Rules are generally created by taking a simple high-level rule, and then adding new constraints to it until the coverage gets so small that it is not meaningful

    When to Use What?

    Decision Trees:
    - Create the smallest possible set of rules for a predictive model
    - Work from a prediction target downward in what is known as "greedy" search: look for the best possible split on the next step, greedily picking the best one without looking any further than the next step
    - If there is overlap between two predictors, the better of the two would be picked, e.g., height might be used instead of shoe-size as a predictor whereas both could be used as antecedents in a rule induction system
    - Traditionally used for exploration to determine the useful predictors to be fed on the second pass of data mining into prediction models using statistical techniques or neural networks

    Rule Induction:
    - Yields a variety of rules with different predictors even if some are redundant
    - Even though height and shoe size are highly correlated, both could be present as antecedents in two different rules; in contrast, the decision tree would pick the better of the two predictors
    - Mainly used to discover interesting patterns in the data
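The greedy step of a decision tree can be sketched as follows: score every candidate predictor by how pure its split is, then keep only the best one. Gini impurity is used here as the scoring measure purely as an assumption; the slides do not say which measure is used, and the four customer rows are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, predictor, target):
    """Weighted Gini impurity after splitting rows on one categorical predictor."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[predictor]].append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def best_split(rows, predictors, target):
    """Greedy choice: the single predictor whose split is purest right now."""
    return min(predictors, key=lambda p: split_impurity(rows, p, target))


# Hypothetical campaign data.
rows = [
    {"housing": "rent", "savings": "no", "responded": "yes"},
    {"housing": "rent", "savings": "yes", "responded": "yes"},
    {"housing": "own", "savings": "no", "responded": "no"},
    {"housing": "own", "savings": "yes", "responded": "no"},
]
chosen = best_split(rows, ["housing", "savings"], "responded")
```

A full tree would repeat `best_split` inside each resulting segment, which is why a predictor that overlaps with an already chosen one tends not to be picked again.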

    Data Mining Techniques - Regression Models

    Statistical models which link predictors or "independent" variables to the variable to be predicted or "dependent" variable

    User has to select the predictors and define the structure of the linkage

    e.g., a linear model linking the predictor, Customer's Annual Income (X), to the variable to be predicted, Average Customer Bank Balance (Y):

    Y = a + bX

    The constants 'a' and 'b' in the above model are called "parameters"; they specify the shape of the line relating X and Y.

    The parameters are calculated so as to minimize the sum of squares of the forecast errors when the model is applied to the training or model-fitting data set of X values and corresponding actual Y values. The "least squares method" uses calculus to derive the formulas for the parameters a and b.
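For a single predictor, the least squares formulas work out to b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and a = mean_y - b * mean_x. The sketch below applies them to a tiny, hypothetical income and balance data set; the function name `fit_line` is chosen here for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of squared errors of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


incomes = [30, 40, 50, 60]  # hypothetical annual income (X), in thousands
balances = [4, 5, 6, 7]     # hypothetical average bank balance (Y), in thousands
a, b = fit_line(incomes, balances)
```

Because the hypothetical points lie exactly on a line, the fitted parameters here are a = 1 and b = 0.1 up to floating-point rounding.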

    Validation and Refinement of Regression Models

    "R-Squared" value is calculated to show the goodness of fit of the predicted Y values from the model to the actual Y values in the data set
    - e.g., a value of 0.[??] means that [??]% of the variation in Y was explained by the model

    Acid test of the model is to apply the fitted model to new data not used to calculate the parameters ('a' and 'b') of the model: the "hold-out" or "validation" data set

    Refine the model, if necessary, to make better predictions:
    - Add multiple predictors ("multiple regression" models)
    - Transform predictors by squaring, taking logarithms etc. ("non-linear" models)
    - Combine predictors by multiplying or taking ratios, e.g., ratio of annual household income to family size

    If dependent variable is a response variable with just Yes/No or 0/1 values, a different model called "logistic regression" model is used
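A small sketch of the acid test: R-squared is computed on hold-out records that played no part in fitting. The parameter values and the hold-out data are hypothetical.

```python
def r_squared(actual, predicted):
    """1 minus (unexplained variation / total variation) of the actual values."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Parameters assumed to have been fitted on the training data (hypothetical values).
a, b = 1.0, 0.1

# Hold-out records that were not used to calculate a and b.
holdout_x = [35, 45, 55]
holdout_y = [4.4, 5.6, 6.5]
predicted = [a + b * x for x in holdout_x]
score = r_squared(holdout_y, predicted)
```

A hold-out score far below the training score would suggest the model fits the training data better than it predicts new cases.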

    Data Mining Techniques - Neural Networks

    Based on the concept of the human brain in that it learns
    - originally developed for military applications to tell whether a speck on a screen is a bomber or a bird, and discriminate between decoys and genuine mistakes
    - now, the same technology can separate good customers from bad ones

    Network composed of a large number of "neurons" (or processing elements) tied together with weighted connections (synapses)
    - A collection of connected nodes, each having an input and an output, and arranged in layers.
    - Between the visible Input Layer and final Output Layer, there could be a number of hidden processing layers

    Structure of a Neural Network

    A neural network uses a training data set to produce outputs from inputs, which are then compared with the known output. A correction is then calculated for the discrepancy in the output and applied to the processing in the nodes in the network

    The process is repeated until its stopping condition, such as deviations being less than a prescribed amount, is reached

    A Simple Example

    [Figure: worked example comparing the network's computed output with the actual value for a "No Default" case; the numeric values are not recoverable]

    - Link weights (shown in the above example) are adjusted to correct for the deviation between the output of the processing and the actual value
    - Large errors are given greater attention in the correction than small errors

    How do Neural Networks Learn?

    Compute Output -> Desired Output Achieved?
    - No: Adjust Weights, then Compute Output again
    - Yes: Stop
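The compute, compare and adjust loop can be sketched with a single linear neuron trained by the delta rule. This is a deliberate simplification: a real network has hidden layers and non-linear nodes. The learning rate, tolerance and training samples below are all hypothetical.

```python
def train_neuron(samples, rate=0.1, tolerance=0.01, max_epochs=10_000):
    """Train one linear neuron with the delta rule.

    samples: list of (inputs, desired_output) pairs.
    Returns (weights, bias, epochs_used).
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        worst = 0.0
        for inputs, desired in samples:
            # Compute output from the current weights.
            output = bias + sum(w * x for w, x in zip(weights, inputs))
            # Compare with the known output.
            error = desired - output
            worst = max(worst, abs(error))
            # Adjust weights in proportion to the error, so large errors
            # produce larger corrections than small ones.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
        # Desired output achieved? Stop once every deviation is small enough.
        if worst < tolerance:
            return weights, bias, epoch
    return weights, bias, max_epochs


# Hypothetical training set generated by: output = 2*x1 - 1*x2 + 0.5
samples = [((0, 0), 0.5), ((1, 0), 2.5), ((0, 1), -0.5), ((1, 1), 1.5)]
weights, bias, epochs = train_neuron(samples)
```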

    Pros and Cons of Neural Nets

    Pros:
    - Data-driven
    - Used when expertise is hard to codify, but good results are known
    - Works well when the technique is customized for a well-defined problem such as:
      - Credit Cards Fraud Detection (HNC Software's Falcon System)
      - Direct Marketing Campaigning (ASA's ModelMAX)
    - After the technique has proven to be successful, it can be used over and over again without a deep understanding of how it works

    Cons:
    - Hard to interpret weights and neuron relationships
    - Not easy to use:
      - All the predictors must have numeric values
      - Output is also numeric and needs to be translated if the final output variable is categorical such as the purchase of blue or white or black jeans

    How to Evaluate a Data Mining Product

    1. What kind of business problem does it address?

    2. What technique does it use to model the data?

    3.

    New Generation of Text Mining Tools

    ... to extract key elements from large unstructured data sets, discover relationships and summarize the information

    Categorization: Presents the search results in categories, rather than an undifferentiated mass

    Clustering: Grouping similar documents based on their content

    Extraction: Extracting relevant information from a document, e.g., pulling out all the company names from a data set

    Keyword Search: Searching documents for the occurrence of a particular word or set of words

    Natural-Language Processing: Determining the meaning of written words taking into account their context, grammar, etc.

    Visualization: Graphically presenting the mined data as relationships are easier to spot and understand
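Two of these capabilities, keyword search and extraction, can be sketched with regular expressions. The company-name pattern is a naive assumption (capitalized words followed by a fixed suffix) and the three documents are made up; commercial tools use far richer linguistic models.

```python
import re


def keyword_search(documents, words):
    """Return ids of documents containing every word in `words`
    (case-insensitive, whole-word match)."""
    hits = []
    for doc_id, text in documents.items():
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        if all(w.lower() in tokens for w in words):
            hits.append(doc_id)
    return hits


# Naive extraction pattern (an assumption): one or more capitalized words
# followed by a company-style suffix.
COMPANY = re.compile(r"\b((?:[A-Z][A-Za-z&]*\s)+(?:Inc|Corp|Ltd|Software))\b")


def extract_companies(text):
    """Pull out substrings that look like company names."""
    return COMPANY.findall(text)


documents = {
    "d1": "Exact Software sells an ERP package.",
    "d2": "SeeRun Corp builds monitoring tools; the ERP vendor was not named.",
    "d3": "Patent abstracts mention polymers.",
}
erp_docs = keyword_search(documents, ["erp"])
companies = [name for text in documents.values() for name in extract_companies(text)]
```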

    Case Example of Text Mining - Dow Chemical's BI Center

    Using ClearResearch software to extract data from a century's worth of chemical patent abstracts, published research papers and the company's own files

    "By eliminating the irrelevant, we've been able to reduce the time it takes for researchers to find what they need to read"

    ClearResearch uses a proprietary pattern-matching technology to search for information, categorize it and show its relationship to other data

    "The software can see, discover and extract concepts, not just words. It gives us a pictorial representation of the text in the document in an easy-to-understand chart"

    Case Example of Text Mining - Air Products & Chemical's Knowledge Management System

    Company has over [?] employees in [?] countries, and more than [?] intranet and extranet sites

    Its file servers contain [?] TB of unstructured data, excluding email or anything stored on local drives

    Using SmartDiscovery to generate a catalog and index of the data repository so that it can be more easily accessed by MS SharePoint Portal Document Management System

    Also using the software for Sarbanes-Oxley compliance and e-learning since by correctly categorizing the data, business rules can be applied to a category of documents rather than to individual documents:
    - e.g., if a document relates to operations covered by SOX, then the appropriate data-retention policies are applied to it

    "I call it the central nervous system for what we are doing with knowledge management"

    Text Mining Tools

    Come either as stand-alone products or embedded as part of a larger software system:

    Database vendors: Oracle, IBM, ...
    - Incorporating pattern-matching algorithms into their database products

    Data Mining vendors: SAS, SPSS, ...
    - Added text mining to their portfolios

    Enterprise Search Engine Vendors: Autonomy, Verity, ...

    Specialized Text Mining Firms: Inxight Software, Stratify, ...

    "Installing SAS Text Miner is a simple process - just needed to load [?] CDs on my workstation"

    Hard part: Get meaningful results
    - Depends on the skill and knowledge of user to properly interrogate text repositories

    "We are getting an increasing understanding of what things are possible with text mining. But there is a huge skills problem in this area, which is why it hasn't gotten much traction so far" - Gartner

    Dec 200[?] Report of Gartner

    "Text Mining has not been well coupled with clearly recognized "pain points" in the organisation. Customer service has been mainly handled in call centers, with an emphasis on transaction processing and short interaction times. As a result, most firms have been missing valuable input from customers on how to improve their business processes. This has led to low levels of customer satisfaction, little long-term loyalty and an expensive, albeit necessary, way of resolving customer complaints ...

    Blended service delivery models using text mining, telephone and web services will enable companies to identify not only what the customer said, but also what was meant ... will be able to spot and resolve problems earlier ... improve their ability to prevent problems recurring ... improved measurement of customer satisfaction over today's flawed survey methodology"

    Text Mining Will revolutionize CRM Strategies by

    Business Activity Monitoring (BAM)

    Automated monitoring of business-related activity affecting an enterprise
    - Report on activity in the current operational cycle, e.g., the current hour, day or week
    - Designed to spot problems early enough to head them off

    BAM is not a new concept
    - Credit Card companies have had real-time fraud monitors for years
    - Manufacturers have real-time error-detection software built into their assembly lines

    Proactive or Reactive?

    "The conventional wisdom has been to just take transactional data and move it to the data warehouse and then to the BI System. But these systems aren't responsive"

    Monitoring business activity after the fact is too late to head off a problem such as a missed deadline or the loss of a major customer

    BAM systems pluck the data in real time from the applications where it originates - order entry, accounts receivable, call centers, etc. Output in variety of forms - dashboards, e-mails, pager alerts, ...

    GE's Real-Time Dashboard

    GE's aim is to monitor everything in real time, GE's CIO explains, calling up a special web page on his PC: a "digital dashboard". From a distance it looks like a Mondrian canvas in green, yellow and red. A closer look reveals that the colors signal the status of software applications critical to GE's business. If one of the programs stays red or even yellow for too long, he gets the system to e-mail the people in charge. He can also see when he had to intervene the last time, or how individual applications such as programs to manage book-keeping or orders have performed

    As CIO, Mr Reiner was the first in the firm to get a dashboard, in early

    BAM Case Example - Davis Controls Ltd. (Canada)

    Every afternoon, at [?] pm, a screen pops up on the CEO's PC with important "news":
    - How many orders the company booked
    - Names of customers who have gone past [?] days without paying
    - Orders that have missed delivery promises

    PLUS: Daily E-mail Alerts, e.g.,
    - Which salespeople have not logged in that day to download the latest data from a corporate database about the customers in their territories. "Sometimes those remote sales guys will just sit out there in never-never land, and as long as they think no one is watching, they will march to their own drummer"
    - When a promised order-delivery is missed, one e-mail alert is generated for the responsible salesperson, one goes to a customer with an apology, and one goes to an expediter ...
    - Different e-mails go to new customers, depending on the size of their initial orders
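Alert rules of this kind might be sketched as below. The record layouts, the recipient names and the 30-day overdue threshold are assumptions made for illustration; they do not describe how any particular BAM product works.

```python
from datetime import date


def daily_alerts(today, invoices, orders, overdue_days=30):
    """Build (recipient, message) alerts from the day's operational data."""
    alerts = []
    for inv in invoices:
        age = (today - inv["issued"]).days
        if not inv["paid"] and age > overdue_days:
            alerts.append(("ceo", f"{inv['customer']} is {age} days unpaid"))
    for order in orders:
        if not order["delivered"] and order["promised"] < today:
            # One missed delivery fans out to three recipients:
            # the salesperson, the customer and an expediter.
            for recipient in (order["salesperson"], order["customer"], "expediter"):
                alerts.append((recipient, f"order {order['id']} missed its promised date"))
    return alerts


# Hypothetical operational data.
today = date(2024, 3, 15)
invoices = [
    {"customer": "Acme", "issued": date(2024, 1, 10), "paid": False},
    {"customer": "Birch", "issued": date(2024, 3, 1), "paid": False},
]
orders = [
    {"id": 7, "promised": date(2024, 3, 12), "delivered": False,
     "salesperson": "lee", "customer": "Acme"},
]
alerts = daily_alerts(today, invoices, orders)
```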

    Use Macola Enterprise Suite, an ERP package from Exact Software, a subsidiary of a Dutch Company
    - Includes the Exact Event Manager, a BAM product that triggers alerts and reports on activity and non-activity, both inside and outside the ERP system

    "BAM enables me to manage the Company more proactively. Before, I'd have to wait until a customer called with a complaint or the month-end financial reports to really get a feel for how the business was doing"

    BAM Case Example - A Fortune 500 Financial Services Firm

    Uses SeeRun Platform, a suite of products from SeeRun Corp in San Francisco
    - To monitor some [?] cases per year where the firm has signed contracts with its clients guaranteeing performance against operational metrics relating to dozens of milestones in the contracts

    "If a task is supposed to be completed within

    BAM Case Example - The Albuquerque City Government

    Uses NoticeCast from Cognos
    - To proactively push e-mail notices of important events, in near real time, to city employees, residents and vendors
    - NoticeCast sits outside the city's firewall on an extranet and monitors events by periodically querying Oracle tables populated by municipal systems

    Vendors
    - Sends an e-mail to each vendor that was issued an electronic payment during the night
    - Directs the vendor to a Website on the extranet where it can get a remittance report

    Residents
    - Sends an e-mail to each resident for whom a water-bill was produced with all the pertinent billing info
    - Directs the resident to a Website where he may pay his bill online

    City Employees
    - Once-a-day e-mails to certain employees letting them know of all online payments made to the city during the past

    What's Next for BAM?

    Will become tightly coupled to Business Process Management (BPM) systems
    - Send Alerts in a publish/subscribe model to lots of BPM systems throughout the enterprise
    - Events go in and alerts come out, but those alerts just become events in other applications

    Example:
    - A BAM system could generate an alert that the estimated date of a package delivery had slipped
    - A CRM system and a BPM system might each subscribe to such "package due-date change" alerts, extending the usefulness of the alerts
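The publish/subscribe idea can be sketched with a tiny in-process hub. `AlertBus`, the topic name and the two handlers are hypothetical stand-ins; a real deployment would rely on messaging middleware rather than in-memory callbacks.

```python
from collections import defaultdict


class AlertBus:
    """Minimal in-process publish/subscribe hub (a stand-in for real middleware)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to receive every alert published on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, alert):
        """Deliver `alert` to each subscriber; return how many received it."""
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(alert)
        return len(handlers)


crm_log, bpm_log = [], []
bus = AlertBus()
# A CRM system and a BPM system each subscribe to the same alert topic.
bus.subscribe("package-due-date-change",
              lambda a: crm_log.append(f"notify customer: {a['package']}"))
bus.subscribe("package-due-date-change",
              lambda a: bpm_log.append(f"reschedule receiving: {a['package']}"))
# The BAM system publishes once; both subscribers react.
delivered_to = bus.publish("package-due-date-change",
                           {"package": "PKG-1", "new_eta": "2024-03-18"})
```

The publisher never needs to know who is listening, which is what lets one alert become an event in several other applications.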

    More sophisticated rules of logic will be included in BAM, capable of finding hidden patterns in current business activity by doing on-the-fly analyses of historical data

    "If a process is beginning to go South, the early birds of that are hard to see. Eventually, we'll see BI and BAM married at the level of using historically recorded data to identify problems much earlier"

    Even further out lies the Holy Grail of BAM: When a system not only sees a problem coming but also goes beyond alerts to actually fixing the problem
    - e.g., automatically reordering a part when it sees that a shipment has been lost
    - an example of autonomic response, a self-learning system

    An Example of Autonomic Response

    10 years ago: If you were a good customer, FedEx shipped you a PC and allowed you to dial into their network

    [?] years ago: You could get the shipping information from any browser

    Customers now want shipping information on their order status screen

    Tomorrow's Scenario:
    - FedEx plane containing your package is snowed in Cincinnati
    - FedEx system knows your package will not arrive in the morning
    - A Web service can send you early notice of non-delivery through the CRM system
    - Business process for supply chain looks for an