4 - exploring data

Upload: bryan-wahyu

Post on 19-Feb-2018

TRANSCRIPT

  • 7/23/2019 4 - Exploring Data

    1/32

    CHAPTER 3

    EXPLORING DATA

    CREATED BY: ARIF DJUNAIDY (FTIF - ITS)
    PRESENTED BY: I PUTU GEDE HENDRA SUPUTRA, S.KOM. M.KOM

    11 March 2015

    OUTLINE

    Summary Statistics

    Visualization

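    WHAT IS DATA EXPLORATION?

    Key motivations of data exploration include:

    Helping to select the right tool for preprocessing or analysis

    Making use of humans' abilities to recognize patterns. People can recognize patterns not captured by data analysis tools.

    Related to the area of Exploratory Data Analysis (EDA):

    Created by statistician John Tukey

    Seminal book is Exploratory Data Analysis by Tukey

    A nice online introduction can be found in Chapter 1 of the NIST Engineering Statistics Handbook

    http://www.itl.nist.gov/div898/handbook/index.htm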

    TECHNIQUES USED IN DATA EXPLORATION

    In EDA, as originally defined by Tukey:

    The focus was on visualization

    Clustering and anomaly detection were viewed as exploratory techniques

    In data mining, clustering and anomaly detection are major areas of interest, and not thought of as just exploratory (not discussed further in this chapter)

    In our discussion of data exploration, we focus on:

    Summary statistics

    Visualization

    Online Analytical Processing (OLAP) (next week)

    IRIS SAMPLE DATA SET

    Many of the exploratory data techniques are illustrated with the Iris Plant data set.

    Can be obtained from the UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html

    From the statistician Ronald Fisher

    Three flower types (classes):

    Setosa

    Virginica

    Versicolour

    Four (non-class) attributes:

    Sepal width and length

    Petal width and length

    Virginica. Robert H. Mohlenbrock. USDA NRCS. 1995. Northeast wetland flora: Field office guide to plant species. Northeast National Technical Center, Chester, PA. Courtesy of USDA NRCS Wetland Science Institute.

    SUMMARY STATISTICS

    Summary statistics are numbers that summarize properties of the data

    Summarized properties include frequency, location, and spread

    Examples: location - mean; spread - standard deviation

    Most summary statistics can be calculated in a single pass through the data
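
    The single-pass point can be sketched in Python. This is a minimal illustration, assuming Welford's online update (a technique the slides do not name) and made-up petal lengths; running_summary is a hypothetical helper name.

```python
import math

def running_summary(values):
    """Return (count, mean, sample standard deviation) using one pass over the data."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return n, mean, std

# Made-up petal lengths in cm, for illustration only.
petal_lengths = [1.4, 1.5, 4.5, 4.7, 5.1, 6.0]
count, mean, std = running_summary(petal_lengths)
print(count, round(mean, 3), round(std, 3))
```

    Each value is read once and then discarded, so the same loop would work on data streamed from a file.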

    FREQUENCY AND MODE

    The frequency of an attribute value is the percentage of time the value occurs in the data set

    For example, given the attribute 'gender' and a representative population of people, the gender 'female' occurs about 50% of the time.

    The mode of an attribute is the most frequent attribute value

    The notions of frequency and mode are typically used with categorical data
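
    As a rough sketch of these two definitions, the following Python uses collections.Counter on made-up class labels. The helper names frequencies and mode are hypothetical, chosen only for this example.

```python
from collections import Counter

def frequencies(values):
    """Return the fraction of the data set taken up by each distinct value."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}

def mode(values):
    """Return a most frequent value (one of them, if several tie)."""
    return Counter(values).most_common(1)[0][0]

# Made-up class labels, for illustration only.
species = ["setosa", "virginica", "setosa", "versicolour", "setosa", "virginica"]
print(frequencies(species))
print(mode(species))
```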

    PERCENTILES

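    For continuous data, the notion of a percentile is more useful.

    Given an ordinal or continuous attribute x and a number p between 0 and 100, the pth percentile is a value x_p of x such that p% of the observed values of x are less than x_p.

    For instance, the 50th percentile is the value x_50% such that 50% of all values of x are less than x_50%.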
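
    Several conventions exist for computing percentiles on a finite sample. The sketch below assumes the nearest-rank convention, which counts values at or below x_p rather than strictly less than it; the data are made up and percentile is a hypothetical helper name.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest observed value with at least
    p% of the data at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in sorted order
    return ordered[rank - 1]

# Made-up sepal lengths in cm, for illustration only.
sepal_lengths = [4.6, 4.7, 4.9, 5.0, 5.1, 5.8, 6.3, 6.4, 6.9, 7.0]
x50 = percentile(sepal_lengths, 50)
print(x50)
```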

    MEASURES OF LOCATION: MEAN AND MEDIAN

    The mean is the most common measure of the location of a set of points.

    However, the mean is very sensitive to outliers.

    Thus, the median or a trimmed mean is also commonly used.
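
    The effect of an outlier on the three measures can be seen on a small made-up sample. In this sketch, trimmed_mean is a hypothetical helper that drops the stated proportion of values from each end; other definitions of trimming exist.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)

# Made-up values; 90 plays the role of an outlier.
values = [2, 3, 3, 4, 5, 5, 6, 7, 8, 90]
mean_value = statistics.mean(values)
median_value = statistics.median(values)
trimmed_value = trimmed_mean(values, 0.1)
print(mean_value, median_value, trimmed_value)
```

    The single large value pulls the mean well above the median and the trimmed mean, which stay close to the bulk of the data.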

    MEASURES OF SPREAD: RANGE AND VARIANCE

    Range is the difference between the max and min

    The variance or standard deviation is the most common measure of the spread of a set of points.

    However, this is also sensitive to outliers, so that other measures are often used: Absolute Average Deviation (AAD), Median Absolute Deviation (MAD), and interquartile range
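
    A minimal sketch of these spread measures follows, using Python's statistics module on made-up values. It assumes that MAD is taken around the median and that quartiles come from the default method of statistics.quantiles (Python 3.8 or later); the source may define either differently.

```python
import statistics

def spread_measures(values):
    """Return several measures of spread for a list of numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default method
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance
        "std": statistics.stdev(values),
        "aad": statistics.mean([abs(x - mean) for x in values]),
        "mad": statistics.median([abs(x - median) for x in values]),
        "iqr": q3 - q1,
    }

# Made-up values; 90 plays the role of an outlier.
values = [2, 3, 3, 4, 5, 5, 6, 7, 8, 90]
measures = spread_measures(values)
for name, value in measures.items():
    print(name, round(value, 3))
```

    On this sample the range and standard deviation are dominated by the outlier, while MAD and the interquartile range are not.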

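    VISUALIZATION

    Visualization is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported.

    Visualization of data is one of the most powerful and appealing techniques for data exploration.

    Humans have a well developed ability to analyze large amounts of information that is presented visually

    Can detect general patterns and trends

    Can detect outliers and unusual patterns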

    EXAMPLE: SEA SURFACE TEMPERATURE

    The following shows the Sea Surface Temperature (SST) for July 1

    REPRESENTATION

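    Representation is the mapping of information to a visual format

    Data objects, their attributes, and the relationships among data objects are translated into graphical elements such as points, lines, shapes, and colors.

    Example: Objects are often represented as points

    Their attribute values can be represented as the position of the points or the characteristics of the points, e.g., color, size, and shape

    If position is used, then the relationships of points, i.e., whether they form groups or a point is an outlier, are easily perceived.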

    ARRANGEMENT

    Arrangement is the placement of visual elements within a display

    Can make a large difference in how easy it is to understand the data

    Example: (figure omitted)

    SELECTION

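    Selection is the elimination or the de-emphasis of certain objects and attributes

    Selection may involve choosing a subset of attributes

    Dimensionality reduction is often used to reduce the number of dimensions to two or three

    Alternatively, pairs of attributes can be considered

    Selection may also involve choosing a subset of objects

    A region of the screen can only show so many points

    Can sample, but want to preserve points in sparse areas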
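
    Both kinds of selection can be sketched with the Python standard library: enumerating attribute pairs, and drawing a simple random sample of objects. The rows are made up and sample_objects is a hypothetical helper. Plain random sampling, as assumed here, does nothing special to preserve points in sparse areas.

```python
import itertools
import random

attributes = ["sepal length", "sepal width", "petal length", "petal width"]

# Every pair of attributes, e.g. one two-dimensional plot per pair.
attribute_pairs = list(itertools.combinations(attributes, 2))

def sample_objects(rows, k, seed=0):
    """Simple random sample of k objects, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(rows, k)

# Made-up objects, for illustration only.
rows = [(i, i * 0.5) for i in range(100)]
subset = sample_objects(rows, 10)
print(len(attribute_pairs), len(subset))
```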

    VISUALIZATION TECHNIQUES: HISTOGRAMS

    Histogram

    Usually shows the distribution of values of a single variable

    Divide the values into bins and show a bar plot of the number of objects in each bin.

    The height of each bar indicates the number of objects

    Shape of histogram depends on the number of bins

    Example: Petal Width (10 and 20 bins, respectively)
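
    The binning step can be sketched without a plotting library. The example below assumes equal-width bins and made-up petal widths; histogram_counts is a hypothetical helper, and the bars are drawn as text.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for x in values:
        if width == 0:
            index = 0  # all values identical: everything goes in the first bin
        else:
            # The maximum value is placed in the last bin.
            index = min(int((x - low) / width), num_bins - 1)
        counts[index] += 1
    return counts

# Made-up petal widths in cm, for illustration only.
petal_widths = [0.2, 0.2, 0.3, 0.4, 1.0, 1.3, 1.4, 1.5, 1.8, 2.1, 2.3, 2.5]
counts10 = histogram_counts(petal_widths, 10)
counts20 = histogram_counts(petal_widths, 20)
for count in counts10:
    print("#" * count)
```

    Comparing counts10 with counts20 shows how the apparent shape changes with the number of bins.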

    TWO-DIMENSIONAL HISTOGRAMS

    Show the joint distribution of the values of two attributes

    Example: petal width and petal length

    VISUALIZATION TECHNIQUES: BOX PLOTS

    Box Plots

    Invented by J. Tukey

    Another way of displaying the distribution of data

    Following figure shows the basic part of a box plot

    (Figure labels: outlier, 10th percentile, 25th percentile, 50th percentile, 75th percentile, 90th percentile)
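
    The quantities labeled in the figure can be computed as in the following sketch. It assumes nearest-rank percentiles and treats values beyond the 10th and 90th percentiles as outliers, following the figure labels; other box plot conventions place the whiskers differently. The data are made up and the helper names are hypothetical.

```python
import math

def nearest_rank(ordered, p):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def box_plot_summary(values):
    """Return the five percentiles drawn in the box plot and the outliers."""
    ordered = sorted(values)
    summary = {p: nearest_rank(ordered, p) for p in (10, 25, 50, 75, 90)}
    outliers = [x for x in ordered if x < summary[10] or x > summary[90]]
    return summary, outliers

# Made-up sepal lengths in cm, for illustration only.
sepal_lengths = [4.3, 4.6, 4.7, 4.9, 5.0, 5.1, 5.8, 6.3, 6.4, 7.9]
summary, outliers = box_plot_summary(sepal_lengths)
print(summary)
print(outliers)
```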

    EXAMPLE OF BOX PLOTS

    Box plots can be used to compare attributes

    VISUALIZATION TECHNIQUES: SCATTER PLOTS

    Scatter plots

    Attribute values determine the position

    Two-dimensional scatter plots are most common, but can have three-dimensional scatter plots

    Often additional attributes can be displayed by using the size, shape, and color of the markers that represent the objects

    It is useful to have arrays of scatter plots that can compactly summarize the relationships of several pairs of attributes

    See example on the next slide

    SCATTER PLOT ARRAY OF IRIS ATTRIBUTES

    VISUALIZATION TECHNIQUES: CONTOUR PLOTS

    Contour plots

    Useful when a continuous attribute is measured on a spatial grid

    They partition the plane into regions of similar values

    The contour lines that form the boundaries of these regions connect points with equal values

    The most common example is contour maps of elevation

    Can also display temperature, rainfall, air pressure, etc.

    An example for Sea Surface Temperature (SST) is provided on the next slide

    CONTOUR PLOT EXAMPLE: SST DEC, 1998

    (Figure: contour plot of sea surface temperature; scale in degrees Celsius)

    VISUALIZATION TECHNIQUES: MATRIX PLOTS

    Matrix plots

    Can plot the data matrix

    This can be useful when objects are sorted according to class

    Typically, the attributes are normalized to prevent one attribute from dominating the plot

    Plots of similarity or distance matrices can also be useful for visualizing the relationships between objects

    Examples of matrix plots are presented on the next two slides
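
    The normalization step can be sketched as follows, assuming z-score standardization (each attribute rescaled to mean 0 and standard deviation 1). The slides do not say which normalization was used, so this choice, the made-up rows, and the helper name standardize_columns are all assumptions.

```python
import statistics

def standardize_columns(rows):
    """Rescale each attribute (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Made-up objects with four attributes each, for illustration only.
rows = [
    [5.0, 3.4, 1.5, 0.2],
    [6.0, 2.8, 4.3, 1.3],
    [6.6, 3.0, 5.6, 2.0],
]
standardized = standardize_columns(rows)
for row in standardized:
    print([round(value, 2) for value in row])
```

    After this step every attribute is on the same scale, so no single column dominates the color range of the plot.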

    VISUALIZATION OF THE IRIS DATA MATRIX

    (Figure: plot of the Iris data matrix; color scale labeled "standard deviation")

    VISUALIZATION OF THE IRIS CORRELATION MATRIX

    VISUALIZATION TECHNIQUES: PARALLEL COORDINATES

    Parallel Coordinates

    Used to plot the attribute values of high-dimensional data

    Instead of using perpendicular axes, use a set of parallel axes

    The attribute values of each object are plotted as a point on each corresponding coordinate axis and the points are connected by a line

    Thus, each object is represented as a line

    Often, the lines representing a distinct class of objects group together, at least for some attributes

    Ordering of attributes is important in seeing such groupings
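
    The construction described above can be sketched by turning each object into a list of (axis, height) points. The example assumes each attribute is min-max scaled to the range 0 to 1 so that all axes share one height; that scaling, the made-up rows, and the helper name parallel_coordinates are assumptions for illustration.

```python
def parallel_coordinates(rows):
    """Turn each object into a polyline: one (axis index, scaled value) point per attribute."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    lines = []
    for row in rows:
        line = []
        for axis, (value, low, high) in enumerate(zip(row, lows, highs)):
            # Scale to [0, 1]; a constant attribute is drawn at mid-height.
            height = (value - low) / (high - low) if high > low else 0.5
            line.append((axis, height))
        lines.append(line)
    return lines

# Made-up objects with four attributes each, for illustration only.
rows = [
    [5.0, 3.4, 1.5, 0.2],
    [6.0, 2.8, 4.3, 1.3],
    [6.6, 3.0, 5.6, 2.0],
]
lines = parallel_coordinates(rows)
for line in lines:
    print([(axis, round(height, 2)) for axis, height in line])
```

    Reordering the columns of rows changes which axes sit next to each other, which is why attribute ordering affects how visible the groupings are.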

    PARALLEL COORDINATES PLOTS FOR IRIS DATA

    OTHER VISUALIZATION TECHNIQUES

    Star Plots

    Similar approach to parallel coordinates, but axes radiate from a central point

    The line connecting the values of an object is a polygon

    Chernoff Faces

    Approach created by Herman Chernoff

    This approach associates each attribute with a characteristic of a face

    The values of each attribute determine the appearance of the corresponding facial characteristic

    Each object becomes a separate face

    Relies on human's ability to distinguish faces
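
    For the star plot, the polygon of one object can be sketched by placing each attribute on its own axis at equal angles around the origin. The example assumes the attribute values are already scaled to the range 0 to 1; the values and the helper name star_polygon are made up for illustration.

```python
import math

def star_polygon(scaled_values):
    """Return the (x, y) vertices of the star-plot polygon for one object."""
    n = len(scaled_values)
    vertices = []
    for k, radius in enumerate(scaled_values):
        angle = 2 * math.pi * k / n  # axis k radiates from the center at this angle
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices

# One made-up object with four attributes, each already scaled to [0, 1].
vertices = star_polygon([0.8, 0.4, 0.9, 0.7])
for x, y in vertices:
    print(round(x, 2), round(y, 2))
```

    Connecting the vertices in order and closing the loop gives the polygon for that object.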

    STAR PLOTS FOR IRIS DATA

    Setosa

    Versicolour

    Virginica

    CHERNOFF FACES FOR IRIS DATA

    Setosa

    Versicolour

    Virginica

    THE END

    THANK YOU