My First Data Science Project (Data Science Thailand Meetup #1)

Data Science Thailand Meetup #1 (My first Data Science Project) 16 October 2015 By Komes Chandavimol [email protected] http://datascienceth.com/category/seminar/slides/

Upload: data-science-thailand

Post on 13-Apr-2017


TRANSCRIPT

Page 1: My First Data Science Project (Data Science Thailand Meetup #1)

Data Science Thailand Meetup #1 (My first Data Science Project)

16 October 2015 By Komes Chandavimol

[email protected]  

http://datascienceth.com/category/seminar/slides/  

Page 2: My First Data Science Project (Data Science Thailand Meetup #1)

Café Amazon Use Case

Café Amazon
* Need to understand more about customer opinions of, and behavior toward, the Café Amazon brand

The plan
* The first step is to collect social media data, starting with public sources such as Twitter. This could be a good first step toward a future Big Data platform.


Page 3: My First Data Science Project (Data Science Thailand Meetup #1)

Source - Twitter

* Source: tweets that relate to “Café Amazon”, “Amazon Coffee”, or “Amazon”

Page 4: My First Data Science Project (Data Science Thailand Meetup #1)

Twitter Data Discovery

Page 5: My First Data Science Project (Data Science Thailand Meetup #1)

Tweet Classification using Excel

* Step 1: Identify Class (Label)
* Step 2: Data Cleaning
* Step 3: Word Counting
* Step 4: Tokenization
* Step 5: Model Building
* Step 6: Using Bayes and the MAP model
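Step 6 names Bayes' rule and the MAP (maximum a posteriori) decision. As a hedged sketch in standard naive Bayes notation (the symbols c, w_i, and n_{w,c} are assumptions and do not appear on the slides), the rule picks the class with the larger posterior, computed in log space:

```latex
\hat{c} = \arg\max_{c \in \{\mathrm{AMZ},\,\mathrm{OTH}\}}
  \Big[ \ln P(c) + \sum_{i} \ln P(w_i \mid c) \Big],
\qquad
P(w \mid c) = \frac{n_{w,c} + 1}{\sum_{w'} \left( n_{w',c} + 1 \right)}
```

Here n_{w,c} is the number of times word w appears in tweets labeled c, the sum in the denominator runs over the words in that class's count table, and the +1 is add-one smoothing. If both classes are assumed equally likely, the ln P(c) term is identical for both and can be dropped from the comparison.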

Page 6: My First Data Science Project (Data Science Thailand Meetup #1)

Step 1: Identify Class (Label)

Amazon (AMZ)   Others (OTH)

Page 7: My First Data Science Project (Data Science Thailand Meetup #1)

Step 1: Identify Class (Label)

Identify Class (Label)

Page 8: My First Data Science Project (Data Science Thailand Meetup #1)

Step 2: Clean Data

* Change all to lower case
* =LOWER(A2)

* Remove . : ? ! ; and , by replacing each with a space
* =SUBSTITUTE(B2,". "," ")
* =SUBSTITUTE(C2,": "," ")
* =SUBSTITUTE(D2,"?"," ")
* =SUBSTITUTE(E2,"!"," ")
* =SUBSTITUTE(F2,";"," ")
* =SUBSTITUTE(G2,","," ")
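For readers following along outside Excel, the cleaning chain above can be sketched in Python. This is a minimal sketch, not the speaker's code: the function name `clean_tweet` is hypothetical, and it assumes each SUBSTITUTE replaces its target with a single space, as the slide's formulas suggest.

```python
def clean_tweet(text: str) -> str:
    """Lowercase a tweet, then replace each punctuation target with a space."""
    cleaned = text.lower()  # same role as =LOWER(A2)
    # Same targets, in the same order, as the SUBSTITUTE chain on the slide:
    # ". " and ": " include the trailing space; the others are bare marks.
    for target in (". ", ": ", "?", "!", ";", ","):
        cleaned = cleaned.replace(target, " ")
    return cleaned


example = clean_tweet("My J soya iced latte. I think Amazon Cafe makes the best")
print(example)
```

Because ". " and ": " only match when a space follows, a period at the very end of a tweet or inside a URL is left in place, which is the same behavior the spreadsheet formulas would have.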

Page 9: My First Data Science Project (Data Science Thailand Meetup #1)

Step 2: Clean Data

Tweet   Classification (Label)   lowercase   remove .   remove :   remove ?
My J soya iced latte. I think Amazon Cafe makes the best   AMZ   =LOWER(B2)   =SUBSTITUTE(C2,". "," ")   =SUBSTITUTE(D2,": "," ")   =SUBSTITUTE(E2,"?"," ")
I am at Amazon Café   AMZ   =LOWER(B2)   =SUBSTITUTE(C2,". "," ")   =SUBSTITUTE(D2,": "," ")   =SUBSTITUTE(E2,"?"," ")
green tea time ❤🍵🍃 @ #cafeAmazon pic.twitter.com/28jzc4ojOy   AMZ   =LOWER(B2)   =SUBSTITUTE(C2,". "," ")   =SUBSTITUTE(D2,": "," ")   =SUBSTITUTE(E2,"?"," ")
#Coffee Instagram by @Tikkieinlove #Tuesday#relax time#icecoffee#espresso#cafeamazon#iphone6plus#   AMZ   =LOWER(B2)   =SUBSTITUTE(C2,". "," ")   =SUBSTITUTE(D2,": "," ")   =SUBSTITUTE(E2,"?"," ")
wBestSellers: #> Amazon #Deals: Save $62 (48% OFF) Mr. Coffee BVMC-EL1 Café   OTH   =LOWER(B2)   =SUBSTITUTE(C2,". "," ")   =SUBSTITUTE(D2,": "," ")   =SUBSTITUTE(E2,"?"," ")
#Amazon #Coffee Store©☕goo.gl/b8U4a1☕#BeanCoffee #Coffeemaker #grinder #cappuccino #hillsbros   OTH   =LOWER(B2)   =SUBSTITUTE(C2,". "," ")   =SUBSTITUTE(D2,": "," ")   =SUBSTITUTE(E2,"?"," ")
#> Amazon #Deals: Save $62 (48% OFF) Mr. Coffee BVMC-EL1 Caf... bit.ly/1GHYQfG |   OTH   =LOWER(B2)   =SUBSTITUTE(C2,". "," ")   =SUBSTITUTE(D2,": "," ")   =SUBSTITUTE(E2,"?"," ")

Page 10: My First Data Science Project (Data Science Thailand Meetup #1)

Step 3: Word Counting

* Prepare Words from

First N rows, set space position = 0   =LEN(C2)

Page 11: My First Data Science Project (Data Science Thailand Meetup #1)

Step 3: Word Counting

3.1 Separate Words

After N Rows   =IFERROR(MID(A2,B2+1,B102-B2-1),".")   =LEN(C2)

Page 12: My First Data Science Project (Data Science Thailand Meetup #1)

Step 3: Word Counting

3.2 Check the result

=IFERROR(MID(A2,B2+1,B102-B2-1),".")
=LEN(C2)
=IFERROR(FIND(" ",A127,B27+1),LEN(A127)+1)

Page 13: My First Data Science Project (Data Science Thailand Meetup #1)

Step 3: Word Counting

3.2 Check the result

=IFERROR(MID(A2,B2+1,B102-B2-1),".")
=LEN(C2)
=IFERROR(FIND(" ",A127,B27+1),LEN(A127)+1)
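The FIND and MID formulas above walk through a cleaned tweet one space at a time: FIND locates the next space after the previous one (or falls back to the position just past the end of the text), and MID extracts the characters in between. A rough Python equivalent might look like the sketch below. The function name `split_words` is hypothetical, and skipping empty pieces is an assumption on my part (the spreadsheet appears to mark them with "." instead).

```python
def split_words(text: str) -> list[str]:
    """Split text on single spaces by tracking space positions."""
    words = []
    prev = -1  # index of the previous space; -1 means "before the text"
    while prev < len(text):
        nxt = text.find(" ", prev + 1)  # next space after the previous one
        if nxt == -1:
            nxt = len(text)  # no more spaces: stop at the end of the text
        word = text[prev + 1:nxt]  # characters between the two positions
        if word:  # consecutive spaces yield empty pieces; skip them
            words.append(word)
        prev = nxt
    return words


print(split_words("i am at amazon café"))
```

In everyday Python, `text.split()` gives the same list in one call; the loop is spelled out here only to show what the spreadsheet columns are doing.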

Page 14: My First Data Science Project (Data Science Thailand Meetup #1)

Step 3: Word Counting

3.3 Using Pivot Table and count each word

=C4/C$3   =LN(D4)   =B4+1

Page 15: My First Data Science Project (Data Science Thailand Meetup #1)

Step 4: Tokenization

4 Using Pivot Table and count each word

• Add one to everything   =B4+1
• Calculate P(Token/APP)   =C4/C$3
• Calculate Log (P)   =LN(D4)
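The three bullets above (add one, divide by the total, take the natural log) can be sketched in Python for one class. This is an illustration under assumptions, not the speaker's workbook: the function name `build_log_prob_table` and the sample tokens are hypothetical, and the column mapping in the comments is inferred from the formulas on the slide.

```python
import math
from collections import Counter


def build_log_prob_table(tokens):
    """Return (log-probability per token, log-probability for unseen tokens)."""
    counts = Counter(tokens)  # the pivot-table word counts
    smoothed = {tok: n + 1 for tok, n in counts.items()}  # like =B4+1
    total = sum(smoothed.values())  # the total that C$3 appears to hold
    # like =C4/C$3 followed by =LN(D4)
    log_probs = {tok: math.log(n / total) for tok, n in smoothed.items()}
    # A token never seen in this class is treated as a smoothed count of 1.
    unseen_log_prob = math.log(1 / total)
    return log_probs, unseen_log_prob


# Hypothetical tokens from tweets labeled AMZ.
amz_tokens = ["amazon", "café", "latte", "amazon"]
log_probs, unseen_log_prob = build_log_prob_table(amz_tokens)
print(log_probs, unseen_log_prob)
```

Working with logs means the per-word values can later be summed instead of multiplied, which avoids very small products underflowing to zero.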

Page 16: My First Data Science Project (Data Science Thailand Meetup #1)

Step 5: Model Building

Amazon (AMZ)   Others (OTH)

5 Building the Model

=IF(LEN(D2)<=3,0,IF(ISNA(VLOOKUP(D2,PropAMZ!$A$4:$E$386,5,FALSE)),LN(1/PropAMZ!$C$3),VLOOKUP(D2,PropAMZ!$A$4:$E$386,5,FALSE)))  

=SUM(D14:AI14)  

=IF(C14>C26,"AMZ","OTHER")  
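Read together, the three formulas above score each word (0 for words of three characters or fewer, a fallback value for words missing from the lookup table), sum the scores per class, and pick the larger sum. The Python sketch below mirrors that logic. The function names and the toy probability tables are hypothetical, and comparing the sums directly assumes both classes are equally likely beforehand, which is what the slide's comparison formula implies.

```python
import math


def score(tokens, log_probs, unseen_log_prob):
    """Sum per-token log-probabilities for one class."""
    total = 0.0
    for token in tokens:
        if len(token) <= 3:
            continue  # like IF(LEN(D2)<=3,0,...): short words contribute 0
        # like VLOOKUP with the ISNA fallback of LN(1/total)
        total += log_probs.get(token, unseen_log_prob)
    return total


def classify(tokens, amz_model, oth_model):
    """Return the label whose summed log-probability is larger."""
    amz_score = score(tokens, *amz_model)
    oth_score = score(tokens, *oth_model)
    return "AMZ" if amz_score > oth_score else "OTHER"


# Toy (log_probs, unseen_log_prob) pairs; the numbers are made up.
amz_model = ({"latte": math.log(0.5), "café": math.log(0.3)}, math.log(0.1))
oth_model = ({"deals": math.log(0.5), "save": math.log(0.3)}, math.log(0.1))

print(classify(["iced", "latte", "at", "café"], amz_model, oth_model))
print(classify(["save", "deals", "now"], amz_model, oth_model))
```

Dropping words of three characters or fewer is a crude stand-in for stop-word removal: it filters "at", "the", and "is", but it would also drop short meaningful tokens.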

Page 17: My First Data Science Project (Data Science Thailand Meetup #1)

Step 6: Using Model

Amazon (AMZ)   Others (OTH)

6 Testing the Model
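The slide does not show how the test results were tallied. One simple way to check a model on held-out tweets is to compare predicted labels with the hand-assigned ones, as in the sketch below. The function name `evaluate` and the sample labels are hypothetical and are not results from the talk.

```python
from collections import Counter


def evaluate(actual, predicted):
    """Return (accuracy, counts of each (actual, predicted) label pair)."""
    pairs = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in pairs.items() if a == p)
    return correct / len(actual), pairs


# Made-up labels for four test tweets.
actual = ["AMZ", "AMZ", "OTH", "OTH"]
predicted = ["AMZ", "OTH", "OTH", "OTH"]
accuracy, pairs = evaluate(actual, predicted)
print(accuracy, dict(pairs))
```

The pair counts act as a small confusion matrix, showing which class the model tends to mislabel rather than only how often it is right.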

Page 18: My First Data Science Project (Data Science Thailand Meetup #1)
Page 19: My First Data Science Project (Data Science Thailand Meetup #1)

Summary

* Step 1: Identify Class (Label)
* Step 2: Data Cleaning
* Step 3: Word Counting
* Step 4: Tokenization
* Step 5: Model Building
* Step 6: Using Bayes and the MAP model

Page 20: My First Data Science Project (Data Science Thailand Meetup #1)

Other Solutions?

www.datascienceth.com - R

www.datascienceth.com - Python

www.datascienceth.com - RapidMiner