Slides Data Donderdag #6
Post on 26-Jul-2015
Sandjai Bhulai (s.bhulai@vu.nl)
Challenge #1
• How do you get (all) Dutch tweets?
• Twitter has a streaming API
• Fair use policy delivers random 1% of the Twitter stream
• Following keywords is allowed
• How much data do you need?
• How much data can you get?
• How much data can you deal with?
• How much data can you store?
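As a rough illustration of the keyword-following idea, the sketch below keeps only tweets that contain at least one tracked word. It makes no call to Twitter: the stream is assumed to be any iterable of dictionaries with a `text` field, and the names `DUTCH_KEYWORDS`, `matches_keywords`, and `filter_stream` are hypothetical. Tracking very common Dutch words is only one possible way to approximate "all Dutch tweets".

```python
import re
from typing import Iterable, Iterator

# Hypothetical keyword list (an assumption, not from the slides):
# very common Dutch words used as a proxy for "this tweet is Dutch".
DUTCH_KEYWORDS = frozenset({"de", "het", "een", "ik", "niet"})

TOKEN_RE = re.compile(r"\w+")


def matches_keywords(text: str, keywords: frozenset = DUTCH_KEYWORDS) -> bool:
    """Return True if any whole word in `text` is a tracked keyword."""
    tokens = {token.lower() for token in TOKEN_RE.findall(text)}
    return not tokens.isdisjoint(keywords)


def filter_stream(tweets: Iterable[dict],
                  keywords: frozenset = DUTCH_KEYWORDS) -> Iterator[dict]:
    """Lazily yield tweets whose (assumed) `text` field matches a keyword.

    Working as a generator means only one tweet is held in memory at a time.
    """
    for tweet in tweets:
        if matches_keywords(tweet.get("text", ""), keywords):
            yield tweet


if __name__ == "__main__":
    sample = [
        {"text": "Ik zie brand"},
        {"text": "Earthquake in Mexico"},
        {"text": "Het regent"},
    ]
    for tweet in filter_stream(sample):
        print(tweet["text"])
```

Because the filter is a generator, it can sit between whatever delivers the tweets and whatever stores them without buffering the stream.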
Challenge #2
• How do you detect trends on Twitter?
• Absolute frequencies of tweets
• Relative frequencies of tweets
• Speed of tweets
• Acceleration of tweets
• Seasonal patterns
• We need a real-time algorithm
• We need to efficiently handle memory
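One way to make frequency, speed, and acceleration concrete is to count a term per time interval and take finite differences over the last three counts. The sketch below is an assumption about how this could be done, not the method used in the talk; the class `TrendTracker` and its methods are hypothetical names. Memory stays bounded because each term keeps only a fixed-length window, and terms that go silent are dropped.

```python
from collections import defaultdict, deque


class TrendTracker:
    """Keeps the last `window` per-interval counts for each term."""

    def __init__(self, window: int = 3):
        if window < 3:
            raise ValueError("need at least 3 intervals to compute acceleration")
        self.window = window
        # Each term maps to a fixed-length deque, so old counts fall off.
        self.history = defaultdict(lambda: deque([0] * window, maxlen=window))
        self.current = defaultdict(int)

    def observe(self, term: str) -> None:
        """Count one occurrence of `term` in the interval in progress."""
        self.current[term] += 1

    def close_interval(self) -> None:
        """End the current interval and push its counts into the history."""
        for term in set(self.history) | set(self.current):
            self.history[term].append(self.current.get(term, 0))
        self.current.clear()
        # Forget terms that were silent for the whole window.
        silent = [t for t, counts in self.history.items() if not any(counts)]
        for term in silent:
            del self.history[term]

    def stats(self, term: str) -> dict:
        """Absolute count, speed (first difference), acceleration (second)."""
        counts = self.history.get(term)
        if counts is None:
            return {"count": 0, "speed": 0, "acceleration": 0}
        c0, c1, c2 = counts[-3], counts[-2], counts[-1]
        return {"count": c2, "speed": c2 - c1, "acceleration": c2 - 2 * c1 + c0}


if __name__ == "__main__":
    tracker = TrendTracker(window=3)
    for n in (1, 2, 5):  # occurrences in three consecutive intervals
        for _ in range(n):
            tracker.observe("#temblor")
        tracker.close_interval()
    print(tracker.stats("#temblor"))
```

Relative frequencies and seasonal patterns are not covered here; they would need, for example, the total tweet count per interval and a longer baseline.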
Trending topics
1. #PrayforMexico
2. #SocialMovies
3. #temblor
4. Sismo de 7.8
5. Earthquake in Mexico
6. John Elway
7. Pat Bowlen
8. Marcelo Lagos
9. Azcapotzalco
10. Niñas de 13 y 14
20 March 2012, Twitter.com
Challenge #3
• How do you deal with the following tweets?
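• “Brand in Amsterdam” (“Fire in Amsterdam”)
• “Vuur in 020” (“Fire in 020”; 020 is the Amsterdam area code)
• “Fikkie in A’dam” (“Blaze in A’dam”; informal)
• “Ik heb brand gezien” (“I have seen a fire”)
• “Ik zag brand” (“I saw a fire”)
• “Ik zie brand” (“I see a fire”)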
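The tweets above say the same thing with different words, place names, and verb forms. A minimal sketch of one possible approach is to map every token to a canonical form before counting. The lookup table `CANONICAL`, the stop-word set, and the function `normalize` are hand-made, hypothetical examples covering only these six tweets; a real system would need a far larger lexicon or a proper Dutch lemmatizer.

```python
import re

# Hypothetical hand-made lookup (an assumption for illustration only):
# synonyms, place aliases, and verb forms mapped to one canonical token.
CANONICAL = {
    "vuur": "brand",
    "fikkie": "brand",
    "020": "amsterdam",
    "a'dam": "amsterdam",
    "zag": "zien",
    "zie": "zien",
    "gezien": "zien",
}

# Hypothetical stop words that carry no topical information here.
STOPWORDS = {"ik", "heb", "in"}

TOKEN_RE = re.compile(r"[\w']+")


def normalize(text: str) -> frozenset:
    """Reduce a tweet to a set of canonical tokens."""
    # Lowercase and unify the typographic apostrophe with the ASCII one.
    text = text.lower().replace("\u2019", "'")
    tokens = TOKEN_RE.findall(text)
    return frozenset(CANONICAL.get(t, t) for t in tokens if t not in STOPWORDS)


if __name__ == "__main__":
    for tweet in ("Brand in Amsterdam", "Vuur in 020", "Fikkie in A\u2019dam",
                  "Ik heb brand gezien", "Ik zag brand", "Ik zie brand"):
        print(tweet, "->", sorted(normalize(tweet)))
```

With this table, the first three tweets collapse to the same token set, and so do the last three, so a frequency-based trend detector would count them together.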
The future
• Many challenges ahead:
• How to deal with retweets?
• Integration of reputation scores?
• Use of profile information?
• Advantages of semantic research?
• Add feeds of other social media?
• Generalize to other languages?
• Dependence on GPS information?
• …