insight dataengineering henok_yelpdemo
![Page 1: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/1.jpg)
Where is my tweet?
Henok Mengistu
Insight Data Engineering Fellow
Silicon Valley, Summer 2016
![Page 2: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/2.jpg)
Motivation
![Page 3: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/3.jpg)
Motivation
But this number doesn't show how the tweet spreads.
![Page 4: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/4.jpg)
But a re-tweet graph could show it.
![Page 6: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/6.jpg)
Under the hood
![Page 7: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/7.jpg)
Engineering Challenges
Re-tweets could arrive out of order
– Spark can't sort across a data stream
– Apache Flink
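The out-of-order problem above can be illustrated with a small, framework-free sketch. It is not the Spark or Flink API and not necessarily how this project solved it: it only shows the general idea of buffering events and releasing them in timestamp order once a watermark has passed. The `(event_time, retweet_id)` tuple layout, the `reorder` function, and the `max_delay` bound are all assumptions made for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event layout: (event_time, retweet_id).
Event = Tuple[int, str]


def reorder(events: Iterable[Event], max_delay: int) -> Iterator[Event]:
    """Yield events in event-time order.

    Assumes no event arrives more than `max_delay` time units after a
    later-stamped event. An event that violates this bound is emitted as
    soon as it arrives, so it may appear out of order.
    """
    buffer: List[Event] = []  # min-heap keyed on event_time
    max_seen = None           # largest event_time observed so far
    for event in events:
        heapq.heappush(buffer, event)
        max_seen = event[0] if max_seen is None else max(max_seen, event[0])
        # Everything at or before the watermark is considered complete.
        watermark = max_seen - max_delay
        while buffer and buffer[0][0] <= watermark:
            yield heapq.heappop(buffer)
    # End of stream: flush whatever is still buffered, in order.
    while buffer:
        yield heapq.heappop(buffer)


arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d")]
ordered = list(reorder(arrivals, max_delay=2))
```

A larger `max_delay` tolerates later arrivals at the cost of holding events longer before they are emitted.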
![Page 8: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/8.jpg)
● I am Henok
– Originally from Ethiopia
– Currently a PhD student at the University of Wyoming
● Working on Evolutionary Computation
– I like playing and watching soccer
– But skiing, not so much
![Page 9: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/9.jpg)
Thank you!
![Page 10: Insight dataengineering henok_yelpdemo](https://reader031.vdocuments.net/reader031/viewer/2022030308/58eca1fc1a28ab23278b4715/html5/thumbnails/10.jpg)
Queries
● On the re-tweet graph
– Who are my audiences?
● Geographically, social groups
– Betweenness centrality
● Who is relevant to spreading my tweet?
● Identify influential followers
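As a rough illustration of the betweenness centrality query, here is a minimal sketch using Brandes' algorithm on a small directed, unweighted graph. The slides do not say how the project computed this, so the edge direction (from the retweeted user to the retweeter), the user names, and the `betweenness_centrality` function are assumptions. Scores are unnormalized.

```python
from collections import deque
from typing import Dict, Hashable, List

# Adjacency list: user -> users who retweeted from that user (assumed direction).
Graph = Dict[Hashable, List[Hashable]]


def betweenness_centrality(graph: Graph) -> Dict[Hashable, float]:
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    score = {v: 0.0 for v in nodes}

    for source in nodes:
        # Phase 1: BFS from source, counting shortest paths to every node.
        order = []                       # nodes in non-decreasing distance
        preds = {v: [] for v in nodes}   # predecessors on shortest paths
        paths = {v: 0 for v in nodes}    # number of shortest paths from source
        dist = {v: -1 for v in nodes}    # -1 means "not reached yet"
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)

        # Phase 2: accumulate dependencies from the farthest nodes inward.
        dependency = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    return score


# Hypothetical re-tweet graph: alice relays the author's tweet to carol and dave.
retweets = {
    "author": ["alice", "bob"],
    "alice": ["carol", "dave"],
}
scores = betweenness_centrality(retweets)
most_influential = max(scores, key=scores.get)  # "alice", with a score of 2.0
```

In this toy graph only alice sits on a shortest path between two other users (author to carol, author to dave), which is why she is the only user with a non-zero score.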