jinchao demo v7

SEARCH YOUR TWEETS: SEARCH LIKE A PROFESSIONAL

Upload: jinchao-lin

Post on 16-Apr-2017

Category: Technology


TRANSCRIPT

SEARCH YOUR TWEETS: SEARCH LIKE A PROFESSIONAL

Motivation

• Twitter represents a rich flow of information
• There is no effective way to query Twitter
• It is hard to monitor topics of interest in real time

Search Tweets Like a Professional

A real-time Twitter search engine that lets you search by:
• Keywords
◦ Country
◦ Language
◦ Negative words

Demo (http://searchyourtweet.info:5000/input)

Keep an eye on your topics of interest
• Register your interest and we will keep you updated on the newest events
• Video (https://youtu.be/GdRmXNfukos)

Data pipeline

[Architecture diagram. Components: Frontend, Logic Layer, Query Controller, Percolator, Pub/Sub, Searching database, Backend Database, Data Backup. Edge labels: Register query, Publish, Matching query, Searching.]

Real-Time Monitor on Twitter
◦ Implemented using the Elasticsearch Percolator
◦ Think of it as "search in reverse"
◦ Users register queries with the percolator
◦ The percolator matches incoming documents against the registered queries
◦ Challenge:
◦ How to design the percolator data pipeline?
◦ How to decouple the backend database from the frontend server?
◦ Use the publish/subscribe design pattern
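
The "search in reverse" idea can be sketched in plain Python. This is a toy stand-in, not the Elasticsearch Percolator itself: the class names, the keyword/negative-word matching rule, and the query ids are all assumptions made for illustration. Queries are stored first, and each incoming tweet is then checked against every stored query.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RegisteredQuery:
    """A stored query (hypothetical structure, not an Elasticsearch type)."""
    query_id: str
    keywords: Set[str]                      # all must appear (lowercase)
    negative_words: Set[str] = field(default_factory=set)  # none may appear

    def matches(self, text: str) -> bool:
        words = set(text.lower().split())
        has_all_keywords = self.keywords <= words
        has_negative = bool(self.negative_words & words)
        return has_all_keywords and not has_negative


class ToyPercolator:
    """Stores queries up front, then matches each incoming document."""

    def __init__(self) -> None:
        self._queries: Dict[str, RegisteredQuery] = {}

    def register(self, query: RegisteredQuery) -> None:
        self._queries[query.query_id] = query

    def percolate(self, text: str) -> List[str]:
        # Return the ids of every registered query the document satisfies.
        return sorted(
            q.query_id for q in self._queries.values() if q.matches(text)
        )


percolator = ToyPercolator()
percolator.register(RegisteredQuery("q1", {"python"}))
percolator.register(RegisteredQuery("q2", {"python"}, {"snake"}))
percolator.register(RegisteredQuery("q3", {"java"}))

matched = percolator.percolate("A python snake escaped today")
print(matched)
```

Here the tweet is the "query" and the stored queries are the "documents", which is the reversal the slide refers to.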

Real-Time Monitor Data Flow

[Data flow diagram. Components: Percolator, Query database, Twitter database, Controller, Pub/Sub. Edge labels: Subscribe, Open channel.]
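
A minimal sketch of the publish/subscribe step in this flow, assuming one channel per registered query. The slides do not name the broker that was used, so the in-process `Broker` class and the `query:<id>` channel naming below are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """In-process pub/sub stand-in; a real deployment would use a message broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        # "Open channel": the frontend side registers interest in a channel.
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # The backend publishes a matching tweet; returns the delivery count.
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
inbox: List[str] = []

# Frontend server subscribes to the channel for its registered query.
broker.subscribe("query:q1", inbox.append)

# Backend publishes whenever the percolator reports a matching query.
delivered = broker.publish("query:q1", "tweet about python")
ignored = broker.publish("query:q2", "tweet nobody asked for")
print(delivered, ignored, inbox)
```

The publisher only knows channel names, not who is listening, which is what decouples the backend database from the frontend server.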

Challenge
How to build a high-throughput, real-time backend data pipeline?
• Use Logstash!
◦ Highly scalable
◦ Compatible with different sources and destinations

[Diagrams: "A scalable high-throughput pipeline" and "Current backend pipeline"]

Challenge
• Real-time updates on the frontend client:
• Instead of polling with the JavaScript "setInterval()" function, I use "socketIO" to keep a socket open between the front-end client and the Flask server
• Construct the Elasticsearch query
• Use the Python requests library to query Elasticsearch
• Fine-tuning Elasticsearch
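
One way the search filters (keywords, country, language, negative words) could be turned into an Elasticsearch bool query is sketched below. The field names `text`, `country`, and `lang` are assumptions, since the slides do not show the index mapping, and `build_tweet_query` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Optional


def build_tweet_query(
    keywords: List[str],
    country: Optional[str] = None,
    language: Optional[str] = None,
    negative_words: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a bool query body; field names are assumed, not from the slides."""
    bool_query: Dict[str, Any] = {
        # Every keyword must appear in the tweet text.
        "must": [
            {"match": {"text": {"query": " ".join(keywords), "operator": "and"}}}
        ]
    }

    # Exact-value restrictions go in the non-scoring filter clause.
    filters = []
    if country:
        filters.append({"term": {"country": country}})
    if language:
        filters.append({"term": {"lang": language}})
    if filters:
        bool_query["filter"] = filters

    # Exclude tweets containing any of the negative words.
    if negative_words:
        bool_query["must_not"] = [{"match": {"text": " ".join(negative_words)}}]

    return {"query": {"bool": bool_query}}


body = build_tweet_query(
    ["python", "flask"], country="CA", language="en", negative_words=["spam"]
)
minimal = build_tweet_query(["python"])
print(json.dumps(body, indent=2))
```

The resulting dictionary could then be sent with `requests.post(url, json=body)` to the index's `_search` endpoint, where `url` depends on the deployment.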

About Me
M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning

B.S., University of Toronto
◦ Field: Applied Mathematics

Data Scientist Intern, Neon Inc., San Francisco

Back-end Model Developer, MetricAid Inc., Toronto

Experience in Deep Learning:
◦ Convolutional Network, Recurrent Network

• OS/161 (a simplified POSIX OS)

Questions?

Thank you!

Parallelization of percolator
• Will consume a lot of hardware: O(mn)
• Another choice: Luwak + Samza
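
A rough illustration of the O(mn) cost, assuming m is the number of registered queries and n the number of incoming tweets (the slide does not define either symbol). Splitting the queries across workers, as in the hypothetical round-robin scheme below, lowers the load per worker but leaves the total number of query evaluations at m times n.

```python
from typing import List


def shard_queries(query_ids: List[str], workers: int) -> List[List[str]]:
    """Round-robin split of registered queries across workers (assumed scheme)."""
    return [query_ids[i::workers] for i in range(workers)]


def evaluations_per_worker(
    shards: List[List[str]], n_documents: int
) -> List[int]:
    # Each worker checks every incoming document against each of its queries.
    return [len(shard) * n_documents for shard in shards]


query_ids = [f"q{i}" for i in range(10)]        # m = 10 registered queries
shards = shard_queries(query_ids, workers=3)
per_worker = evaluations_per_worker(shards, n_documents=1000)  # n = 1000 tweets
total = sum(per_worker)
print(per_worker, total)
```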