operational aspects of big data
DESCRIPTION
The presentation discusses what big data is and best practices for operating big data systems. The speaker is the CTO of myThings. It was presented on June 10, 2014 at the conference "Best practices for SaaS Operation", sponsored by MoovingON (www.moovingon.com).
TRANSCRIPT
Operational aspects of Big Data
• Yoav Chernobroda - CTO
All copyrights reserved to myThings LTD
A few facts about myThings
Online ad retargeting for desktop, mobile web and mobile apps
First on Fast 50 companies
150 employees
R&D in Ramat Hachayal + 15 regional offices
Big data at scale
20TB / day
300M uniques / month
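To put the two figures above in operational terms, here is a rough back-of-the-envelope calculation. It is only a sketch: the decimal terabyte, the 30-day month, and the assumption of evenly spread traffic are not from the talk.

```python
# Back-of-the-envelope rates derived from the slide's two figures.
# Assumptions (not from the talk): 1 TB = 10**12 bytes, a 30-day month,
# and traffic spread evenly over time.
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 20 * TB
monthly_uniques = 300_000_000

avg_bytes_per_second = daily_bytes / SECONDS_PER_DAY
monthly_bytes_per_unique = daily_bytes * 30 / monthly_uniques

print(f"average ingest: {avg_bytes_per_second / 10**6:.1f} MB/s")
print(f"data per unique per month: {monthly_bytes_per_unique / 10**6:.1f} MB")
```

Real traffic is bursty, so peak rates would be higher than this average.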
Personalized retargeting
1. User visits e-commerce site but quits without converting. She is tagged with myThings’ smart tag, browses the site, but leaves without completing a purchase.
2. When she later visits any desktop or mobile site on the ad network, she is targeted with an ad.
3. myThings creates, in real time, a personalized ad, custom-made based on consumer intent data, with product info and image.
4. A personalized ad is presented.
5. When the user clicks, she is taken back to the product page to complete the purchase.
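The retargeting flow on this slide can be illustrated with a toy in-memory sketch. Everything here is hypothetical (the `consumer_db` dictionary, the `Ad` fields, the function names, the example URLs); it only mirrors the described sequence of tagging a visit, building a personalized ad from stored intent, and returning the user to the product page. It is not myThings' actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ad:
    user_id: str
    product_name: str
    image_url: str
    landing_url: str  # product page a click leads back to


# Hypothetical in-memory stand-in for a consumer store:
# user id -> the product the user last viewed.
consumer_db: dict[str, dict[str, str]] = {}


def tag_visit(user_id: str, product: dict[str, str]) -> None:
    """Record the product a tagged user viewed before leaving the shop."""
    consumer_db[user_id] = product


def build_ad(user_id: str) -> Optional[Ad]:
    """Build a personalized ad from stored intent when the user appears elsewhere."""
    product = consumer_db.get(user_id)
    if product is None:
        return None  # unknown user: nothing to retarget
    return Ad(user_id, product["name"], product["image_url"], product["page_url"])


def click(ad: Ad) -> str:
    """A click takes the user back to the product page."""
    return ad.landing_url


tag_visit("u1", {
    "name": "Red sneakers",
    "image_url": "https://shop.example/img/1.jpg",
    "page_url": "https://shop.example/p/1",
})
ad = build_ad("u1")
print(click(ad) if ad else "no ad")
```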
RTB retargeting
[Diagram. Labels: e-commerce site, Content site, RTB Exchange, Google ad exchange, myThings platform, Tag Service, Media Service, RTB service, Consumer DB, Visits, Reads]
The big data challenge
The (sad) truth
Big data is not about large data volumes
Classic definition
The 3 V’s
• Volume (tera / peta / zetta / … bytes)
• Variety
– The relational model does not hold
• Velocity
– Traditional relational databases are not scalable enough
• Technology is built around linear scalability
• Examples:
– Predictive analytics
– Recommendation engines
– Customer retention, churn analysis
– Social graph analysis
– Fraud detection
My definition
Big data
Operational view
Business intel. view
Predictive modeling view
Real time decisions
The big data challenge
Business value
– Do we solve the right problem?
– How does it help our business?
Data quality
– Do we have the right data?
Organization roles
– Collaboration
Culture
– Process oriented vs. iterative exploratory
– Organizational fit
Operational and infrastructure
– Will get to it in a moment …
myThings big data architecture
Operational challenges
• Cost effective architecture
• Real time vs. near RT vs. offline processing
• Linear scalability
• Data routing infrastructure
• Data retention and backup
• Open source components
– Hadoop, Kafka, Storm, Cassandra, …
• Cost monitoring
• Skillful devops – the human factor
Recommended reading
Nathan Marz, originator of Storm and Cascalog
The lambda architecture
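As a rough illustration of the idea usually associated with the lambda architecture, the toy sketch below splits work between a batch layer that recomputes a view from the full dataset, a speed layer that incrementally covers events newer than the last batch run, and a serving step that merges both at query time. The event data, names, and in-memory structures are all hypothetical; plain Python stands in for the batch and stream systems a real deployment would use.

```python
from collections import Counter

# Hypothetical append-only master dataset of (timestamp, page) events.
master_dataset = [(1, "home"), (2, "cart"), (3, "home")]


def build_batch_view(events, up_to_ts):
    """Batch layer: recompute the view from scratch over all events up to a cutoff."""
    return Counter(page for ts, page in events if ts <= up_to_ts)


class SpeedLayer:
    """Speed layer: incrementally counts only events newer than the last batch run."""

    def __init__(self, batch_cutoff_ts):
        self.batch_cutoff_ts = batch_cutoff_ts
        self.realtime_view = Counter()

    def on_event(self, ts, page):
        if ts > self.batch_cutoff_ts:
            self.realtime_view[page] += 1


def query(batch_view, realtime_view, page):
    """Serving step: merge the batch and real-time views at query time."""
    return batch_view[page] + realtime_view[page]


batch_cutoff = 3
batch_view = build_batch_view(master_dataset, batch_cutoff)

speed = SpeedLayer(batch_cutoff)
for ts, page in [(4, "home"), (5, "cart"), (6, "home")]:
    master_dataset.append((ts, page))  # new events also land in the master dataset
    speed.on_event(ts, page)

print(query(batch_view, speed.realtime_view, "home"))
```

When the next batch run covers the newer events, the real-time view can be discarded and rebuilt from the new cutoff, so errors in the incremental path do not accumulate.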