Image Management at Ebates
TRANSCRIPT
Rajiv Gupta
Director, Technology Operations
Ebates Inc.
Ebates – What we do
A Rakuten Open EC Company
• Founded in 1999, Ebates has over 17 years of member shopping data
• Part of Rakuten Inc. since Oct. 2014
• From Thanksgiving to Cyber Monday in 2015, Ebates members were responsible for 5.5% of US e-commerce
• Smartphone adoption and sales continue to grow rapidly, with over 4 million app downloads to date
Ebates is the leading and fastest growing shopper loyalty program
New initiative: Product-centric shopping experience
• GOAL: Build a product catalog search capability on the Ebates platform
• Offer Ebates members a product-centric experience in addition to the store-centric experience
• Scale: 1,000+ merchants, ~300M product listings, ~1.2 billion images
Sample Screenshots
High-level technical architecture for Product Catalog Image Storage
• Inputs: data feeds from merchants and web scraping, staged to a temporary NFS location
• Processing pipeline: Spark processing, Kafka queuing, Cassandra storage, categorization
• NFS storage for images
• Web server farm serving images through a CDN
This presentation focuses on the image storage piece.
Image storage solution technical requirements
• Scale-out solution that grows with the use case
• Soft launch first with A/B testing; based on success, mass roll-out
• Billion-plus small image files – very high density
• Highly reliable and available storage system
• Must support replication for disaster recovery scenarios
• Performance to support both back-end image processing and serving images to Ebates members via the website
Scale-out image storage using MapR
• Key concepts:
  • Scale-out storage that runs on general-purpose commodity Intel hardware
  • MapR organizes the disks in each server into storage pools to enable striping for performance
  • A volume in MapR comprises a name container and data containers
  • Containers are stored on storage pools and are the unit of reliability
  • No limit on the number of volumes in a MapR cluster
  • No limit on the number of files in a MapR cluster
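Since volumes are unlimited and cheap to create, the deck's later one-volume-per-merchant layout can be provisioned with MapR's `maprcli volume create` admin command. A minimal sketch that generates those commands is below; the merchant names, volume naming scheme, mount paths, and replication factor are illustrative assumptions, not details from the talk.

```python
# Sketch: generate MapR volume-creation commands, one volume per merchant.
# "maprcli volume create" is the MapR admin CLI; the naming convention,
# parent path, and replication factor here are assumptions.

def volume_create_cmd(merchant: str, parent: str = "/images") -> str:
    """Build a maprcli command that creates a dedicated volume for a merchant."""
    vol = f"img_{merchant.lower()}"
    return (
        f"maprcli volume create -name {vol} "
        f"-path {parent}/{merchant.lower()} -replication 3"
    )

if __name__ == "__main__":
    for m in ["AcmeShoes", "GadgetWorld"]:   # hypothetical merchants
        print(volume_create_cmd(m))
```

Generating the commands from the merchant list keeps provisioning repeatable as new merchants are onboarded.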
Deployment model
• MapR cluster nodes: 2 CPUs, 128 GB memory, 12 × 4 TB HDD each
• Back-end processing: web servers with NFS mounts to the MapR cluster
  • Download products/images from merchant sites
  • Apply categorization and business logic/rules
  • Resize product images into 4 formats for consumption by different device form factors
• Front-end image serving: web servers with NFS mounts serve images through a CDN
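The back-end resize step produces 4 renditions per product image for different device form factors. A minimal sketch of the size computation is below; the specific format names and target widths are assumptions, since the talk does not specify them.

```python
# Sketch: compute the four output dimensions for a product image.
# The format names and target widths (thumb/phone/tablet/desktop) are
# illustrative assumptions; the real pipeline's formats are not specified.

TARGET_WIDTHS = {"thumb": 100, "phone": 320, "tablet": 768, "desktop": 1200}

def rendition_sizes(width: int, height: int) -> dict:
    """Return {format_name: (w, h)}, preserving the source aspect ratio
    and never upscaling past the original width."""
    sizes = {}
    for name, tw in TARGET_WIDTHS.items():
        w = min(tw, width)                      # avoid upscaling
        h = max(1, round(height * w / width))   # keep aspect ratio
        sizes[name] = (w, h)
    return sizes
```

An actual resizer would feed these dimensions to an imaging library (e.g. Pillow's `Image.resize`) before writing each rendition to the NFS mount.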
Lessons Learnt
• Very easy to implement a MapR NFS cluster: the solution was up, start to finish, in less than 3 days, and is self-sustaining with limited care and feeding
• Average read response to retrieve images from a cluster holding several hundred million files is only a few milliseconds
• Distribute the image files within the MapR cluster: one volume per merchant, with multiple folders and subfolders within each volume (a deep folder/file structure)
• The name container in a volume becomes a performance bottleneck if the average file size is under 64 KB – watch out for this!
• Performance for back-end image processing can be a bottleneck with SATA drives; mitigate by adding more drives or faster drives
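The deep folder/file structure lesson can be implemented by deriving a few directory levels from a hash of the image key, which keeps every directory small even at a billion-plus files. A sketch is below; the two-level, 256-way fan-out, the `/mapr/images` root, and the `.jpg` suffix are assumptions, since the deck does not specify the layout depth.

```python
# Sketch: map an image id to a deep path under its merchant's volume.
# The hash fan-out (2 levels x 256 dirs), root path, and file suffix
# are illustrative assumptions.
import hashlib
import posixpath

def image_path(merchant: str, image_id: str, root: str = "/mapr/images") -> str:
    """Derive a deterministic deep path for an image.
    Two hash-derived levels give 65,536 leaf directories per merchant,
    so even hundreds of millions of files stay a few thousand per
    directory on average."""
    digest = hashlib.md5(image_id.encode()).hexdigest()
    return posixpath.join(root, merchant, digest[:2], digest[2:4],
                          image_id + ".jpg")
```

Because the path is a pure function of the merchant and image id, both the back-end processors and the front-end web servers can locate an image over their NFS mounts without a lookup service.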
Q&A