Image Management at Ebates

Upload: rakuten-inc

Post on 12-Apr-2017

Category:

Technology


TRANSCRIPT

Page 1: Image Management at Ebates

Rajiv Gupta

Director, Technology Operations

Ebates Inc.

Image Management at Ebates

Page 2: Image Management at Ebates

Ebates – What we do

Page 3: Image Management at Ebates

A Rakuten Open EC Company

Page 4: Image Management at Ebates

Ebates is the leading and fastest growing shopper loyalty program

• Founded in 1999, Ebates has over 17 years of member shopping data

• Part of Rakuten Inc. since Oct. 2014

• From Thanksgiving to Cyber Monday in 2015*, Ebates members were responsible for 5.5% of e-commerce in the US

• Smartphone adoption and sales continue to grow at exponential rates with over 4 million app downloads thus far

Page 5: Image Management at Ebates

New initiative: Product-centric shopping experience

• GOAL: Build a product catalog search capability on the Ebates platform

• Offer Ebates members a product-centric experience in addition to the store-centric experience

• Scale: 1,000+ merchants, ~300M product listings, ~1.2 billion images
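To put the scale figures in perspective, here is a rough back-of-the-envelope sketch. It only divides the approximate numbers quoted on this slide; "1,000+" is treated as exactly 1,000 merchants, which is an assumption, so the per-merchant averages are upper-end estimates.

```python
# Back-of-the-envelope ratios from the approximate scale figures on the slide.
# "1,000+" merchants is rounded down to 1,000 here (an assumption).
MERCHANTS = 1_000
PRODUCT_LISTINGS = 300_000_000
IMAGES = 1_200_000_000


def scale_ratios(merchants: int, listings: int, images: int) -> dict:
    """Return average per-listing and per-merchant counts."""
    return {
        "images_per_listing": images / listings,
        "listings_per_merchant": listings / merchants,
        "images_per_merchant": images / merchants,
    }


if __name__ == "__main__":
    for name, value in scale_ratios(MERCHANTS, PRODUCT_LISTINGS, IMAGES).items():
        print(f"{name}: {value:,.0f}")
```

The ratio works out to about 4 images per listing, which may correspond to the 4 resized formats mentioned in the deployment model, although the talk does not say so explicitly.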

Page 6: Image Management at Ebates

Sample Screenshots

Page 7: Image Management at Ebates

High level technical architecture for Product Catalog Image Storage

Focus on image storage in this presentation

[Architecture diagram. Labeled components: Web Scraping; Data Feeds from Merchant; Temp NFS Location; NFS Storage for Images; Web Server Farm; CDN]

• Spark processing

• Kafka Queuing

• Cassandra storage

• Categorization

Page 8: Image Management at Ebates

Image storage solution technical requirements

• Scale out solution that grows with the use case

• Soft launch first, A/B testing

• Based on success, mass roll out

• Billion plus small image files – very high density

• Highly reliable and available storage system

• Image storage solution must support replication capabilities for Disaster Recovery scenarios

• Performance to support both back-end image processing and serving images to Ebates members via the website
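The "billion plus small image files" requirement can be turned into a rough capacity figure. The sketch below is illustrative only: the file count reuses the ~1.2 billion images quoted earlier in the deck, while the average file size (50,000 bytes) and the replication factor (3 copies) are assumptions made for the example and are not stated in the talk.

```python
def storage_estimate(num_files: int, avg_file_bytes: int, replicas: int) -> dict:
    """Estimate logical and raw (replicated) capacity in decimal terabytes."""
    logical_bytes = num_files * avg_file_bytes
    raw_bytes = logical_bytes * replicas
    return {
        "logical_tb": logical_bytes / 1e12,
        "raw_tb": raw_bytes / 1e12,
    }


if __name__ == "__main__":
    # 1.2 billion files comes from the deck; the other two values are assumed.
    estimate = storage_estimate(
        num_files=1_200_000_000,
        avg_file_bytes=50_000,   # assumed average image size
        replicas=3,              # assumed replication factor
    )
    print(estimate)  # {'logical_tb': 60.0, 'raw_tb': 180.0}
```

Under these assumptions the data volume is modest; the harder part of the requirement is the file count, since every one of those files needs metadata and a directory entry.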

Page 9: Image Management at Ebates

Scale out image storage using MapR

• Key Concepts:

• Scale out storage that runs on general purpose commodity Intel hardware

• MapR organizes the disks in each server into storage pools to enable striping for performance

• A volume in MapR comprises a name container and data containers

• Containers are stored on storage pools and are a unit of reliability

• No limit on the number of volumes in a MapR cluster

• No limit on the number of files in a MapR cluster

Page 10: Image Management at Ebates

Deployment model

Backend Processing (WebServer, NFS mount to the MapR cluster)

• Download product / images from merchant sites

• Apply categorization and business logic/rules

• Resize product images to 4 different formats for consumption by different device form factors

MapR cluster nodes

• 2 CPU, 128GB Memory, 12 X 4TB HDD

Front End Image Serving (WebServers, NFS mount to the MapR cluster)

• Webserver to serve images

• CDN
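The resize step in the backend can be sketched as a small dimension calculation. The talk only says that images are resized to 4 formats; the variant names and bounding boxes below are hypothetical, and the aspect-ratio-preserving, never-upscale policy is an assumption rather than the pipeline Ebates actually used.

```python
# Hypothetical variant names and bounding boxes (width, height) in pixels.
# The talk mentions "4 different formats" but gives no names or sizes.
VARIANTS = {
    "thumb": (100, 100),
    "small": (240, 240),
    "medium": (480, 480),
    "large": (960, 960),
}


def fit_within(width: int, height: int, max_w: int, max_h: int) -> tuple:
    """Scale (width, height) to fit inside (max_w, max_h).

    Keeps the aspect ratio and never upscales a smaller source image.
    """
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def variant_sizes(width: int, height: int) -> dict:
    """Return the target dimensions for every variant of one source image."""
    return {name: fit_within(width, height, *box) for name, box in VARIANTS.items()}


if __name__ == "__main__":
    print(variant_sizes(1920, 1080))
```

The computed sizes would then be handed to an image library to do the actual pixel resampling before the files are written to the NFS mount.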

Page 11: Image Management at Ebates

Lessons Learnt

• Very easy to implement MapR NFS cluster

• Solution from start to finish was up in less than 3 days

• Self-sustaining; limited care and feeding

• Avg read response to retrieve images from a cluster with several hundred million files is only a few milliseconds

• Distribute the image files within the MapR cluster

• One volume per merchant, multiple folders and sub folders within volume – deep folder/file structure

• Name container in volume is a performance bottleneck if average file size is < 64K. Watch out for this!

• Performance for back-end image processing can be a bottleneck when using SATA drives

• Can be mitigated by adding more drives OR faster drives

Page 12: Image Management at Ebates

Q&A