farrot - filter amazon review ratings over time

FARROT: Filter Amazon Review Ratings Over Time

Andy Lai


Post on 29-Jul-2015



Problem

Amazon doesn't allow filtering review ratings and totals by state or time

UI Demo

http://youtu.be/w78X0IpjI5c

Data set

Stanford SNAP Amazon reviews
• 35 GB
• 35M reviews

University of Illinois Amazon member info
• 142 MB
• Member location information

Example member record:

joeme 92 5/26 Cleveland, OH United States Joseph M. Kotow B00006HAXWOH

Pipeline

• SNAP reviews (10 rows per review)
• UIC member location
• TSV
• ImportTsv
• Pig to clean, join, and aggregate review ratings and totals
• HappyBase

Example review record:

B00006HAXW Rock Rhythm & Doo Wop Greatest Early Rock unknown A1RSDE9-N6RSZF Joseph M Kotow 9/9 5.0 1042502400 Pittsburgh – Home of the OLDIES I have all of the doo wop DVD’s and this one is as good or better than the 1st ones. Rem…
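The "10 rows per review" input has to be flattened to one line per review before a TSV loader such as ImportTsv can use it. The slides say Java MapReduce handled this step; the Python below is only a minimal sketch of the same idea, not the author's code. The `product/...` and `review/...` key names are assumed from the public SNAP dump layout (their order matches the sample review record), and `reviews_to_tsv` is a hypothetical helper name.

```python
# Assumed SNAP field names, in the order of the sample review record.
FIELDS = [
    "product/productId", "product/title", "product/price",
    "review/userId", "review/profileName", "review/helpfulness",
    "review/score", "review/time", "review/summary", "review/text",
]


def _to_row(record):
    # Tabs inside a value would break the TSV layout, so replace them.
    return "\t".join(record.get(f, "").replace("\t", " ") for f in FIELDS)


def reviews_to_tsv(lines):
    """Yield one tab-separated line per review from 'key: value' rows.

    Reviews are assumed to be separated by blank lines.
    """
    record = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current review, if any.
            if record:
                yield _to_row(record)
                record = {}
            continue
        key, sep, value = line.partition(": ")
        if sep and key in FIELDS:
            record[key] = value
    # Emit the last review when the input does not end with a blank line.
    if record:
        yield _to_row(record)


if __name__ == "__main__":
    sample = [
        "product/productId: B00006HAXW\n",
        "product/price: unknown\n",
        "review/score: 5.0\n",
        "review/time: 1042502400\n",
        "\n",
    ]
    for row in reviews_to_tsv(sample):
        print(row)
```

Missing fields come out as empty columns, so every output line has the same 10 columns regardless of what the input record contained.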

HBase Schema

Table schemas:

PRODUCTID_STATE, TOTAL REVIEWS, AVG RATING

PRODUCTID_STATE_BYYEAR_EPOCH, TOTAL REVIEWS, AVG RATING

PRODUCTID_STATE_BYMONTH_EPOCH, TOTAL REVIEWS, AVG RATING

PRODUCTID_STATE_BYDAY_EPOCH, TOTAL REVIEWS, AVG RATING

• Example: B00003CWT6_CA_BYMONTH_1008115200000
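As a rough illustration of the key layout, the sketch below builds and parses keys of the form PRODUCTID_STATE and PRODUCTID_STATE_BYxxx_EPOCH. The helper names `make_row_key` and `parse_row_key` are hypothetical, and the code assumes product IDs and state codes contain no underscore. How the EPOCH value is derived is not stated in the slides, so it is passed in as-is.

```python
GRANULARITIES = ("BYYEAR", "BYMONTH", "BYDAY")


def make_row_key(product_id, state, granularity=None, epoch_ms=None):
    """Build a key such as 'B00003CWT6_CA' or 'B00003CWT6_CA_BYMONTH_<epoch>'."""
    if granularity is None:
        return f"{product_id}_{state}"
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    if epoch_ms is None:
        raise ValueError("epoch_ms is required for time-bucketed keys")
    return f"{product_id}_{state}_{granularity}_{epoch_ms}"


def parse_row_key(key):
    """Split a key back into its parts (assumes no '_' inside the parts)."""
    parts = key.split("_")
    if len(parts) == 2:
        return {"product_id": parts[0], "state": parts[1],
                "granularity": None, "epoch_ms": None}
    if len(parts) == 4 and parts[2] in GRANULARITIES:
        return {"product_id": parts[0], "state": parts[1],
                "granularity": parts[2], "epoch_ms": int(parts[3])}
    raise ValueError(f"unrecognized row key: {key}")


if __name__ == "__main__":
    key = make_row_key("B00003CWT6", "CA", "BYMONTH", 1008115200000)
    print(key)
    print(parse_row_key(key))
```

Because product ID and state lead the key, all buckets for one product in one state sort next to each other, which is what makes prefix and range scans over a time interval practical.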

Retrospective

Design Considerations
• HBase was used for its optimizations for reads, range scans, and scalability
• Data was bucketed by state and by different time intervals; precomputing the aggregates improves query performance by avoiding the cost of recalculating them, at the expense of storage
• Java MapReduce (MR) was used to convert multi-row reviews to tabular format

Future
• Scrape Amazon for new reviews
• Filter and display reviews
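The storage-for-speed trade-off can be sketched in a few lines. The slides use Pig for the aggregation and HBase for storage; the Python below swaps in a plain dictionary and is only an illustration under stated assumptions. In particular, truncating timestamps to UTC calendar boundaries is an assumption (the slides do not say how bucket boundaries are chosen), and `bucket_start_ms`, `precompute`, and `scan_prefix` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone

GRANULARITIES = ("BYYEAR", "BYMONTH", "BYDAY")


def bucket_start_ms(epoch_s, granularity):
    """Truncate a Unix timestamp (seconds) to the start of its UTC bucket, in ms."""
    t = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
    t = t.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "BYMONTH":
        t = t.replace(day=1)
    elif granularity == "BYYEAR":
        t = t.replace(month=1, day=1)
    elif granularity != "BYDAY":
        raise ValueError(f"unknown granularity: {granularity}")
    return int(t.timestamp()) * 1000


def precompute(reviews):
    """Map each row key to (total reviews, average rating).

    reviews: iterable of (product_id, state, epoch_seconds, rating).
    Every review is counted once per key it belongs to, so storage grows
    with the number of buckets while reads become single lookups.
    """
    sums = defaultdict(lambda: [0, 0.0])  # row key -> [count, rating sum]
    for product_id, state, epoch_s, rating in reviews:
        base = f"{product_id}_{state}"
        keys = [base] + [
            f"{base}_{g}_{bucket_start_ms(epoch_s, g)}" for g in GRANULARITIES
        ]
        for key in keys:
            sums[key][0] += 1
            sums[key][1] += rating
    return {key: (count, total / count) for key, (count, total) in sums.items()}


def scan_prefix(table, prefix):
    """Emulate a prefix scan: matching (key, value) pairs in sorted key order."""
    return [(k, table[k]) for k in sorted(table) if k.startswith(prefix)]


if __name__ == "__main__":
    table = precompute([
        ("B00006HAXW", "OH", 1042502400, 5.0),
        ("B00006HAXW", "OH", 1042588800, 3.0),
    ])
    for key, (total, avg) in scan_prefix(table, "B00006HAXW_OH_BYDAY_"):
        print(key, total, avg)
```

Against a real HBase table the lookup side would be a prefix scan rather than a dictionary filter; with HappyBase that is `Table.scan(row_prefix=...)`. Sorting keys as strings matches numeric order here only because the epoch values all have the same number of digits.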

About me – Andy Lai

• UC Berkeley (B.S. Electrical Engineering & Computer Science)
• SJSU (M.S. Engineering)
• Software Engineer (DB2, relational database)
• Interests: