farrot insight

FARROT: Filter Amazon Review Ratings Over Time
Andy Lai

Upload: altens123

Post on 20-Jun-2015


Category:

Software




TRANSCRIPT

Page 1: Farrot insight

FARROT: Filter Amazon Review Ratings Over Time

Andy Lai

Page 2: Farrot insight

Problem

Amazon doesn't allow filtering review ratings and totals by state or time

http://youtu.be/w78X0IpjI5c

Page 3: Farrot insight

UI DEMO

http://youtu.be/w78X0IpjI5c

Page 4: Farrot insight

Data set

Stanford SNAP Amazon reviews
35 GB, 35M reviews

University of Illinois Amazon member info
142 MB, member location information

joeme 92 5/26 Cleveland, OH United States Joseph M. Kotow B00006HAXW

OH
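The "OH" under the sample row suggests that a two-letter state code is pulled out of the free-text location field. A minimal sketch of one way to do that, assuming locations look like "City, ST Country"; the function name and the regular expression are illustrative and not taken from the project:

```python
import re

# Hypothetical pattern: a comma, optional spaces, then a two-letter
# uppercase code that ends at a word boundary.
STATE_RE = re.compile(r",\s*([A-Z]{2})\b")


def extract_state(location):
    """Return the two-letter state code in a location string, or None."""
    match = STATE_RE.search(location)
    return match.group(1) if match else None


print(extract_state("Cleveland, OH United States"))  # OH
```

Locations that do not follow this shape return None and would need their own handling.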

Page 5: Farrot insight

Pipeline

ImportTsv

SNAP REVIEWS in 10 rows per review

UIC MEMBER LOCATION TSV

HappyBase

Page 7: Farrot insight

Pipeline

ImportTsv

SNAP REVIEWS in 10 rows per review

UIC MEMBER LOCATION TSV

HappyBase

B00006HAXW Rock Rhythm & Doo Wop Greatest Early Rock unknown A1RSDE9-N6RSZF Joseph M Kotow 9/9 5.0 1042502400 Pittsburgh – Home of the OLDIES I have all of the doo wop DVD’s and this one is as good or better than the 1st ones. Rem…
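The sample row above is one review flattened from its 10 source rows into a single tab-separated line, which is the shape ImportTsv expects. The deck does this step with Java MapReduce; the following is only a single-process Python sketch of the same reshaping. It assumes each source row is a "key: value" line, that reviews are separated by blank lines, and that the ten keys are the ones listed in FIELDS (these names come from the public SNAP format, not from the slides):

```python
# Assumed field order for the flattened row (not stated in the slides).
FIELDS = [
    "product/productId", "product/title", "product/price",
    "review/userId", "review/profileName", "review/helpfulness",
    "review/score", "review/time", "review/summary", "review/text",
]


def to_row(record):
    """Join one review's fields into a tab-separated line."""
    return "\t".join(record.get(name, "") for name in FIELDS)


def reviews_to_tsv(lines):
    """Yield one TSV line per review from 'key: value' input lines."""
    record = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current review block.
            if record:
                yield to_row(record)
                record = {}
            continue
        key, sep, value = line.partition(": ")
        if sep and key in FIELDS:
            # Tabs inside a value would break the TSV, so replace them.
            record[key] = value.replace("\t", " ")
    if record:
        yield to_row(record)


sample = [
    "product/productId: B00006HAXW",
    "review/score: 5.0",
    "review/time: 1042502400",
    "",
]
for row in reviews_to_tsv(sample):
    print(row.split("\t"))
```

Missing fields come out as empty columns, so every output line has the same ten columns regardless of what the input block contained.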

Page 8: Farrot insight

Pipeline

ImportTsv

SNAP REVIEWS in 10 rows per review

UIC MEMBER LOCATION TSV

HappyBase

B00006HAXW Rock Rhythm & Doo Wop Greatest Early Rock unknown A1RSDE9-N6RSZF Joseph M Kotow 9/9 5.0 1042502400 Pittsburgh – Home of the OLDIES I have all of the doo wop DVD’s and this one is as good or better than the 1st ones. Rem…

PIG to CLEAN, JOIN and AGGREGATE rating reviews and totals
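The join-and-aggregate step can be illustrated outside Pig. The sketch below is plain Python, not Pig Latin, and it is not the project's script. The slides do not say which column joins reviews to member locations, so a shared member_id is assumed here, and the function and argument names are hypothetical:

```python
from collections import defaultdict


def aggregate_by_product_state(reviews, member_state):
    """Join reviews to member states and aggregate per (product, state).

    reviews: iterable of (product_id, member_id, rating) tuples.
    member_state: dict mapping member_id to a state code.
    Returns {(product_id, state): (total_reviews, avg_rating)}.
    """
    sums = defaultdict(lambda: [0, 0.0])  # [review count, rating sum]
    for product_id, member_id, rating in reviews:
        state = member_state.get(member_id)
        if state is None:
            continue  # inner join: skip reviews with no known location
        bucket = sums[(product_id, state)]
        bucket[0] += 1
        bucket[1] += rating
    return {key: (count, total / count)
            for key, (count, total) in sums.items()}


reviews = [
    ("B00006HAXW", "m1", 5.0),
    ("B00006HAXW", "m2", 4.0),
    ("B00006HAXW", "m3", 3.0),  # m3 has no location, so it is dropped
]
member_state = {"m1": "OH", "m2": "OH"}
print(aggregate_by_product_state(reviews, member_state))
# {('B00006HAXW', 'OH'): (2, 4.5)}
```

Adding a time bucket to the grouping key would give the per-year, per-month, and per-day totals in the same way.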

Page 10: Farrot insight

HBase Schema

Table Schemas:

PRODUCTID_STATE, TOTAL REVIEWS, AVG RATING

PRODUCTID_STATE_BYYEAR_EPOCH, TOTAL REVIEWS, AVG RATING

PRODUCTID_STATE_BYMONTH_EPOCH, TOTAL REVIEWS, AVG RATING

PRODUCTID_STATE_BYDAY_EPOCH, TOTAL REVIEWS, AVG RATING

• Example: B00003CWT6_CA_BYMONTH_1008115200000
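To make the key layout concrete, here is a small sketch that splits a key such as the example above into its parts. It assumes that product IDs and state codes never contain an underscore and that the 13-digit trailing number is a Unix epoch in milliseconds; neither is stated on the slide, and the function names are hypothetical:

```python
from datetime import datetime, timezone


def parse_row_key(row_key):
    """Split PRODUCTID_STATE or PRODUCTID_STATE_BYxxx_EPOCH into a dict."""
    parts = row_key.split("_")
    if len(parts) == 2:
        product_id, state = parts
        return {"product_id": product_id, "state": state,
                "bucket": None, "epoch_ms": None}
    if len(parts) == 4:
        product_id, state, bucket, epoch = parts
        return {"product_id": product_id, "state": state,
                "bucket": bucket, "epoch_ms": int(epoch)}
    raise ValueError("unexpected row key: {!r}".format(row_key))


def bucket_time(epoch_ms):
    """Convert an epoch in milliseconds (assumed unit) to a UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


key = parse_row_key("B00003CWT6_CA_BYMONTH_1008115200000")
print(key["product_id"], key["state"], key["bucket"])
print(bucket_time(key["epoch_ms"]).isoformat())
```

Keeping product and state at the front of the key places all buckets for one product and state next to each other in HBase's sorted key space.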

Page 11: Farrot insight

Retrospective

Design Considerations
• HBase was chosen for its optimized reads, range scans, and scalability
• Data was bucketed by state and by time interval (year, month, day) so that queries read precomputed aggregates instead of recalculating them, at the expense of storage
• Java MR was used to convert multi-row reviews to tabular format

Future
• Scrape Amazon for new reviews
• Filter and display reviews
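The range-scan point in the design considerations can be sketched with HappyBase, which the pipeline slides name as the client. This is an illustration under assumptions rather than the project's code: the helper names are hypothetical, and the key layout is taken from the schema slide. Table.scan with row_start and row_stop is part of the HappyBase API, and row_stop is exclusive:

```python
def month_scan_bounds(product_id, state, start_ms, end_ms):
    """Return (row_start, row_stop) keys for monthly buckets in [start_ms, end_ms)."""
    prefix = "{}_{}_BYMONTH_".format(product_id, state)
    row_start = (prefix + str(start_ms)).encode("ascii")
    row_stop = (prefix + str(end_ms)).encode("ascii")
    return row_start, row_stop


def scan_monthly(table, product_id, state, start_ms, end_ms):
    """Scan precomputed monthly aggregates.

    table is expected to be a happybase.Table; the scan yields
    (row_key, columns) pairs for keys in [row_start, row_stop).
    """
    row_start, row_stop = month_scan_bounds(product_id, state, start_ms, end_ms)
    return table.scan(row_start=row_start, row_stop=row_stop)


start, stop = month_scan_bounds("B00003CWT6", "CA", 1008115200000, 1010793600000)
print(start)
print(stop)
```

HBase orders row keys as bytes, so a range like this is only correct while both epochs have the same number of digits. Whether the project padded its epochs is not stated in the slides.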

Page 12: Farrot insight

About me – Andy Lai

UC Berkeley (B.S. Electrical Engineering & Computer Science)

SJSU (M.S. Engineering)

Software Engineer (DB2, relational database)

Interests: