farrot insight
FARROT: Filter Amazon Review Ratings Over Time
Andy Lai
Problem
Amazon doesn't allow filtering review ratings and totals by state or time
UI DEMO
http://youtu.be/w78X0IpjI5c
Data set
Stanford SNAP Amazon reviews: 35 GB, 35M reviews
University of Illinois Amazon member info: 142 MB, member location information
Example member record: joeme 92 5/26 Cleveland, OH United States Joseph M. Kotow B00006HAXW
State: OH
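The member record above carries the location as free text, and the slide shows only the resulting state code. A minimal sketch of how that code might be pulled out, assuming the state always follows the city as ", XX"; the pattern and the function name `extract_state` are illustrative, not taken from the project:

```python
import re

# Assumed pattern: a two-letter uppercase code right after "City, ".
STATE_RE = re.compile(r",\s*([A-Z]{2})\b")

def extract_state(location):
    """Return the two-letter code from text like 'Cleveland, OH United States', or None."""
    match = STATE_RE.search(location)
    return match.group(1) if match else None

print(extract_state("Cleveland, OH United States"))  # OH
```

This sketch does not check the code against a list of real US states, so a location such as "London, UK" would also yield a two-letter result.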
Pipeline
ImportTsv
SNAP REVIEWS in 10 rows per review
UIC MEMBER LOCATION TSV
HappyBase
Example review as one row: B00006HAXW Rock Rhythm & Doo Wop Greatest Early Rock unknown A1RSDE9-N6RSZF Joseph M Kotow 9/9 5.0 1042502400 Pittsburgh – Home of the OLDIES I have all of the doo wop DVD’s and this one is as good or better than the 1st ones. Rem…
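The deck does this multi-row to tabular conversion with a Java MapReduce job that is not shown. The Python sketch below only illustrates the idea on a single record. It assumes the `key: value` line layout of the SNAP dumps with the ten field names listed in `FIELDS`; the helper names and the placeholder summary and text values are hypothetical.

```python
# Assumed field order: one "key: value" line per field, ten lines per review.
FIELDS = [
    "product/productId", "product/title", "product/price",
    "review/userId", "review/profileName", "review/helpfulness",
    "review/score", "review/time", "review/summary", "review/text",
]

def record_to_tsv(lines):
    """Flatten the 'key: value' lines of one review into a single TSV row."""
    values = {}
    for line in lines:
        key, _, value = line.partition(": ")
        # Tabs inside a value would break the TSV, so replace them with spaces.
        values[key.strip()] = value.strip().replace("\t", " ")
    return "\t".join(values.get(field, "") for field in FIELDS)

def iter_records(text):
    """Yield one list of lines per blank-line-separated review."""
    for block in text.strip().split("\n\n"):
        yield [ln for ln in block.splitlines() if ln.strip()]

sample = """product/productId: B00006HAXW
product/title: Rock Rhythm & Doo Wop Greatest Early Rock
product/price: unknown
review/userId: A1RSDE9-N6RSZF
review/profileName: Joseph M Kotow
review/helpfulness: 9/9
review/score: 5.0
review/time: 1042502400
review/summary: (summary placeholder)
review/text: (text placeholder)
"""

rows = [record_to_tsv(rec) for rec in iter_records(sample)]
```

A row in this shape has one column per field, which is the kind of input a TSV bulk loader such as ImportTsv expects.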
Pig to CLEAN, JOIN, and AGGREGATE review ratings and totals
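The Pig script itself is not part of the deck. As a rough stand-in, the three steps can be sketched in plain Python. The join key (`user_id`), the input shapes, and the function name are assumptions made for illustration; the deck does not say which column the two data sets were joined on.

```python
from collections import defaultdict

def aggregate_by_state(reviews, member_state):
    """reviews: iterable of (product_id, user_id, score).
    member_state: dict mapping user_id to a state code.
    Returns {(product_id, state): (total_reviews, avg_rating)}."""
    sums = defaultdict(lambda: [0, 0.0])  # key -> [count, score_sum]
    for product_id, user_id, score in reviews:
        state = member_state.get(user_id)   # JOIN on the assumed key
        if state is None:                   # CLEAN: skip reviews with no known state
            continue
        bucket = sums[(product_id, state)]  # AGGREGATE per product and state
        bucket[0] += 1
        bucket[1] += float(score)
    return {key: (count, total / count) for key, (count, total) in sums.items()}

# Hypothetical inputs, not taken from the data sets.
demo = aggregate_by_state(
    [("P1", "u1", 5.0), ("P1", "u2", 3.0), ("P1", "u3", 4.0)],
    {"u1": "OH", "u2": "OH"},
)
print(demo)  # {('P1', 'OH'): (2, 4.0)}
```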
HBase Schema
Table schemas:
PRODUCTID_STATE, TOTAL REVIEWS, AVG RATING
PRODUCTID_STATE_BYYEAR_EPOCH, TOTAL REVIEWS, AVG RATING
PRODUCTID_STATE_BYMONTH_EPOCH, TOTAL REVIEWS, AVG RATING
PRODUCTID_STATE_BYDAY_EPOCH, TOTAL REVIEWS, AVG RATING
• Example: B00003CWT6_CA_BYMONTH_1008115200000
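The row-key layout above can be sketched as a pair of helpers. The function names are hypothetical, and the sketch assumes product IDs and state codes never contain an underscore; the epoch is passed through unchanged because the deck does not say how bucket timestamps were derived.

```python
def make_row_key(product_id, state, granularity=None, epoch_ms=None):
    """Build PRODUCTID_STATE or PRODUCTID_STATE_BY<GRANULARITY>_EPOCH."""
    if granularity is None:
        return f"{product_id}_{state}"
    return f"{product_id}_{state}_BY{granularity.upper()}_{epoch_ms}"

def parse_row_key(key):
    """Split a row key into (product_id, state, granularity, epoch_ms)."""
    parts = key.split("_")
    if len(parts) == 2:
        return parts[0], parts[1], None, None
    product_id, state, bucket, epoch = parts
    return product_id, state, bucket[2:], int(epoch)  # drop the "BY" prefix

def scan_prefix(product_id, state, granularity):
    """Prefix shared by every bucket of one product, state, and granularity."""
    return f"{product_id}_{state}_BY{granularity.upper()}_"

key = make_row_key("B00003CWT6", "CA", "month", 1008115200000)
print(key)  # B00003CWT6_CA_BYMONTH_1008115200000
```

A prefix like the one from `scan_prefix` is what a range scan over one product's monthly buckets would start from. Because HBase orders row keys as bytes, decimal epochs with different digit counts would not sort in time order, which may matter for data spanning a long period.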
Retrospective
Design Considerations
• HBase was used for its optimizations for reads, range scans, and scalability
• Data was bucketed by state and by different time intervals; precomputing the aggregates improves query performance by avoiding recalculation, at the expense of storage
• Java MR was used to convert multi-row reviews to tabular format
Future
• Scrape Amazon for new reviews
• Filter and display reviews
About me – Andy Lai
UC Berkeley (B.S. Electrical Engineering & Computer Science)
SJSU (M.S. Engineering)
Software Engineer (DB2, relational database)
Interests: