Big Data Analytics 1: Driving Personalized Experiences Using Customer Profiles

Post on 25-Jul-2015

Category:

Technology

TRANSCRIPT

<p> 1. Driving Personalized Experiences Using Customer Profiles. Matt Kalan, Sr. Solution Architect, MongoDB, Inc. @matthewkalan matt.kalan@mongodb.com
2. Big Data Analytics Track: (1) Driving Personalized Experiences Using Customer Profiles; (2) Leveraging Customer Behavior to Enhance Relevancy in Personalization; (3) Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB.
3. Agenda For This Session: (1) Benefits of personalization; (2) High-level process; (3) Data capture steps; (4) Data analysis steps; (5) Real-time personalization; (6) Summary; (7) Q&amp;A.
4. You Notice When Content Is Personalized. When it looks like this outside [left image: from www.johnbyronkuhner.com via Google Images; right image: from www.steinmart.com via Google Images]. Is this the best ad to show you?
5. Or Better, This. When it looks like this outside [left image: from www.johnbyronkuhner.com via Google Images; right image: www.linkedin.com/pulse/20140729161519-34678510-take-note-time-to-move-beyond-personalization-to-contextualization]. More relevant.
6. Personalization Pays: Conversion Rates
7. Personalization Pays: ROI Impact
8. High Level Personalization Process: (1) Profile created; (2) Enrich with public data; (3) Capture activity; (4) Clustering analysis; (5) Define personas; (6) Tag with personas; (7) Personalize interactions. Diagram labels: Batch analytics; Public data. Common technologies: R, Hadoop, Spark, Python, Java, many other options. Steps 4 &amp; 5 are performed much less often than tagging.
9. Why MongoDB for Personalization?
Document model =&gt; customer profiles are rich structures, perfect for documents.
High throughput =&gt; profiles are read/written on every page, so high performance is critical.
High scalability =&gt; high performance must scale easily for any data size &amp; request volume.
Rich querying &amp; indexes =&gt; often only portions of the profile are queried, and ad hoc marketing in particular requires rich querying capabilities.
Geospatial indexes are critical for mobile.
Real-time analytics =&gt; can analyze directly on MongoDB, or prepare aggregated results for external analysis with the aggregation framework.
Strong consistency =&gt; want profile changes &amp; tracking to take effect immediately.
Hadoop/Spark integration =&gt; can run distributed analytics on data in MongoDB, or copy it to HDFS and run there; both use the MongoDB Hadoop Connector.
Low TCO =&gt; low-cost enterprise software license, commodity hardware, &amp; management.
10. Customer Example: Scratchpad. Records all activity in researched trips. Needed: document model, dynamic schema, rich querying, easy scaling.
11. And Many Other Customers Personalizing with MongoDB: Sailthru, Sitecore, Adobe (AEM), Expedia, ADP, Foursquare, Otto, Chicos, and 100s more.
12. Data Capture
13. Anonymous User. Might just start with this if no cookie:
{ "ipAddress" : "216.58.219.238", "referrer" : "google.com" }
Pretty useless, right?
14. More Than Just What You Collect. Diagram labels: IP Address; Referrer; Information Broker; Location; Company; Weather; Avg Income; Interests; Possible Interests (e.g. Kay Jewelers, Dicks Sporting Goods); Budget Indication (e.g. Barneys); Search term.
15. Often User Creates a Profile
{ "_id" : ObjectId("553ea57b588ac9ef066428e1"), "ipAddress" : "216.58.219.238", "referrer" : "kay.com",
  "firstName" : "John", "lastName" : "Doe", "email" : "johndoe@gmail.com" }
16. Even Email Unlocks Useful Info
17. Available Early in Relationship
{ "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", "lastName" : "Doe",
  "address" : "229 W. 43rd St.", "city" : "New York", "state" : "NY", "zipCode" : "10036",
  "age" : 30, "email" : "johndoe@gmail.com", "gender" : "male" }
18. Often Users Even Volunteer Preferences
19. Easy to Store in Profile
{ "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", "lastName" : "Doe", "address" : "229 W.
43rd St.", "city" : "New York", "state" : "NY", "zipCode" : "10036",
  "age" : 30, "email" : "johndoe@gmail.com", "gender" : "male",
  "interests" : [ "dumplings", "board games", "rooftop", "ginger beer", "ahi tuna", "healthy food" ] }
20. In Return, User Gets Relevant Info
21. Customer Activity Valuable to Track
{ "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", "lastName" : "Doe",
  "address" : "229 W. 43rd St.", "city" : "New York", "state" : "NY", "zipCode" : "10036",
  "age" : 30, "email" : "johndoe@gmail.com", "gender" : "male", ...
  "visitedCounts" : { "watches" : 3, "shirts" : 1, "sunglasses" : 1, "bags" : 2 } }
From gilt.com
22. Purchases Are Usually Even More Valuable
{ "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", "lastName" : "Doe",
  "address" : "229 W. 43rd St.", "city" : "New York", "state" : "NY", "zipCode" : "10036",
  "age" : 30, "email" : "johndoe@gmail.com", "gender" : "male", ...
  "purchases" : [
    { "id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes" },
    { "id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts" } ] }
From gilt.com
23. Data Capture Simple to Sophisticated
{ "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", "lastName" : "Doe",
  "address" : "229 W. 43rd St.", "city" : "New York", "state" : "NY", "zipCode" : "10036",
  "age" : 30, "email" : "john.doe@mongodb.com", "twitterHandle" : "johndoe", "gender" : "male",
  "interests" : [ "electronics", "basketball", "weightlifting", "ultimate frisbee", "traveling", "technology" ],
  "visitedCounts" : { "watches" : 3, "shirts" : 1, "sunglasses" : 1, "bags" : 2 },
  "purchases" : [
    { "id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes" },
    { "id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts" } ] }
Additional behavior tracking: How long on each page (e.g. publishing)? What is the reaction to pop-up promotions? Looks at cross-sold items on page? What categories are clicked on?
Does a certain price point drive buying? Purchases at certain times of year?
24. Data Analysis
25. Clustering Overview. Think of each of your customers or users of your site as a data point. How can we group users into like sets so they can be treated similarly for marketing, cross-sell, etc.? K-means is a common algorithm for clustering. [Figure panels: Original Unclustered Data; Clustered Data. Image from: http://pypr.sourceforge.net/kmeans.html]
26. Clustering Process for Personalization. Diagram flow: customer profile documents; map to vectors, e.g. [1, 3, 0, ...]; clustering algo on the vectors (iterate on inputs); clusters of customers; define personas; tag profiles with personas (update profiles with persona).
27. Mapping Profile to Vector Input
{ "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", ...
  "visitedCounts" : { "Mens watches" : 3, "Mens shirts" : 1, "Mens sunglasses" : 1, "Mens bags" : 2 },
  "purchases" : [
    { "id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes" },
    { "id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts" } ] }
Counts per category: Mens shirts 11, Mens pants 0, Mens shoes 10, Mens ties 0, Mens Sunglass 1, Mens Watch 3, giving the example vector [ 11, 0, 10, 0, 1, 3, ...], e.g. with 1 purchase = 10 visited counts.
28. Aggregation Framework for Filtering Profiles
// Adds up the visited counts (vc) and purchases, then filters out profiles with a total below 20
db.profiles.aggregate([
  { $project: {
      vc: "$vc",
      purchases: "$purchases",
      total: { $add: [
        { $ifNull: ["$vc.mShirts", 0] },
        { $ifNull: ["$vc.mPants", 0] },
        { $ifNull: ["$vc.mShoes", 0] },
        { $ifNull: ["$vc.mTies", 0] },
        { $ifNull: ["$vc.mSunglass", 0] },
        { $ifNull: ["$vc.mWatch", 0] },
        { $ifNull: ["$vc.mBags", 0] },
        // each purchase counts as 10 visits; $ifNull guards $size, which errors when purchases is missing
        { $multiply: [ { $size: { $ifNull: ["$purchases", []] } }, 10 ] }
      ] }
  } },
  { $match: { total: { $gte: 20 } } }
])
29. Input/Output for K-Means Algo. Diagram labels: Clustering Algo; Iterate on inputs; Clusters of customers.
Vectors: [ [11, 0, 10, 0, 1, 3, ...], [ 0, 5, 10, 3, 0, 0, ...], ...
]
K = # of clusters (driven by marketing effort or data analysis); N = # of iterations.
Output:
{ Centers: [ {name: C1, vector: [..]}, {name: C2, vector: [..]}, ... ],
  Clusters: [ {C1: [[11, 0, 10, 0, 1, 3, ...], ...]}, {C2: [[ 0, 5, 0, 0, 10, 0, ...], ...]}, ... ] }
30. Choosing Personas. Each cluster would usually map to one persona you can identify, name, and target. It is common to give personas memorable names, e.g. shoe fanatic, bargain hunter, researcher, etc. [Figure labels: Original Unclustered Data; Clustered Data; C1, C2, C3; Shoe Fanatic?]
31. Mapping Customer Profile to Persona
{ Centers: [ {name: C1, vector: [..]}, {name: C2, vector: [..]}, ... ],
  Clusters: [ {C1: [[11, 0, 10, 0, 1, 3, ...], ...]}, {C2: [[ 0, 5, 0, 0, 10, 0, ...], ...]}, ... ] }
{ "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", ...
  "visitedCounts" : { "Mens watches" : 3, "Mens shirts" : 1, "Mens sunglasses" : 1, "Mens bags" : 2 },
  "purchases" : [
    { "id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes" },
    { "id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts" } ],
  "persona" : "shoe-fanatic" }
Loop through each vector in each cluster, map it to its customer, and tag the customer with the persona. </p>
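The profile-to-vector mapping from slide 27, the clustering step from slide 29, and the tagging loop from slide 31 can be sketched together in plain Python. This is only an illustration under stated assumptions, not the speaker's implementation: the column order in `CATEGORIES`, the three extra profiles, the choice of initial centers, and the persona name `office-dresser` are all hypothetical, and a hand-written Lloyd's k-means stands in for whatever library (R, Spark, and so on) would be used in practice. Writing the `persona` field back to MongoDB is not shown.

```python
import math

# Assumed column order, taken from the six columns shown on slide 27.
# Bags are left out because the slide's vector elides later columns.
CATEGORIES = ["Mens shirts", "Mens pants", "Mens shoes",
              "Mens ties", "Mens sunglasses", "Mens watches"]
PURCHASE_WEIGHT = 10  # slide 27: 1 purchase = 10 visited counts


def profile_to_vector(profile):
    """Turn one profile document into a count vector over CATEGORIES."""
    counts = dict.fromkeys(CATEGORIES, 0)
    for category, visits in profile.get("visitedCounts", {}).items():
        if category in counts:
            counts[category] += visits
    for purchase in profile.get("purchases", []):
        if purchase["category"] in counts:
            counts[purchase["category"]] += PURCHASE_WEIGHT
    return [counts[c] for c in CATEGORIES]


def nearest(vector, centers):
    """Index of the center closest to vector (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(vector, centers[k]))


def kmeans(vectors, initial_centers, iterations):
    """Plain Lloyd's k-means; returns (centers, one label per vector)."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        labels = [nearest(v, centers) for v in vectors]
        for k in range(len(centers)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(v, centers) for v in vectors]


# John's profile as shown on slide 27 (unrelated fields omitted).
john = {
    "firstName": "John",
    "visitedCounts": {"Mens watches": 3, "Mens shirts": 1,
                      "Mens sunglasses": 1, "Mens bags": 2},
    "purchases": [
        {"id": 1, "desc": "Power Oxford Dress Shoe", "category": "Mens shoes"},
        {"id": 2, "desc": "Striped Sportshirt", "category": "Mens shirts"},
    ],
}
# Three made-up profiles so there is something to cluster.
others = [
    {"firstName": "A", "visitedCounts": {"Mens pants": 5, "Mens ties": 3}},
    {"firstName": "B", "visitedCounts": {"Mens shoes": 2, "Mens watches": 2},
     "purchases": [{"id": 1, "desc": "made-up shoe", "category": "Mens shoes"}]},
    {"firstName": "C", "visitedCounts": {"Mens pants": 6, "Mens ties": 4,
                                         "Mens shirts": 1}},
]
profiles = [john] + others
vectors = [profile_to_vector(p) for p in profiles]

# K = 2 here; the first two vectors serve as (arbitrary) starting centers.
centers, labels = kmeans(vectors, initial_centers=vectors[:2], iterations=5)

# Persona names are chosen by a person after inspecting each cluster.
PERSONAS = {0: "shoe-fanatic", 1: "office-dresser"}
for profile, label in zip(profiles, labels):
    profile["persona"] = PERSONAS[label]
```

The purchase weight is the main tuning knob in this sketch: raising it makes buyers of a category cluster together more tightly than browsers of it, which is the effect the "1 purchase = 10 visited counts" rule on the slide is after.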