Posted on 08-Jan-2017
Bootstrapping Recommendations with Neo4j
Big Data TechCon
About Me
• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: maxdemarzi@gmail.com
• GitHub: http://github.com/maxdemarzi
Big Data - What is it good for?
• Absolutely Nothing!
• Benchmarks: Is this performing better than that? Yes. Why? Uh...
• Recommendations: You should buy this right now.
• Predictions: You will probably buy this.
Top 10 Recommendations
• Popularity: the naive approach. One size fits most.
Naive Approach
I’m getting little Timmy some “Cards Against Humanity”
Content-Based Recommendations
• Step 1: Collect Item Characteristics
• Step 2: Find similar Items
• Step 3: Recommend Similar Items
• Example: Similar Movie Genres
There is more to life than Romantic Zombie-coms
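Here is a minimal Python sketch of the three content-based steps. It assumes an item's characteristics are just its set of genres. The slides don't name a similarity measure, so Jaccard overlap is swapped in here as an assumption, and the `catalog` data is hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based(liked: str, catalog: dict, k: int = 3) -> list:
    """Score every other item by genre overlap with `liked`; return the top k."""
    target = catalog[liked]  # Step 1: the liked item's characteristics
    scored = [
        (title, jaccard(target, genres))  # Step 2: similarity to each other item
        for title, genres in catalog.items()
        if title != liked
    ]
    # Step 3: most similar first; ties broken by title so output is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical item characteristics (genres per movie).
catalog = {
    "Toy Story": {"Animation", "Children", "Comedy"},
    "Shrek": {"Animation", "Children", "Comedy", "Fantasy"},
    "Finding Nemo": {"Animation", "Children"},
    "Shaun of the Dead": {"Comedy", "Horror"},
    "Titanic": {"Drama", "Romance"},
}

print(content_based("Toy Story", catalog, k=2))
```

With this data, Shrek (0.75) ranks above Finding Nemo (about 0.67), because it shares all three of Toy Story's genres.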
Collaborative Filtering Recommendations
• Step 1: Collect User Behavior
• Step 2: Find similar Users
• Step 3: Recommend Behavior taken by similar users
• Example: People with similar musical tastes
You are so original!
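The collaborative steps can be sketched the same way. This sketch assumes "similar" simply means "liked the most of the same items", an overlap count chosen for illustration (the k-NN section later uses cosine similarity instead). The `likes` data is invented.

```python
from collections import Counter


def collaborative(user: str, likes: dict, k: int = 2) -> list:
    """Recommend what the k most similar users liked that `user` has not."""
    mine = likes[user]  # Step 1: the user's own behavior
    # Step 2: similarity = how many liked items two users share (overlap count).
    neighbors = sorted(
        ((other, len(mine & theirs)) for other, theirs in likes.items() if other != user),
        key=lambda pair: (-pair[1], pair[0]),
    )[:k]
    # Step 3: count how many neighbors liked each item the user hasn't.
    votes = Counter()
    for other, overlap in neighbors:
        if overlap == 0:
            continue  # nothing in common: not a useful neighbor
        votes.update(likes[other] - mine)
    return sorted(votes.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical "liked artist" data.
likes = {
    "max": {"Radiohead", "Portishead"},
    "ann": {"Radiohead", "Portishead", "Massive Attack"},
    "dan": {"Radiohead", "Massive Attack", "Bjork"},
    "tim": {"Taylor Swift"},
}

print(collaborative("max", likes))
```

Ann shares two artists with max and Dan shares one, so Massive Attack, which both of them liked, comes first.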
Using Relationships for Recommendations
Content-based filtering: Recommend items based on what users have liked in the past
Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others
(Diagram: Person -[:RATED {rating: 7}]-> Movie, and Person -[:SIMILARITY {value: .92}]- Person)
Hybrid Recommendations
• Combine the two for better results
• Like Peanut Butter and Jelly
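One simple way to combine the two is a weighted blend of their scores. This is an assumption for illustration, since the slides don't say how the two are combined. `hybrid` and `alpha` are hypothetical names, and both score maps are assumed to be on a comparable scale, such as 0 to 1.

```python
def hybrid(content: dict, collab: dict, alpha: float = 0.5) -> list:
    """Blend two score maps: alpha * content + (1 - alpha) * collaborative.

    An item missing from one map contributes 0 for that component.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    items = content.keys() | collab.keys()
    blended = {
        item: alpha * content.get(item, 0.0) + (1 - alpha) * collab.get(item, 0.0)
        for item in items
    }
    # Highest blended score first; ties broken by item name.
    return sorted(blended.items(), key=lambda pair: (-pair[1], pair[0]))


print(hybrid({"A": 1.0, "B": 0.5}, {"B": 1.0, "C": 0.8}))
```

Setting alpha = 1.0 gives a pure content-based ranking and 0.0 a pure collaborative one. Anything in between trades one off against the other.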
Benefits of Real-Time Recommendations
Online Retail
• Suggest related products and services
• Increase revenue and engagement
Media and Broadcasting
• Create an engaging experience
• Produce personalized content and offers
Logistics
• Recommend optimal routes
• Increase network efficiency
Challenges for Real-Time Recommendations
Make effective real-time recommendations
• Timing is everything in point-of-touch applications
• Base recommendations on current data, not last night’s batch load
Process large amounts of data and relationships for context
• Relevance is king: Make the right connections
• Drive traffic: Get users to do more with your application
Accommodate new data and relationships continuously
• Systems get richer with new data and relationships
• Recommendations become more relevant
Relational vs. Graph Models
(Diagram: the relational model uses Person and MovieRatings tables; the graph model links Max to Terminator, Toy Story and Titanic with RATED relationships.)
Cypher Query Language
MATCH (:Person {name:"Dan"})-[:KNOWS]->(:Person {name:"Ann"})
(Diagram: two Nodes, each with the Label Person and a name Property, Dan and Ann, joined by a KNOWS relationship.)
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
(Slide compares the Cypher query with the equivalent SQL query.)
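To make the semantics of the MANAGES query concrete, here is a hypothetical in-memory Python equivalent. It assumes the org chart is a tree, where everyone has exactly one manager. In a graph with multiple paths, Cypher's count(report) counts paths rather than distinct people. The `org` data is made up.

```python
from collections import deque


def within(org: dict, start: str, min_depth: int, max_depth: int) -> list:
    """People reachable from `start` in min_depth..max_depth MANAGES hops.

    Breadth-first search; assumes the org chart is a tree.
    """
    found = []
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if min_depth <= depth <= max_depth:
            found.append(person)
        if depth < max_depth:
            for report in org.get(person, []):
                queue.append((report, depth + 1))
    return found


def report_counts(org: dict, boss: str) -> dict:
    """For the boss and everyone up to 3 levels below (MANAGES*0..3),
    count the people 1 to 3 levels below them (MANAGES*1..3)."""
    totals = {}
    for sub in within(org, boss, 0, 3):
        total = len(within(org, sub, 1, 3))
        if total:  # a MATCH with no `report` yields no row, so skip zeros
            totals[sub] = total
    return totals


# Hypothetical org chart: manager -> direct reports.
org = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cat", "Dan"],
    "Cat": ["Eve"],
    "Eve": ["Fay"],
}

print(report_counts(org, "John Doe"))
```

Fay sits four levels below John Doe, so she falls outside his total of 5 but inside Ann's total of 4.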
Hello World Recommendation
Movie Data Model
Cypher Query: Movie Recommendation
MATCH (watched:Movie {title:"Toy Story"})<-[r1:RATED]-()-[r2:RATED]->(unseen:Movie)
WHERE r1.rating > 7 AND r2.rating > 7
  AND watched.genres = unseen.genres
  AND NOT ((:Person {username:"maxdemarzi"})-[:RATED|WATCHED]->(unseen))
RETURN unseen.title, COUNT(*) AS score
ORDER BY score DESC
LIMIT 25
What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story
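Here is the same "Hello World" logic over plain Python collections, as an aid to reading the query rather than a replacement for it. The data layout and the sample movies are assumptions: rating triples stand in for RATED, and a `watched_edges` set stands in for WATCHED. The genre check compares genre lists for exact equality, as `watched.genres = unseen.genres` does.

```python
from collections import Counter


def hello_world_recs(watched, me, ratings, genres, watched_edges=(), threshold=7, limit=25):
    """In-memory mirror of the "Hello World" Cypher recommendation.

    ratings: (person, movie, rating) triples, i.e. RATED relationships
    watched_edges: (person, movie) pairs, i.e. WATCHED relationships
    Assumes at most one rating per person and movie.
    """
    # People who rated `watched` highly: ()-[r1:RATED]->(watched), r1.rating > 7
    fans = {p for p, m, r in ratings if m == watched and r > threshold}
    # Movies `me` already RATED or WATCHED are excluded.
    already = {m for p, m, _ in ratings if p == me}
    already |= {m for p, m in watched_edges if p == me}
    counts = Counter(
        m
        for p, m, r in ratings
        if p in fans
        and m != watched
        and r > threshold                     # r2.rating > 7
        and genres.get(m) == genres[watched]  # watched.genres = unseen.genres
        and m not in already
    )
    # RETURN unseen.title, COUNT(*) AS score ORDER BY score DESC LIMIT 25
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Hypothetical sample data.
genres = {
    "Toy Story": ["Animation", "Children", "Comedy"],
    "Toy Story 2": ["Animation", "Children", "Comedy"],
    "Monsters, Inc.": ["Animation", "Children", "Comedy"],
    "Heat": ["Action", "Crime"],
}
ratings = [
    ("maxdemarzi", "Toy Story", 9),
    ("ann", "Toy Story", 9), ("ann", "Toy Story 2", 8),
    ("ann", "Monsters, Inc.", 9), ("ann", "Heat", 10),
    ("dan", "Toy Story", 8), ("dan", "Toy Story 2", 9),
    ("tim", "Toy Story", 5), ("tim", "Monsters, Inc.", 10),
]
watched_edges = {("maxdemarzi", "Monsters, Inc.")}

print(hello_world_recs("Toy Story", "maxdemarzi", ratings, genres, watched_edges))
```

Toy Story 2 scores 2 because two people who rated Toy Story above 7 also rated it above 7. Monsters, Inc. would score 1, but it is dropped because maxdemarzi already WATCHED it.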
Let’s try k-nearest neighbors (k-NN)
Cosine Similarity
Cypher Query: Ratings of Two Users
MATCH (p1:Person {name:'Michael Sherman'})-[r1:RATED]->(m:Movie),
      (p2:Person {name:'Michael Hunger'})-[r2:RATED]->(m)
RETURN m.name AS Movie, r1.rating AS `M. Sherman's Rating`, r2.rating AS `M. Hunger's Rating`
What are the Movies these 2 users have both rated
Calculating Cosine Similarity
Cypher Query: Cosine Similarity
MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
Calculate it for every pair of Person nodes that have rated at least one Movie in common
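Here is the cosine query translated into plain Python, which makes it easy to check the arithmetic by hand. Like the Cypher, it only looks at movies both people rated. The sample `ratings` are hypothetical.

```python
import math
from itertools import combinations
from typing import Optional


def cosine_similarity(x: dict, y: dict) -> Optional[float]:
    """Cosine similarity of two people's ratings over the movies BOTH rated.

    As in the Cypher query, the vector lengths use only co-rated movies.
    Returns None when there is no overlap (the query creates no SIMILARITY
    relationship then) or when a vector has zero length.
    """
    common = x.keys() & y.keys()
    if not common:
        return None
    dot = sum(x[m] * y[m] for m in common)
    x_len = math.sqrt(sum(x[m] ** 2 for m in common))
    y_len = math.sqrt(sum(y[m] ** 2 for m in common))
    if x_len == 0 or y_len == 0:
        return None
    return dot / (x_len * y_len)


def all_similarities(ratings: dict) -> dict:
    """Similarity for every pair of people with at least one movie in common."""
    sims = {}
    for a, b in combinations(sorted(ratings), 2):
        score = cosine_similarity(ratings[a], ratings[b])
        if score is not None:
            sims[(a, b)] = score
    return sims


# Hypothetical ratings: person -> {movie: rating}.
ratings = {
    "Grace": {"Heat": 4, "Up": 3},
    "Zoltan": {"Heat": 3, "Up": 4, "Alien": 5},
    "Dan": {"Alien": 3},
}
print(all_similarities(ratings))
```

Grace and Zoltan come out at 24 / (5 * 5) = 0.96. Dan and Zoltan share a single movie and score a perfect 1.0, because one pair of positive ratings always points in the same direction. Requiring a minimum number of co-rated movies is a common guard against that.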
Movie Data Model
Cypher Query: Your nearest neighbors
MATCH (p1:Person {name:'Grace Andrews'})-[s:SIMILARITY]-(p2:Person)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity
Who are the
• top 5 Persons and their similarity score
• ordered by similarity in descending order
• for Grace Andrews
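Picking the nearest neighbors is just a sort over the stored similarities. This small sketch assumes similarities are kept as a map from an unordered pair of names to a score, mirroring the undirected SIMILARITY relationship. The names and scores are made up.

```python
def nearest_neighbors(person: str, sims: dict, k: int = 5) -> list:
    """The k people most similar to `person`, highest similarity first.

    `sims` maps an unordered pair (a, b) to a score, like the undirected
    SIMILARITY relationship, so `person` may appear on either side.
    """
    scored = []
    for (a, b), score in sims.items():
        if a == person:
            scored.append((b, score))
        elif b == person:
            scored.append((a, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # ORDER BY sim DESC
    return scored[:k]                                   # LIMIT 5


# Hypothetical precomputed similarities.
sims = {
    ("Grace", "Zoltan"): 0.96,
    ("Ann", "Grace"): 0.5,
    ("Grace", "Tim"): 0.99,
    ("Ann", "Zoltan"): 0.7,
}
print(nearest_neighbors("Grace", sims))
```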
Your nearest neighbors
Cypher Query: k-NN Recommendation
MATCH (m:Movie)<-[r:RATED]-(b:Person)-[s:SIMILARITY]-(p:Person {name:'Zoltan Varju'})
WHERE NOT ((p)-[:RATED]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i) * 1.0 / SIZE(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25
What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by the 3 most similar neighbors who rated each movie
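Here is the k-NN query in Python, again as a hypothetical in-memory sketch. For each unseen movie, it takes the ratings of the 3 most similar neighbors who rated it and averages them. The `sims` and `ratings` layouts and values are assumptions.

```python
from collections import defaultdict


def knn_recommend(person: str, sims: dict, ratings: dict, k: int = 3, limit: int = 25) -> list:
    """Average rating from the k most similar neighbors who rated each unseen movie."""
    # Neighbor -> similarity to `person` (SIMILARITY is undirected).
    neighbors = {}
    for (a, b), score in sims.items():
        if person in (a, b):
            neighbors[b if a == person else a] = score
    seen = ratings.get(person, {})
    per_movie = defaultdict(list)  # movie -> [(similarity, rating), ...]
    for other, score in neighbors.items():
        for movie, rating in ratings.get(other, {}).items():
            if movie not in seen:  # WHERE NOT (p)-[:RATED]->(m)
                per_movie[movie].append((score, rating))
    recs = []
    for movie, pairs in per_movie.items():
        pairs.sort(key=lambda pair: -pair[0])    # ORDER BY similarity DESC
        top = [rating for _, rating in pairs[:k]]  # COLLECT(rating)[0..3]
        recs.append((movie, sum(top) / len(top)))  # average of those ratings
    recs.sort(key=lambda rec: (-rec[1], rec[0]))   # ORDER BY recommendation DESC
    return recs[:limit]


# Hypothetical similarities and ratings.
sims = {
    ("Zoltan", "Ann"): 0.9,
    ("Zoltan", "Dan"): 0.8,
    ("Zoltan", "Tim"): 0.7,
    ("Zoltan", "Eve"): 0.1,
}
ratings = {
    "Zoltan": {"Heat": 4},
    "Ann": {"Up": 5, "Heat": 1},
    "Dan": {"Up": 3, "Alien": 4},
    "Tim": {"Up": 4},
    "Eve": {"Up": 1, "Alien": 2},
}
print(knn_recommend("Zoltan", sims, ratings))
```

Up averages the ratings from Ann, Dan and Tim (5, 3 and 4, giving 4.0) and ignores Eve's 1, because Eve is only the fourth most similar neighbor who rated it.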
Recommendations over Searching/Browsing
Recommend Jobs to Job Seekers
What connects them?
• location
• skills
• education
• experience
Cypher Query: Job Recommendation
What are the Top 10 Jobs for me
• that are in the same location I’m in
• for which I have the necessary qualifications
Job Recommendation Results
Perfect Candidate for 100% matches
• missing qualifications can be added quickly
• might encourage exaggerated resumes
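The job query itself isn't in the transcript, so here is a hypothetical sketch of the idea. It keeps only jobs in my location and ranks them by the share of required qualifications I hold. A score of 1.0 is a 100% match, and anything lower lists the missing qualifications. Field names and data are invented.

```python
def recommend_jobs(me: dict, jobs: list, limit: int = 10) -> list:
    """Same-location jobs ranked by the share of required qualifications held."""
    results = []
    for job in jobs:
        if job["location"] != me["location"]:
            continue  # only jobs where I am
        required = job["requires"]
        held = required & me["skills"]
        score = len(held) / len(required) if required else 1.0
        missing = sorted(required - me["skills"])
        results.append((job["title"], score, missing))
    results.sort(key=lambda r: (-r[1], r[0]))  # 100% matches first
    return results[:limit]


# Hypothetical job seeker and postings.
me = {"location": "Chicago", "skills": {"Java", "Neo4j", "SQL"}}
jobs = [
    {"title": "Graph Developer", "location": "Chicago", "requires": {"Java", "Neo4j"}},
    {"title": "Data Engineer", "location": "Chicago", "requires": {"SQL", "Spark"}},
    {"title": "Remote Analyst", "location": "Austin", "requires": {"SQL"}},
]
print(recommend_jobs(me, jobs))
```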
Just one tiny itsy bitsy problem
Job Boards get paid by
• Number of Applicants to a Job
• Wholesale Resume sales
• Selling your data
Recommend Love
Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?
What are the Top 10 Potential Mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have
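Likewise, here is a hypothetical sketch of the love query's four conditions. "Sexually compatible" is modeled as each person being a gender the other is seeking. The score simply adds up the two trait matches, and requiring both to be non-zero is an assumption. All names and data are invented.

```python
def recommend_mates(me: dict, people: list, limit: int = 10) -> list:
    """Rank people by how well traits match in both directions."""
    results = []
    for p in people:
        if p["name"] == me["name"] or p["location"] != me["location"]:
            continue  # same location only
        # "Sexually compatible": each is a gender the other is seeking.
        if p["gender"] not in me["seeking"] or me["gender"] not in p["seeking"]:
            continue
        they_have = len(p["traits"] & me["wants"])  # have traits I want
        they_want = len(me["traits"] & p["wants"])  # want traits I have
        if they_have and they_want:  # assumption: require a match both ways
            results.append((p["name"], they_have + they_want))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:limit]


# Hypothetical profiles.
me = {"name": "Max", "location": "Chicago", "gender": "M", "seeking": {"F"},
      "traits": {"funny", "energetic"}, "wants": {"likes dogs", "tidy"}}
people = [
    {"name": "Ann", "location": "Chicago", "gender": "F", "seeking": {"M"},
     "traits": {"likes dogs", "tidy"}, "wants": {"funny"}},
    {"name": "Bea", "location": "Chicago", "gender": "F", "seeking": {"M"},
     "traits": {"tidy"}, "wants": {"funny", "energetic"}},
    {"name": "Cat", "location": "Chicago", "gender": "F", "seeking": {"F"},
     "traits": {"likes dogs"}, "wants": {"funny"}},
    {"name": "Dee", "location": "Boston", "gender": "F", "seeking": {"M"},
     "traits": {"tidy"}, "wants": {"funny"}},
]
print(recommend_mates(me, people))
```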
Cypher Query: Love Recommendation
Love Recommendation Results
Linked Data
Connect to the Semantic Web
Getting some Data
neo4j-dbpedia-importer
https://github.com/kbastani/neo4j-dbpedia-importer
Named Entity Recognition
Automatically find
• names of people
• places and locations
• products
• and organizations
Hacker News for Example
• What are the kids in Silicon Valley talking about?
Let’s find out
• They have an API!
• Get some data: Stories, Users, Authors, Commenters
Data Model
Hacker News Recommendations
• Which stories should I read?
• Which users should I follow?
• What else should I be interested in?
• Who seems to know a lot about X?
• Etc.
GraphAware Recommendation Framework
• Ability to trade off recommendation quality for speed
• Ability to pre-compute recommendations
• Built-in algorithms and functions
• Ability to measure recommendation quality
• Ability to easily run in A/B test environments
Real-Time Recommendations with Neo4j
• Social Recommendations
• Products and Services
• Content
• Routing
Walmart BUSINESS CASE
World’s largest company by revenue
World’s largest retailer and private employer
SF-based global e-commerce division manages several websites
Founded in 1969 in Bentonville, Arkansas
• Needed online customer recommendations to keep pace with competition
• Data connections provided predictive context, but were not in a usable format
• Solution had to serve many millions of customers and products while maintaining superior scalability and performance
Walmart SOLUTION
• Brings customers, preferences, purchases, products and locations into a graph model
• Uses connections to make product recommendations
• Solution deployed across Walmart divisions and websites
Global Courier BUSINESS CASE
World’s largest courier
480,000 employees
€55 billion in revenue
Needed new B2C and B2B parcel routing system for its logistics practice
Legacy system supported neither the full network nor the shift to online demands
Needed to replace aging B2B and B2C parcel routing system whose requirements include:
• 24x7 availability
• Peak loads of 5M parcels per day, 3K per second
• Support for complex and diverse software stack
• Predictable performance with linear scalability
• Daily changes to logistics networks
• Route from any point to any point
• Single point of truth for entire network
Global Courier SOLUTION
Neo4j provides the ideal domain fit since a logistics network is a graph
• High availability and performance via Neo4j clustering
• Greatly simplified Cypher queries for routing versus relational SQL queries
• Flexible data model that reflects the real logistics world far better than relational
• Easy-to-grasp whiteboard-friendly model
eBay BUSINESS CASE
C2C and B2C retail network
Full e-commerce functionality for individuals and businesses
Integrated with logistics vendors for product deliveries
• Needed an offering to compete with Amazon Prime
• Enable customer-selected delivery inside 90 minutes
• Calculate best route option in real time
• Scale to enable a variety of services
• Offer more predictable delivery times
eBay Now SOLUTION
• Acquired UK-based Shutl, a leader in same-day delivery
• Used Neo4j to create eBay Now
• 1000 times faster than the prior MySQL-based solution
• Faster time-to-market
• Improved code quality with 10 to 100 times less query code
Classmates BUSINESS CASE
Online yearbook connecting friends from school, work and military in the US and Canada
Founded as Memory Lane in Seattle
Develop new social networking capabilities to monetize yearbook-related offerings
• Show all the people I know in a yearbook
• Show yearbooks my friends appear in most often
• Show sections of a yearbook that my friends appear most in
• Show me other schools my friends attended
Classmates SOLUTION
Neo4j provides a robust and scalable graph database solution
• 3-instance cluster with cache sharding and disaster recovery
• 18ms response time for top 4 queries
• 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
• Projected to grow to 1B nodes and 6B relationships
National Geographic BUSINESS CASE
Non-profit scientific and educational institution founded in 1888
Covers geography, archaeology, natural science, environment and historical conservation
Journals, online media, radio, TV, documentaries, live events and consumer content and goods
• Improve poor performance of PostgreSQL app
• Increase user engagement by linking to 100+ years of multimedia content
• Improve targeting by understanding subscribers’ interests better
• Recommend content and services to users based on their interests
National Geographic SOLUTION
• Enabled complex real-time analytics across eight million users and a century of content
• Delivered robust performance by eliminating triple-nested SQL joins
• Cross-‐refers users among content, live events, travel, goods and causes
• Neo4j solution much less cumbersome and easier to maintain than previous SQL system
Curaspan BUSINESS CASE
Leader in patient management for discharges and referrals
Manages patient referrals
4600+ health care facilities
Connects providers and payers via a web-based patient management platform
Founded in 1999 in Newton, Massachusetts
• Improve poor performance of Oracle solution
• Support more complexity including granular, role-based access control
• Satisfy complex Graph Search queries by discharge nurses and intake coordinators. Example: Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services
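The Graph Search example above is essentially a filter over facilities. This is a minimal in-memory sketch, not Curaspan's implementation. The facility records, the field names, and the use of haversine distance with a 3958.8-mile Earth radius are all assumptions. The optional Italian-language requirement is treated as a ranking preference rather than a filter.

```python
import math


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles, Earth radius ~3958.8 mi."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 3958.8 * math.asin(math.sqrt(a))


def find_facilities(facilities, lat, lon, max_miles, group, required, preferred=frozenset()):
    """Facilities in `group` offering all `required` services within `max_miles`."""
    matches = []
    for f in facilities:
        if f["group"] != group or not required <= f["services"]:
            continue
        d = miles_between(lat, lon, f["lat"], f["lon"])
        if d <= max_miles:
            # Optional services only affect ordering: more of them ranks first.
            bonus = len(preferred & f["services"])
            matches.append((f["name"], round(d, 1), bonus))
    matches.sort(key=lambda m: (-m[2], m[1]))  # preferred first, then nearest
    return matches


# Hypothetical facilities around Newton, Massachusetts.
core = {"speech therapy", "cardiac care"}
facilities = [
    {"name": "Newton SNF", "group": "XYZ", "lat": 42.337, "lon": -71.209, "services": set(core)},
    {"name": "Boston SNF", "group": "XYZ", "lat": 42.36, "lon": -71.06,
     "services": core | {"Italian language"}},
    {"name": "NYC SNF", "group": "XYZ", "lat": 40.71, "lon": -74.0,
     "services": core | {"Italian language"}},
    {"name": "Other Group SNF", "group": "ABC", "lat": 42.337, "lon": -71.209, "services": set(core)},
    {"name": "No Cardiac SNF", "group": "XYZ", "lat": 42.337, "lon": -71.209,
     "services": {"speech therapy"}},
]
print(find_facilities(facilities, 42.337, -71.209, 25, "XYZ", core, {"Italian language"}))
```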
Curaspan SOLUTION
• Met fast, real-time performance demands
• Supported queries that span multiple hierarchies, including provider and employee-permissions graphs
• Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations
• Greatly simplified queries, reducing multi-page SQL statements to a single Neo4j function
FiftyThree BUSINESS CASE
Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
Based in New York City
• Add social capabilities to digital-paper app
• Support social collaboration across millions of users in new Mix app
• Enable seamless interaction between social and content-asset networks
• Ensure new apps are robust, scalable and fast
FiftyThree SOLUTION
• Neo4j data model ideal for social network, content management and access control
• Users create, publish and share designs simply
• Easy to develop and evolve Neo4j-based app
• Integrates well with FiftyThree EC2 architecture
See the Neo4j solution in action: Betting the Company (Literally) on a Graph Database
http://aseemk.com/talks/neo4j-lessons-learned#/
App Store Editor’s Choice
2012 iPad App of the Year
Apple Best Apps of 2014
Questions
• How does Neo4j fit into my existing infrastructure? As a Service.
• Will Neo4j scale? Yes.