building a scalable platform for sharing 500 million photos


Building a Scalable Platform for Sharing 500 Million Photos
Wouter Crooy & Ruben Heusinkveld
Solution Architect & Technical Lead, Albumprinter

Ruben

Wouter Crooy
Solution Architect, Albumprinter
@wcrooy

Ruben

Ruben Heusinkveld
Technical Lead, Albumprinter
@rheusinkveld

Ruben

Who are we
Wouter Crooy, Solution Architect
Ruben Heusinkveld, Technical Lead
Neo4j Certified Professionals

Ruben

At Albelli we want to inspire people to relive and share life's moments by easily creating beautiful personalized photo products.
Vision: To brighten up the world by bringing people's moments to life.

Albumprinter is a Cimpress company. The best-known Cimpress brand here in the US is Vistaprint. I'm sure you all know it.
Albumprinter is based in Amsterdam, The Netherlands.
We have multiple consumer brands to serve the European market.
Albumprinter acquired FotoKnudsen in June 2014.


The photo organizer
Deliver well-organized, easy-to-use and secure storage for all your images
Ease the process of selecting photos for creating photo products
Started as part of an R&D skunkworks project

Ruben

Goal: Deliver well-organized, easy-to-use and secure storage for all your images.
Built by a team of 5 (1 designer, 1 frontend developer, 1 quality engineer, and Wouter and myself focusing on the backend).


The photo organizer

Ruben
Launched June of this year
Available on all devices

The photo organizer

Ruben
Photos are automatically grouped together into events

The photo organizer

Ruben
Easy to share photos with friends, or publicly if you want
Privately via invites

The photo organizer from photos to products

Ruben
The photos can be used to create any product, like a photo book, calendar or wall decor

The photo organizer demo

https://minnebanken.no

Ruben
The photos can be used to create any product, like a photo book, calendar or wall decor

The challenge

Wouter

The challenge
Replace legacy system with the new photo organizer
Move 1.3 PB of photos from on-premises to cloud storage
Analyze & organize all photos (511 million)
Data cleansing while importing
Using the same technology / architecture during import and after
Ability to add features while importing
Core of the systems is built in .NET

Wouter
Not uploading duplicates

The import
Hard deadline
Factory closing that holds the data center with all photos
Started 1st of April
Minimum processing of 150 images / second
~500 queries / second to Neo4j
Up to 700 EC2 instances on AWS
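As a rough check on these numbers, the sketch below computes how long the import takes at the stated minimum rate and how many Neo4j queries that implies per image. It uses only figures quoted on the slides (511 million photos, 150 images / second, ~500 queries / second); the function names are made up for illustration, and a constant processing rate is an assumption.

```python
# Back-of-the-envelope check using only numbers quoted on the slides.
TOTAL_PHOTOS = 511_000_000     # photos to analyze and organize
MIN_IMAGES_PER_SECOND = 150    # minimum processing rate
QUERIES_PER_SECOND = 500       # approximate query rate to Neo4j

SECONDS_PER_DAY = 24 * 60 * 60


def days_to_import(total_photos: int, images_per_second: float) -> float:
    """Days of continuous processing needed at a constant rate (assumption)."""
    return total_photos / images_per_second / SECONDS_PER_DAY


def queries_per_image(queries_per_second: float, images_per_second: float) -> float:
    """Average number of queries issued for each processed image."""
    return queries_per_second / images_per_second


if __name__ == "__main__":
    days = days_to_import(TOTAL_PHOTOS, MIN_IMAGES_PER_SECOND)
    per_image = queries_per_image(QUERIES_PER_SECOND, MIN_IMAGES_PER_SECOND)
    print(f"{days:.1f} days at {MIN_IMAGES_PER_SECOND} images/second")
    print(f"{per_image:.1f} queries per image")
```

At the minimum rate the import needs a little over 39 days of uninterrupted processing, at roughly 3.3 queries per image, which gives a sense of how little slack a hard deadline leaves.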

Wouter

How we did it
Micro services
Command Query Responsibility Segregation (CQRS)
Cluster
Multiple write nodes
Single master read only nodes
HAProxy
Cypher only via REST interface
.NET Neo4jClient

Wouter

Architecture

Wouter
In Neo4j we only store the metadata. The actual photos are stored in Amazon Simple Storage Service (S3).

Why we chose Neo4j
Close to domain model
Not an ordinary (relational) database
Looking for relations between photos/users
Scalable
Flexible schema
Natural / fluent queries
ACID / data consistency

Wouter

The design

Ruben

Graph model

Ruben

Graph model

Ruben

Graph model

Ruben


Our Neo4j database
More than 1 billion nodes
4.1 billion properties
2.6 billion relations
Total store size of 863 GB

Ruben
For all those photos this resulted in:
More than 1 billion nodes
4.1 billion properties
2.6 billion relations
Total store size of 863 GB


Command Query Responsibility Segregation
Separation between writing and reading data
Different model between Query and Command API
Independent scaling

Ruben
I know it's really ambitious to explain CQRS within 2 slides. But I would still like to explain why and how it could work with Neo4j.

Event sourcing. Double update to DB and cache. In our case we used a cache update/flush on certain rules.
Pro: Less work; the database is too large for the cache.
Con: Not always a reliable cache source.
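The split between a Command API that writes and a Query API that reads through a cache can be sketched as follows. This is a minimal in-memory illustration, not the system from the talk: the class names, the `add_photo` / `photos_of` methods and the flush rule ("flush a user's cache entry on every write for that user") are all assumptions, and plain dictionaries stand in for Neo4j and the cache.

```python
from typing import Dict, List


class CommandApi:
    """Write side: changes the store and flushes affected cache entries."""

    def __init__(self, database: Dict[str, List[str]], cache: Dict[str, List[str]]):
        self._database = database  # stand-in for the graph database
        self._cache = cache        # stand-in for the read cache

    def add_photo(self, user_id: str, photo_id: str) -> None:
        self._database.setdefault(user_id, []).append(photo_id)
        # Hypothetical flush rule: any write for a user drops that user's entry.
        self._cache.pop(user_id, None)


class QueryApi:
    """Read side: serves from the cache and fills it on a miss."""

    def __init__(self, database: Dict[str, List[str]], cache: Dict[str, List[str]]):
        self._database = database
        self._cache = cache

    def photos_of(self, user_id: str) -> List[str]:
        if user_id not in self._cache:
            self._cache[user_id] = list(self._database.get(user_id, []))
        return list(self._cache[user_id])


if __name__ == "__main__":
    database: Dict[str, List[str]] = {}
    cache: Dict[str, List[str]] = {}
    commands = CommandApi(database, cache)
    queries = QueryApi(database, cache)

    commands.add_photo("user-1", "photo-a")
    print(queries.photos_of("user-1"))  # read fills the cache
    commands.add_photo("user-1", "photo-b")
    print(queries.photos_of("user-1"))  # write flushed the entry, read refills it
```

Because the two sides only share the store and the cache, each can be scaled independently; the price is that a read is only as fresh as the flush rules make it.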

Bumps and Solutions

Wouter

CQRS Separate Reads & Writes
No active event publishing in place
Specific scenarios for updating / writing data
Ability to create separate model for read and write
Updates (pieces of) the user graph
Requires reliable and consistent read
Scale out -> overloading locking of (user) graph
After import
Low performance scenarios -> cache with lower update priority

Wouter
Neo4j at its core is very capable of handling CQRS interfaces, since you're not updating a table but (parts of) the graph. Due to its ACID nature it should also be able to make sure there are no race conditions. But this architecture allows you to scale out massively, and that does not always match the capabilities of an ACID DB, especially in cases where writes occur more often than reads.

Make sure the read is consistent
In our situation, CQRS is extra complex since we have an ordered crawler (5+ steps) which also does the writes. But the crawler(s) and query API are still allowed to do reads.

https://www.infoq.com/news/2015/05/cqrs-advantages
http://udidahan.com/2011/04/22/when-to-avoid-cqrs/
http://udidahan.com/2009/12/09/clarified-cqrs/
http://udidahan.com/2010/08/31/race-conditions-dont-exist/

See also the consistent read solution. In cases where we don't need a consistent read, we can use the cache.

Read after write consistency
All reads should contain the very latest and most accurate data
Replication delay between servers
Split on consistency

Article by Aseem Kishore:
https://neo4j.com/blog/advanced-neo4j-fiftythree-reading-writing-scaling/

Wouter
Reads vastly outnumber writes in our application, as in many applications.
Split on consistency, not read vs. write
Track user last write time for read after write consistency
Monitor and tune slave lag, via push/pull configs
Stick slaves by user for read after read consistency
https://neo4j.com/blog/advanced-neo4j-fiftythree-reading-writing-scaling/
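The idea of splitting on consistency, tracking each user's last write time, and sticking a user to one slave can be sketched as below. This is an illustrative sketch under assumptions, not the routing used in the talk: the class and function names, the "master" / "slave" return values, the lag threshold and the CRC32-based slave choice are all hypothetical.

```python
import time
import zlib
from typing import Callable, Dict, List


class ConsistencyRouter:
    """Sends a user's reads to the master while slaves may still lag behind."""

    def __init__(self, max_slave_lag_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max_lag = max_slave_lag_seconds  # assumed upper bound on replication delay
        self._clock = clock
        self._last_write: Dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        """Remember when this user last wrote."""
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id: str) -> str:
        """Return 'master' shortly after a write by this user, else 'slave'."""
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self._max_lag:
            return "master"
        return "slave"


def slave_for_user(user_id: str, slaves: List[str]) -> str:
    """Pick the same slave for the same user (read after read consistency)."""
    return slaves[zlib.crc32(user_id.encode("utf-8")) % len(slaves)]


if __name__ == "__main__":
    router = ConsistencyRouter(max_slave_lag_seconds=2.0)
    router.record_write("user-1")
    print(router.target_for_read("user-1"))  # recent writer goes to the master
    print(router.target_for_read("user-2"))  # no recent write, a slave is fine
    print(slave_for_user("user-2", ["slave-a", "slave-b", "slave-c"]))
```

The threshold only works if it is at least as large as the real replication delay, which is why monitoring and tuning slave lag appears on the same list.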

Credits to Aseem Kishore and his team at FiftyThree for sharing this at the conference last year.


Graph locking
Concurrency challenge
Scale-out => more images from the same user
Manage the input
High spread of user/image combination
Prevent concurrent analysis of multiple images from the same user:
GET /db/manage/server/jmx/domain/org.neo4j/instance%3Dkernel%230%2Cname%3DLocking

Wouter

Mainly during the importing of photos

{
  "description" : "org.neo4j.kernel.info.LockInfo",
  "type" : "org.neo4j.kernel.info.LockInfo",
  "value" : [ {
    "name" : "description",
    "description" : "description",
    "value" : "ExclusiveLock[\nClient[1] waits for []]"
  }, {
    "name" : "resourceId",
    "description" : "resourceId",
    "value" : "2612184871"
  }, {
    "name" : "resourceType",
    "description" : "resourceType",
    "value" : "RELATIONSHIP"
  } ]
}
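One way to keep images from the same user from being analyzed concurrently is to assign every job for a user to the same worker queue. The sketch below shows that technique; it is an assumption about how "manage the input" could be done, not the mechanism described in the talk, and the function names and the (user id, image id) job shape are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Job = Tuple[str, str]  # (user_id, image_id), a hypothetical job shape


def worker_for_user(user_id: str, worker_count: int) -> int:
    """Map a user to a fixed worker index using a stable hash."""
    return zlib.crc32(user_id.encode("utf-8")) % worker_count


def partition_jobs(jobs: Iterable[Job], worker_count: int) -> Dict[int, List[Job]]:
    """Group jobs so that all images of one user land in one worker's queue."""
    queues: Dict[int, List[Job]] = defaultdict(list)
    for user_id, image_id in jobs:
        queues[worker_for_user(user_id, worker_count)].append((user_id, image_id))
    return dict(queues)


if __name__ == "__main__":
    jobs = [("user-1", "img-1"), ("user-2", "img-2"), ("user-1", "img-3")]
    for worker, queue in sorted(partition_jobs(jobs, worker_count=4).items()):
        print(worker, queue)
```

Each worker processes its queue sequentially, so two writes never compete for locks on the same user's part of the graph; different users still run in parallel across workers.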

Batch insert vs single insert
Cypher CSV import per 1000 records
Prevent locking caused by concurrency issues
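Grouping records into batches of 1000 before import can be sketched with a small generator. Only the batch size comes from the slide; the function name is made up, and writing each batch to a CSV file for a Cypher `LOAD CSV` statement is left out of the sketch.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # the last, possibly smaller, batch
        yield batch


if __name__ == "__main__":
    for number, batch in enumerate(batches(range(2500)), start=1):
        print(f"batch {number}: {len(batch)} records")
```

One import per 1000 records means one transaction, and one set of locks, per batch instead of per record.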

Wouter

No infinite scale out
Find the sweet spot for the number of cluster nodes
+1 node => more replication updates => higher load on write master

Wouter

Timeline
We're looking for photos which should belong to each other based on date taken. Moving from a full property scan to graph walking via the timeline.
For large collections, 75% fewer DB hits
Walking the timeline if looking for photos within a certain timeframe
Fewer photos to evaluate for property scan (SecondsSinceEpoch)
Works perfectly for year, month, day selections
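The difference between scanning a property on every photo and walking a timeline can be illustrated with an in-memory analogy: photos are filed under year, month and day, so a selection only touches the matching branch. This is a sketch of the idea only; the `Timeline` class and its methods are hypothetical, nested dictionaries stand in for graph nodes, and UTC dates are an assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import List, Optional


class Timeline:
    """Files photos under year -> month -> day, like nodes on a timeline."""

    def __init__(self) -> None:
        self._years = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def add(self, photo_id: str, seconds_since_epoch: int) -> None:
        taken = datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)
        self._years[taken.year][taken.month][taken.day].append(photo_id)

    def photos_on(self, year: int, month: Optional[int] = None,
                  day: Optional[int] = None) -> List[str]:
        """Walk only the branch for the requested year, month and day."""
        months = self._years.get(year, {})
        selected_months = months.values() if month is None else [months.get(month, {})]
        result: List[str] = []
        for days in selected_months:
            selected_days = days.values() if day is None else [days.get(day, [])]
            for photos in selected_days:
                result.extend(photos)
        return result


if __name__ == "__main__":
    def noon(y: int, m: int, d: int) -> int:
        return int(datetime(y, m, d, 12, tzinfo=timezone.utc).timestamp())

    timeline = Timeline()
    timeline.add("photo-a", noon(2015, 6, 1))
    timeline.add("photo-b", noon(2015, 6, 2))
    timeline.add("photo-c", noon(2014, 12, 25))
    print(timeline.photos_on(2015))
    print(timeline.photos_on(2015, 6, 2))
```

A year, month or day selection never looks at photos outside the chosen branch, which is the same reason the timeline walk needs fewer DB hits than comparing SecondsSinceEpoch on every photo.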

Wouter

.NET & REST interface
Custom headers to REST Cypher endpoint (filtered by HAProxy)
To route to multiple write servers
Sticky session per user
Custom additions to .NET Neo4jClient
Managing JSON result set
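Attaching custom routing headers to a Cypher request can be sketched as below. The talk's implementation is in .NET with Neo4jClient; this Python sketch only shows the shape of such a request. The header names `X-Write` and `X-User-Id` are hypothetical, and the use of Neo4j's transactional HTTP endpoint (`/db/data/transaction/commit`) is an assumption about which REST endpoint was called. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Dict


def build_cypher_request(base_url: str, statement: str, parameters: Dict[str, Any],
                         user_id: str, is_write: bool) -> urllib.request.Request:
    """Build a POST request carrying a Cypher statement and routing headers."""
    payload = {"statements": [{"statement": statement, "parameters": parameters}]}
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        # Hypothetical headers a proxy could use to pick a write or read
        # server and to keep a user on the same server.
        "X-Write": "1" if is_write else "0",
        "X-User-Id": user_id,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/db/data/transaction/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    request = build_cypher_request(
        base_url="http://localhost:7474",  # placeholder address
        statement="MATCH (u:User {Id: {id}}) RETURN u",
        parameters={"id": "001"},
        user_id="001",
        is_write=False,
    )
    print(request.get_method(), request.full_url)
    print(request.header_items())
```

The proxy can then route on the headers alone, without parsing the Cypher in the request body.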

Wouter

Graph design considerations
Property scan
(User) full-graph-scan
Differentiating property
Create node
No path/clustered indexes (yet...)

Making changes to the schema for 550+ million nodes

Wouter

Graph design improvements
Property search
match (u:User { Id: "001"})
2812 db hits

Node/Relationship search
match (u:User { Id: "001"})-[:HasFavourites]-(f:Favourites)
13 db hits
dbms.logs.query.* (don't forget to enable parameter resolving)
Our alternative: Integrate with Kibana / Elasticsearch
https://neo4j.com/docs/operations-manual/current/reference/

Wouter
DB hits increase as the number of photos increases if you use the property search.

The future

Wouter?

The future
Neo4j 3.x
Bolt
Data mining
Procedures / APOC

Wouter?

That's a wrap
Wouter?
