infrastructure for decision makers

INFRASTRUCTURE FOR DECISION MAKERS Questions for a better architecture

Author: eric-lubow

Post on 04-Aug-2015






1. INFRASTRUCTURE FOR DECISION MAKERS
   Questions for a better architecture

2. PERSONAL VANITY (Eric Lubow @elubow #ddsea15)
   - CTO of SimpleReach
   - Co-Author of Practical Cassandra
   - Skydiver, Mixed Martial Artist, Motorcyclist, Dog Dad (IG: @charliedognyc), NY Giants fan

3. SIMPLEREACH
   - Identify the best content
   - Use engagement metrics
   - Stream processing ingest
   - Many metrics, time sliced
   - Multiple data stores

4. What do you mean infrastructure?

5. WHO IS MAKING THESE DECISIONS?
   - Architects
   - CTOs
   - Lead Developers
   - Developers
   - Basically everyone

6. YOU WOULDN'T BUILD SOFTWARE WITHOUT PLANNING FIRST, SO WHY WOULD YOU BUILD AN ARCHITECTURE WITHOUT PLANNING?

7. REALITY OF THE SITUATION
   - Architectures get built ad hoc
   - Pieces tend to be built as needed and not always thought out
   - Many lead developers don't have a lot of architecture experience
   - We don't live in a perfect world and are usually time bound
   - Product needs to be built and we'll figure out the rest later (technical debt)

8. What are we actually going to talk about today?

9. FRAMEWORK FOR BUILDING
   - Hardware
   - Cloud
   - Databases
   - Message Systems
   - Scale/Scaling
   - Costs
   - Compliance
   - Development ease
   - Authentication
   - Developer / Operational Capabilities
   - Available Support
   - Monitoring / Instrumentation
   - Testing / Staging / QA
   - Repeatability of Systems
   - Safety nets
   - Pressure valves
   - Administration ease
   - Authorization

10. WHY SHOULDN'T I LEAVE RIGHT NOW

11. REASONS TO LISTEN
    - Unsexy talks can have good information
    - Understanding these concepts can save lots of technical debt
    - There are lessons learned from not knowing which questions to ask
    - I'm kind of entertaining
    - In case I'm not entertaining, I'll use some entertaining pictures
    - I'm going to tell you a story

12. HOW DID SIMPLEREACH GET FROM

13.
    TO
    [Diagram labels: Business/Application/Translation/Data Access; Router/Load Balancer/Config/Authentication; SERVICE (x8); Redshift Platform]

14. WHY ARE FRAMEWORKS IMPORTANT
    - Allows people to use a common language when discussing or solving problems
    - Allows a common toolset for solving problems
    - Simplifies difficult tasks
    - Every language has frameworks: Ruby/Rails, Python/Django, JavaScript/Ember.js
    - Attempts to answer the questions: How should I do this? Is this a good idea? Is this the right tool?

15. (no text on slide)

16. BASIC QUESTIONS
    - Where is this going to live?
    - How do I get data in?
    - How am I going to store the data?
    - How do I move data around?
    - How should data look coming out?
    - How do I get data out?
    - How do I know if something is wrong?
    - How do I maintain/scale/build?

17. WHERE IS THIS GOING TO LIVE?
    - Is this going on the cloud? Amazon, Google, Azure, Rackspace?
    - Do you need to be in a data center?
    - Are APIs important?
    - What kind of distribution of services / fault tolerance needs to be available?
    - What kind of SLAs do you need to meet (100% uptime)?

18. HOW DO I GET DATA IN?
    - Build apps that follow the same paradigm
    - POST data to an end point
    - Consume off a queue
    - Use message systems for queueing
    - Message aggregation for efficiency
    - Message sampling for throttles
    - Try to avoid talking directly to a database from client facing applications
    - Write your own client driver to talk to your architecture

19. HOW AM I GOING TO STORE THE DATA?

20. CHOOSING A DATABASE IS EASY, #AMIRITE
    - What's the latest cool technology?
    - What is my data volume?
    - What are my query patterns?
    - Is my data (un)structured?
    - Will data remain consistent?
    - Am I read heavy or write heavy?
    - Am I batch loading data?
    - Is eventually consistent data ok?
    - Can I have a DR plan?
    - Legal/compliance requirements?
    - Are there experts/enterprise support?
    - What's the community like?
    - Easy to administer?
    - Tooling, monitoring, language support?
    - Cloud or iron?
    - High volume ingestion or batch loading?
    - Fault tolerance?
    - Open source vs enterprise system?
    - Employee learning curve vs. learning cost?

21. HOW DO I MOVE DATA AROUND?
    ROAD METAPHOR:
    - Messages = Cars
    - Message System = Highway / Roads
    - Database = Parking Lot
    - Cache = Cell Phone Lot
    - Commerce/Industry = Worker/Consumer/Analyzer
    - Enrichment = Gas Station

22. MESSAGE SYSTEMS ARE MY FAV
    - Only recently starting to become part of important discussions
    - Provide consistent interfaces between disparate systems
    - Clients can have minimal architecture knowledge
    - Everyone can speak the same language (JSON, please not XML)
    - Allow for high availability
    - Help minimize the cost of downtime
    - Control data flow patterns
    - Makes [horizontal] scaling easier
    - Enrichment/in-stream modifications of data
    - Instrument and monitor data states between systems

23. NSQ
    - Distributed and de-centralized topology
    - At least once delivery guaranteed
    - Multi-cast style message routing
    - Simple to configure and deploy
    - Allows for zero-downtime maintenance windows
    - Ephemeral channels for testing data
    - Channel sampling

24. HOW SHOULD DATA LOOK COMING OUT?
    - Agree on a data format? XML, JSON, Avro
    - Again, please don't use XML
    - HATEOAS - heavy lift but decent client support
    - What meta data should be sent with the response?
    - How can unnecessary calls to an API be mitigated?

25. HOW DO I GET DATA OUT?
    - Monolithic service architecture
    - REST interface through a single URL to ask for data?
    - Many micro-service end points?
    - HTTP / RPC / Thrift
    - JSON API / HATEOAS / custom
    - How many libraries need to be written, tested and maintained?

26. And now back to our story

27.
    SIMPLEREACH CONTEXT
    - 100 million URLs
    - 300 million Tweets
    - 50k - 100k events per second (tens of billions of events per day)
    - 200G new per hour
    - 700T of total data (10T per month)
    - 10T of hot data
    - 2-3T of daily log data
    - Excludes all monitoring data

28. [Diagram labels: Solr; Solr; Vertica + Cassandra; Vertica + Cassandra; Vertica; Mongo]

29. STREAM-BASED DATA COLLECTION
    [Diagram labels: Internet; Edge; Internal API; Solr; C*; Mongo; Redis; Vertica; API; Fire Hose; App; Consumers; Queue]

30. NEED FOR SPEED
    - Concurrency
    - Compiled code is much faster
    - Statically typed languages make for fewer unexpected error situations
    - Still speaks every other interchange language
    - Cleaner code

31. MICROSERVICES: THE NEW HOTNESS!

32. MICROSERVICES: THE NEW HOTNESS?
    Pros
    - Fine grained, clearly scoped services
    - Break 1 thing != break #allthethings
    - Better fault isolation
    - Easier to create throttles/release valves
    - Better able to monitor more granularly
    - Made everyone more devopsy
    Cons
    - Strict micro-service setups have large database overheads
    - Testing/deployments are more complex
    - More general overhead
    - Slow down developer time
    - Service discovery

33. HYBRID MICRO-SERVICE / SHARED LIBRARY
    [Diagram labels: Business/Application/Translation/Data Access; Router/Load Balancer/Config/Authentication; SERVICE (x8); Redshift Platform]

34. GENERIC SERVICE AND DATA FLOW
    [Diagram labels: Redshift; router; auth; logstash; four service stacks, each labeled Data Access Layer, Business Logic, Application Layer, NSQ (x2)]

35.
    SMART ROUTER
    - Handles service state and service registry/discovery information
    - Canonical reference for all things platform
    - Prevents older versions of services from re-appearing
    - Highly available proxy application
    - Has burst-able capacity to mitigate DoS
    - Auto-scaling tier

36. BUSINESS LOGIC LAYER
    - Contains thicker macro services
    - Aggregates common features and functionality
    - Permissioning/throttling/access restrictions
    - Centrally handling trigger events
    - Exposing various API end points
    - Orchestrating calls to the DAL

37. DATA ACCESS LAYER
    - Responsible for CRUD
    - Houses many of the data models
    - Responsible for balancing throughput of data in/out of databases
    - Minimize the number of DB connections by using pooling

38. HYBRID MICRO-SERVICE / SHARED LIBRARY
    [Diagram labels: Redshift Platform; WebApp 1; WebApp 2; Python App; Go App; Ingestion Stream (x3); Proxy/Router]

39. SMILEY HAPPY PEOPLE

40. HOW DO I KNOW IF SOMETHING IS WRONG
    - Testing
    - Monitoring
    - Instrumentation
    - No pull requests w/o instrumentation
    - No pull requests w/o monitoring
    - Build dashboards

41. DASHBOARD #ALLTHETHINGS

42. WHAT SHOULD I MONITOR/INSTRUMENT?
    - Frequency
    - Error rates
    - Success rates
    - Request Volume
    - Message Counts

43. HOW DO I MAINTAIN/SCALE/BUILD?
    - Already discussed monitoring/instrumentation
    - Making sure you can maintain architecture is the same as ensuring you can maintain code
    - Have easy to use, flexible deployment systems
    - Keep an audit trail
    - Make processes repeatable and systematic
    - Configuration management
    - Automation (event based when possible)
    - Easy enough to add and maintain but difficult to break

44. "If you want to increase innovation, you need to lower the cost of failure." (Joi Ito, MIT Media Lab)

45.
    WHAT JUST HAPPENED
    - A little architecture knowledge is a good thing
    - Don't start out with complexity
    - Build what you need with growth in mind
    - Make sure you have the basics covered
    - Might be something to the micro-service hype
    - Monitor everything
    - Allow customizations and innovations

46. QUESTIONS IN LIFE ARE GUARANTEED, ANSWERS AREN'T.
    Eric Lubow @elubow, Data Day Seattle #ddsea15
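
Note on "Monitor everything": the quantities listed under WHAT SHOULD I MONITOR/INSTRUMENT? (request volume, error rates, success rates) could be tracked roughly as in the sketch below. This is only an illustration and not SimpleReach's instrumentation: the ServiceStats class and its method names are hypothetical, and a real service would most likely ship these counters to a monitoring backend and a dashboard rather than hold them in memory.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ServiceStats:
    """Hypothetical in-memory counters for one service (illustration only)."""

    counts: Counter = field(default_factory=Counter)

    def record(self, ok: bool) -> None:
        # Count every request (request volume), then split it by outcome.
        self.counts["requests"] += 1
        self.counts["success" if ok else "error"] += 1

    def error_rate(self) -> float:
        # Fraction of recorded requests that failed; 0.0 before any traffic.
        total = self.counts["requests"]
        return self.counts["error"] / total if total else 0.0

    def success_rate(self) -> float:
        # Fraction of recorded requests that succeeded; 0.0 before any traffic.
        total = self.counts["requests"]
        return self.counts["success"] / total if total else 0.0


# Usage: three successful requests and one failure.
stats = ServiceStats()
for ok in [True, True, True, False]:
    stats.record(ok)

print(stats.counts["requests"], stats.error_rate(), stats.success_rate())
# prints: 4 0.25 0.75
```

Keeping the counters this small makes it cheap to add them to every code path, which is the point of "no pull requests w/o instrumentation".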