Keynote for CSE Conference 2011: Distributed Systems: What? Why? And bit of How?
DESCRIPTION
Presented as the keynote for the CSE conference 2011, University of Moratuwa.
TRANSCRIPT
![Page 1: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/1.jpg)
Distributed Systems: What?
Why? And bit of How?
Srinath Perera
![Page 2: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/2.jpg)
I cannot cover Distributed Systems in 30 minutes!
But I can tell you, in 30 minutes, why you might want to learn
Distributed Systems!
http://www.flickr.com/photos/uwehermann/82753155/sizes/m/in/photostream/ and http://www.flickr.com/photos/peterpearson/5921765552, licensed under CC
![Page 3: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/3.jpg)
What is a Distributed System?
"A distributed system is one on which I cannot get
any work done because some machine I have
never heard of has crashed."
--Leslie Lamport
![Page 4: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/4.jpg)
What is a Distributed System?
“A system in which hardware or software components located at networked computers communicate and coordinate their actions only by message passing.” - [Coulouris]
“A distributed system is a collection of independent computers that appear to the users of the system as a single coherent system.” - [Tanenbaum]
![Page 5: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/5.jpg)
Characteristics and Challenges
• Fault Tolerance
• Scale
• Transparency
• No Global Clock
• Communication only by message passing
• No Global State
• Independent Failures
Photo by John Trainor on Flickr http://www.flickr.com/photos/trainor/2902023575/, Licensed under CC
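The "No Global Clock" item is often handled with logical clocks. Below is a minimal sketch of a Lamport logical clock, assuming a hypothetical `LamportClock` class and two in-process "machines"; the names are illustrative and do not come from the talk.

```python
class LamportClock:
    """Logical clock: orders events without any shared wall clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's timestamp, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                    # a is now at 1
stamp = a.send()            # a is now at 2, message carries 2
b_time = b.receive(stamp)   # b jumps to max(0, 2) + 1 = 3
```

The receive always gets a larger timestamp than the matching send, which is the only ordering guarantee the two machines need to agree on.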
![Page 6: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/6.jpg)
Fallacies of Distributed Systems
• The network is reliable.
• Latency is zero.
• Bandwidth is infinite.
• The network is secure.
• Topology doesn't change.
• There is one administrator.
• Transport cost is zero.
• The network is homogeneous.
http://www.flickr.com/photos/12587661@N06/2300406685, @Michael Gwyther-Jones, Licensed CC
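To make the first fallacy concrete, here is a small sketch of a caller that does not assume the network is reliable. `call_with_retry` and `FlakyService` are hypothetical names invented for this illustration; the "network" is simulated in process.

```python
import time


def call_with_retry(operation, attempts=3, base_delay=0.01):
    """Call operation(); on ConnectionError retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt))


class FlakyService:
    """Hypothetical remote service that fails the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated network failure")
        return "ok"


service = FlakyService(failures=2)
result = call_with_retry(service.fetch, attempts=3)
```

Retrying is only safe when the operation is idempotent; that caveat is part of why the fallacy matters.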
![Page 7: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/7.jpg)
Why Distributed Systems
• Need to build bigger systems
• Many use cases are inherently distributed
• To avoid failures
• Omnipresence
– If you buy food from a supermarket
– If you buy a book from a bookshop chain
– If you search the Web
– If you use a GPS navigator
– If you turn on your My 10 list
– If you pay a bill
– If you use your mobile app
![Page 8: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/8.jpg)
A System Usecase Classification
• Processing Data (Moving vs. Stored Data)
• Servers: Receive, Process, and Respond
• Running User provided Jobs
• Data Storages and Provenance
http://www.flickr.com/photos/kelsea-groves/5535666329/
![Page 9: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/9.jpg)
Usecase: Processing Data: React to Sensors
• Many sensors: Weather, Travel, Traffic, Surveillance, Stock exchange, Smart Grid, Production line
• Monitor, understand, and react to events
• Usually handled with CEP (e.g. Esper, StreamBase, Siddhi) or Stream Processing (S4, Twitter Stream)
http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, http://www.flickr.com/photos/eastcapital/4554220770/, http://www.flickr.com/photos/patdavid/4619331472/ by Pat David, copyright CC
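The "monitor and react" idea can be sketched as a time-windowed rule over a sensor stream. This is a toy illustration in plain Python, not the API of Esper, Siddhi, or any other engine named above; `SlidingWindowAlert` and the sample readings are assumptions.

```python
from collections import deque


class SlidingWindowAlert:
    """Fire when the average reading over the last `window_secs` exceeds `threshold`."""

    def __init__(self, window_secs, threshold):
        self.window_secs = window_secs
        self.threshold = threshold
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def on_event(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict events that have fallen out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window_secs:
            self.events.popleft()
        average = sum(v for _, v in self.events) / len(self.events)
        return average > self.threshold


detector = SlidingWindowAlert(window_secs=10, threshold=30.0)
readings = [(0, 25.0), (4, 28.0), (8, 35.0), (12, 40.0)]
alerts = [detector.on_event(t, v) for t, v in readings]
# alerts == [False, False, False, True]
```

A real CEP engine would express the same rule declaratively and run many such rules over many streams at once.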
![Page 10: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/10.jpg)
Usecase: Processing Data: Target Marketing
• Receive data about users continuously: e.g. web clicks, what they bought, what they like and do not like, what their friends like and bought
• Build models and index information in the background
• Send each user the advertisements that best match their preferences
– this has to be done quickly
– within a few (say 50) milliseconds
• Could be the next billion dollar problem
![Page 11: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/11.jpg)
Usecase: Receive, Process, and Respond: Online Store (e.g. Amazon)
• Many sellers selling many items, and many buyers
• List of all items, with their specs
• Index items by many dimensions and support search
• Support checkout, delivery tracking, returns, ratings, and complaints
• Supported by partitioning sellers/ items across many nodes
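Partitioning sellers or items across nodes can be sketched with a stable hash. The node names, seller IDs, and the `node_for` helper below are hypothetical; the talk does not say which partitioning scheme any real store uses.

```python
import hashlib


def node_for(key, nodes):
    """Map a key to one node using a stable hash (same answer on every machine)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


nodes = ["node-0", "node-1", "node-2"]
sellers = ["seller-17", "seller-42", "seller-99"]
placement = {s: node_for(s, nodes) for s in sellers}
```

Every front-end server computes the same placement without coordination. The drawback of plain modulo hashing is that changing the number of nodes remaps most keys.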
![Page 12: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/12.jpg)
Usecase: Running User Provided Jobs : SETI@Home
• Many people volunteer their computing power
• Scientists submit computing jobs to the system
• Broker and match resources with jobs, run them and return results. Handle failures. Avoid free riding.
• Considered the biggest computer on earth (505 TFLOPS, 150k active computers)
Finding ET http://www.elfwood.com/~axthony/Staring-Aliens.2552052.html, Licensed CC
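One common way to handle failures and discourage free riding in volunteer computing is to run each job on several volunteers and accept the majority answer. The sketch below illustrates that idea only; the function names are invented and it is not presented as how SETI@Home itself is implemented.

```python
from collections import Counter


def run_with_redundancy(job, volunteers, copies=3):
    """Run `job` on `copies` volunteers and accept the majority answer."""
    results = [volunteer(job) for volunteer in volunteers[:copies]]
    answer, votes = Counter(results).most_common(1)[0]
    if votes <= copies // 2:
        raise RuntimeError("no majority; resubmit the job")
    return answer


def honest(job):
    return sum(job)      # does the real work


def free_rider(job):
    return 0             # returns junk without computing


result = run_with_redundancy([1, 2, 3], [honest, free_rider, honest])
# result == 6: the two honest volunteers outvote the free rider
```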
![Page 13: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/13.jpg)
Usecase: Data Storages and Provenance (Sky Server)
• Telescopes (Square Kilometer Array) keep collecting data from the sky (terabytes per day)
• Sky Server lets scientists see the sky at a given location, as seen at a given time.
• Moving data takes a long time. Moving 1 TB takes roughly
– 100 Mbps network: 30 hrs
– 1 Gbps network: 3 hrs
– 10 Gbps network: 20 minutes
• Given a data item, need to track how it was created, equipment accuracy, transformations used, etc.
http://www.fotopedia.com/items/flickr-518876976 and http://www.geograph.org.uk/photo/103069, Licensed CC
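The transfer times above can be checked with a short calculation. This sketch computes the ideal time, assuming a decimal terabyte (10^12 bytes) and a fully used link; the slide's figures are somewhat higher than these ideal values, presumably because they allow for overhead (that reading is an assumption).

```python
def transfer_hours(size_bytes, link_bits_per_sec):
    """Ideal transfer time in hours, ignoring protocol overhead and contention."""
    return size_bytes * 8 / link_bits_per_sec / 3600


ONE_TB = 10**12  # decimal terabyte, an assumption
for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    print(f"{label}: {transfer_hours(ONE_TB, rate):.2f} h")
# 100 Mbps: 22.22 h
# 1 Gbps: 2.22 h
# 10 Gbps: 0.22 h
```

Even the ideal numbers show why it is usually cheaper to move the computation to the data than the data to the computation.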
![Page 14: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/14.jpg)
Mobile Sensor Crowdsourcing
• Mobile phones are now like a weather center; each has
– a barometer
– temperature sensor
– proximity sensor
– GPS
– moisture sensor
• Get volunteer phones to send sensor data (crowdsourcing)
– report on weather
– crop diseases (agriculture officials)
– epidemics (from hospitals, doctors)
• Use that to predict weather, crop disease, and epidemic spread
• Moving Sensors (Polar Grid)
http://www.fotopedia.com/items/flickr-2548697541, http://www.geograph.org.uk/photo/1534209, and http://www.yourbdnews.com/2011/10/17/samsung-files-to-halt-iphone-4s-in-japan-australia/iphone-4s, Licensed CC
Mobile: Solving the Last Mile Problem
![Page 15: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/15.jpg)
Great! Let's see which Distributed System
technologies have made these use cases possible!
![Page 16: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/16.jpg)
Distributed Systems Timeline/History

| Period | Topics |
|---|---|
| 1965-late 70s | Parallel Programming, Self Stabilization, Fault Tolerance, ER Model/Transactions, Time/Clocks |
| 1980s | Consensus and impossibility, SQL, Distributed Snapshots, Replication, Group Communication |
| Early 90s | Linearizability, Parallel DB, Transactional Memory, RAID, MPI |
| Late 90s | Volunteer Computing, P2P file sharing, Complex Event Processing |
| Early 2000s | OceanStore, Web Services, Semantic Web, REST, DHT, Pub/Sub, Grid, Autonomic Computing, Google File System, Virtualization, SOA, MapReduce |
| 2005-2010 | Cloud, NoSQL, Mobile Apps, Data Provenance |
![Page 17: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/17.jpg)
Theoretical Computer Science
• Concerned with
– Coordination algorithms: leader election, multicast, distributed locks, barriers, snapshot algorithms
– Impossibility results, upper and lower bounds
– Distributed versions of some centralized algorithms (e.g. shortest path)
– A lot of this work was done in the 70s, and it laid the groundwork for Distributed Systems
http://www.flickr.com/photos/lodz_na_nowo/5690492370/, http://xkcd.com/384/, http://www.flickr.com/photos/quinnanya/4990131194/sizes/z/in/photostream/, Licensed CC
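As one example of a coordination algorithm, leader election on a ring can be simulated in a few lines. The sketch below follows the Chang-Roberts idea (forward larger IDs, swallow smaller ones) under the simplifying assumptions that all processes start at once, messages move in lockstep rounds, and IDs are unique; `ring_election` is a name made up for this illustration.

```python
def ring_election(ids):
    """Simulate Chang-Roberts election on a unidirectional ring; highest ID wins."""
    n = len(ids)
    # Each process starts by sending its own ID to its clockwise neighbour.
    in_flight = [(i, ids[i]) for i in range(n)]  # (sender position, candidate ID)
    messages = 0
    while in_flight:
        next_round = []
        for pos, candidate in in_flight:
            receiver = (pos + 1) % n
            messages += 1
            if candidate == ids[receiver]:
                return candidate, messages      # own ID came back: elected
            if candidate > ids[receiver]:
                next_round.append((receiver, candidate))  # forward larger ID
            # Smaller IDs are swallowed by the receiver.
        in_flight = next_round
    raise ValueError("ids must be non-empty and unique")


leader, message_count = ring_election([3, 7, 2, 5])
# leader == 7
```

Only the highest ID survives a full trip around the ring, so exactly one process learns that it is the leader.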
![Page 18: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/18.jpg)
Communication Protocols
• Request/Response
– RMI, CORBA, REST/HTTP, WS, Thrift
• Publish/Subscribe
• Distributed Queues
• DHT (Distributed Hash Tables)
• Gossip/Epidemic Protocols
• Whiteboards
http://www.flickr.com/photos/novecentino/2596898279/, Licensed CC
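The core idea behind a DHT is consistent hashing: nodes and keys share one hash ring, and a key belongs to the first node clockwise from it. The sketch below is a single-process illustration with a hypothetical `HashRing` class; it leaves out routing, replication, and virtual nodes that real DHTs add.

```python
import bisect
import hashlib


def ring_hash(value):
    """Stable position on the ring (an integer derived from SHA-256)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key):
        # The first node clockwise from the key's position owns the key.
        index = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[index][1]


keys = [f"item-{i}" for i in range(100)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
```

When `node-d` joins, the only keys that change owner are the ones that now belong to `node-d`; every other key stays where it was.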
![Page 19: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/19.jpg)
Request/Response and Architectural Styles
• Message formats
• RMI, CORBA, REST/HTTP, Web Service, Thrift
• Architectural Styles
– Remote Procedure Calls (RPC)
– Distributed Objects
– Service Oriented Architecture (SOA)
– Resource Oriented Architecture (ROA)
![Page 20: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/20.jpg)
Known Distributed Architecture Patterns
• LB + Shared Nothing Nodes
• LB + Stateless Nodes + Scalable Storage
• DHT (Distributed Hash Table)
• Distributed queues
• Publish/Subscribe Broker Network
• Gossip architectures + biology inspired algorithms
• MapReduce/data flows
• Stream processing
• Tree of responsibility
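The MapReduce pattern from the list above can be shown with the usual word count, run here in a single process. The three function names are illustrative; a real framework would run the map and reduce phases on many machines and do the shuffle over the network.

```python
from collections import defaultdict


def map_phase(document):
    # Emit (word, 1) for every word.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all values that share a key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}


documents = ["the network is reliable", "the network is secure"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# counts["network"] == 2, counts["secure"] == 1
```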
![Page 21: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/21.jpg)
LB + Shared Nothing and 3-Tier
• Most common scaling pattern
• Most architectures follow this model
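A load balancer in front of shared-nothing nodes can be as simple as round robin, because any node can serve any request. The sketch below assumes hypothetical node names and a made-up `RoundRobinBalancer` class.

```python
import itertools


class RoundRobinBalancer:
    """Spread requests over identical nodes that share no state."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def route(self, request):
        # Pick the next node in turn; the request itself is not inspected.
        node = next(self._cycle)
        return node, request


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
targets = [lb.route(f"GET /item/{i}")[0] for i in range(5)]
# targets == ["web-1", "web-2", "web-3", "web-1", "web-2"]
```

Scaling out then means adding a node to the list, which works only as long as the nodes keep no per-user state of their own.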
![Page 22: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/22.jpg)
Storages
• Single Database
• Replicated Databases
• Parallel Databases (Sharding)
• NewSQL (in-memory, sharding, highly optimized)
• NoSQL (Column Family, Key-Value pair, Document)
Oracle of Delphi
![Page 23: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/23.jpg)
Building Scalable Systems
• Single Machine
• Shared Memory Model
• Clustering (State Replication through group communication)
• Shared Nothing
• Loose Consistency with Shared Nothing
http://www.fotopedia.com/items/louromig-8P4w6xtSgbY, Licensed CC
![Page 24: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/24.jpg)
Publish Subscribe and EDA
• Many publishers send events
• Subscribers register interest in events, and a publish/subscribe network matches events and routes them to the subscribers
• Has scalable implementations
• Basis for event driven architectures
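The matching and routing role of a publish/subscribe network can be sketched with a topic-based broker. This is an in-process toy with a hypothetical `Broker` class and made-up topics, not the API of any particular messaging product.

```python
from collections import defaultdict


class Broker:
    """In-process topic-based publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Publishers never address subscribers directly; the broker routes.
        for callback in self.subscribers[topic]:
            callback(event)
        return len(self.subscribers[topic])


received = []
broker = Broker()
broker.subscribe("traffic", received.append)
delivered = broker.publish("traffic", {"road": "A1", "speed_kmh": 12})
ignored = broker.publish("weather", {"temp_c": 31})
```

The publisher of the weather event does not know or care that nobody is listening, which is the decoupling that event driven architectures build on.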
![Page 25: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/25.jpg)
Cloud Computing
• Ability to buy computing power, storage, or execution services as a utility, on demand.
• The best way to explain it is by comparing it to electricity.
• The idea is to build a big pool of servers and share it.
• Economies of scale through optimized large-scale operations.
• Resource pooling.
• No need for capacity planning; start small and grow as needed.
• Outsourcing, which enables specialization.
photo by LoopZilla on Flickr, http://www.flickr.com/photos/loopzilla/2328231843/sizes/m/in/photostream/, Licensed under CC
![Page 26: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/26.jpg)
Where do we go from here?
![Page 27: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/27.jpg)
If You Plan to Learn about Distributed Systems
• One of the fields you learn by doing
• You have to be a good programmer
– a patient one (debugging)
– a lazy one (but intelligent)
• Start by writing some Web Services, request/response stuff
• Stop reinventing the wheel, start using tools (middleware)
• Learn ZooKeeper
• Take a class: read, write code, debug, ..
http://www.flickr.com/photos/mariachily/5250487136, Licensed CC
![Page 28: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/28.jpg)
Distributed System Community
• Based around ACM, IEEE, and USENIX
• Well known journals
– IBM Systems Journal, ACM Operating Systems Review, ACM Transactions on Computer Systems, IEEE Distributed Systems Online, IEEE Transactions on Parallel and Distributed Systems
• Conferences
– Theory: ICDCS, SPDC
– SOA/Cloud: ICWS
– E-Science, Parallel Programming: HPDC, SC, E-Science, CCGrid
– Systems: USENIX, Middleware, ACM Symposium on Operating Systems Principles, FAST, LISA, OSDI
– DB: SIGMOD Record, VLDB
• Awards
– Turing Award
– Edsger W. Dijkstra Prize in Distributed Computing
http://www.flickr.com/photos/dullhunk/4187914071, http://www.fotopedia.com/items/flickr-1544709148, Licensed CC
![Page 29: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/29.jpg)
A Few Must-Read Papers
• System Structure for Software Fault Tolerance (1975)
• Time, Clocks, and the Ordering of Events in a Distributed System (1978)
• Reaching Agreement in the Presence of Faults (1980) and The Byzantine Generals Problem (1982)
• End-to-End Arguments in System Design (1984)
• A Note on Distributed Computing (1994)
• Scale in Distributed Systems (1994)
• Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications (2001)
• The Google File System (2003)
• Xen and the Art of Virtualization (2003)
• MapReduce: Simplified Data Processing on Large Clusters (2004)
![Page 30: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/30.jpg)
Some Open Challenges
• Everything Data: Analytics, AI, Data Mining (distributed versions of many algorithms)
• Complex Event Processing (CEP)
• How to Scale?
• Middleware for the Cloud
• Scalable Storage
• Provenance
• Workflows
• Guard against DDoS and other Distributed Security Issues
http://www.flickr.com/photos/brianscott/5474210001, Licensed CC
![Page 31: Keynote for CSE conference 2011: Distributed Systems: What? Why? And bit of How?](https://reader033.vdocuments.net/reader033/viewer/2022061304/5495fd7bb479594c4d8b4e5d/html5/thumbnails/31.jpg)
Questions?
Copyright by romainguy, and licensed for reuse under CC License http://www.flickr.com/photos/romainguy/249370084