reconceiving the web as a distributed (nosql) data system
DESCRIPTION
[Slides from NoSQL Now! 2013] Nearly every Web request is a request for information from a database or a front-end caching system for one. Based on this concept, we can reconceive the Web as a large-scale distributed data system using NoSQL query languages across high-level protocols such as HTTP. Exploring this idea further leads us to a better understanding of the structure of the Web, and invites us to apply modern NoSQL thinking toward making it better. My goal is to re-orient people’s thinking toward the Web as a big NoSQL data system and then explore the implications.
TRANSCRIPT
![Page 1: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/1.jpg)
Reconceiving the Web as a Distributed (NoSQL) Data System
Daniel Austin, PayPal, Inc.
NoSQL Now! Conference
August 22, 2013
V1.2
![Page 2: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/2.jpg)
The Big Idea
“The World-Wide Web is the World’s Largest NoSQL Distributed Data System”
![Page 3: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/3.jpg)
The Mind Map
![Page 4: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/4.jpg)
History
• DNS (1983): the first large-scale DDS (distributed data system), using flat files
• WWW (1989): “a single user-interface to many large classes of stored information such as reports, notes, data-bases, computer documentation and on-line systems help”
  – Berners-Lee & Cailliau, 1989
But Why NoSQL?
![Page 5: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/5.jpg)
WWWDB: Anatomy
WWW
HTML (Presentation)
URI (Addressing)
HTTP (Transport)
![Page 6: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/6.jpg)
Typology of Hyperlink Queries
• Hypertext links come in two flavors: transitive and intransitive
• Transitive queries are usually for inactive content: presentation material to supplement the user’s queried data
• Intransitive queries are user-actuated and usually provide navigation and business logic for the query
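As a rough illustration of the two link types, the sketch below scans a page with Python’s standard `html.parser` and sorts link targets into two buckets. The mapping is an assumption made for this example: targets the browser fetches on its own (`img`, `script`, `link`, `iframe`) are treated as transitive, and targets the user must actuate (`a href`, `form action`) as intransitive. The names `LinkClassifier` and `classify_links` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed mapping (not from the slides): (tag, attribute) pairs fetched
# automatically are "transitive"; ones the user must actuate are "intransitive".
# Simplification: not every <link href> is actually fetched (e.g. rel="canonical").
TRANSITIVE = {("img", "src"), ("script", "src"), ("link", "href"), ("iframe", "src")}
INTRANSITIVE = {("a", "href"), ("form", "action")}


class LinkClassifier(HTMLParser):
    """Collects hyperlink targets and sorts them into the two query types."""

    def __init__(self):
        super().__init__()
        self.transitive = []
        self.intransitive = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:  # attribute present without a value
                continue
            if (tag, name) in TRANSITIVE:
                self.transitive.append(value)
            elif (tag, name) in INTRANSITIVE:
                self.intransitive.append(value)


def classify_links(html):
    """Return (transitive_targets, intransitive_targets) in document order."""
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.transitive, parser.intransitive


page = """
<link rel="stylesheet" href="/static/site.css">
<img src="/static/logo.png">
<a href="/account/summary">Summary</a>
<form action="/payments" method="post"></form>
"""
transitive, intransitive = classify_links(page)
print(transitive)    # ['/static/site.css', '/static/logo.png']
print(intransitive)  # ['/account/summary', '/payments']
```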
![Page 7: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/7.jpg)
Data Clients Query Data Sources
![Page 8: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/8.jpg)
What Do HTTP URIs Identify?
• Not a single resource
• WWWDB query syntax is split between HTTP ‘verbs’ (POST, GET, PUT, DELETE) and their objects, addressed by URIs
• A URI encapsulates a resource as the object identified by a query
(Note that transitive and intransitive hyperlinks almost always go to different locations)
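To make the verb/object split concrete, here is a toy, in-memory "WWWDB node" in which the HTTP verb selects the operation and the URI path names its object. It is a sketch only: there is no networking, `ToyNode` is a hypothetical name, and the status codes merely mimic common HTTP usage (201 Created, 204 No Content, 404 Not Found, 405 Method Not Allowed).

```python
from itertools import count


class ToyNode:
    """A toy WWWDB node: an in-memory dict keyed by URI path.

    The HTTP verb picks the operation; the URI path names its object.
    """

    def __init__(self):
        self.store = {}
        self._ids = count(1)  # server-assigned ids for POST

    def handle(self, method, path, body=None):
        """Return (status, body) for one request."""
        if method == "GET":
            return (200, self.store[path]) if path in self.store else (404, None)
        if method == "PUT":  # create or replace at a client-chosen URI
            status = 200 if path in self.store else 201
            self.store[path] = body
            return status, body
        if method == "DELETE":
            if path not in self.store:
                return 404, None
            del self.store[path]
            return 204, None
        if method == "POST":  # create under a collection; the node picks the URI
            new_path = f"{path.rstrip('/')}/{next(self._ids)}"
            self.store[new_path] = body
            return 201, new_path
        return 405, None  # any other verb is unsupported in this sketch


node = ToyNode()
print(node.handle("PUT", "/users/ann", {"name": "Ann"}))  # (201, {'name': 'Ann'})
print(node.handle("POST", "/orders", {"total": 10}))      # (201, '/orders/1')
print(node.handle("GET", "/orders/1"))                    # (200, {'total': 10})
print(node.handle("DELETE", "/users/ann"))                # (204, None)
print(node.handle("GET", "/users/ann"))                   # (404, None)
```

The same URI answers different queries depending on the verb, which is the sense in which query syntax is divided between verb and object.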
![Page 9: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/9.jpg)
CDN as a Caching Mechanism
• CDNs such as Akamai and CloudFront provide local caching services for WWWDB, mostly for static, presentation-related objects
  – Frequency-based caching for transitive hyperlinks
  – Most secondary queries go to the CDN
  – 95%+ of all the bytes transported over the Web
  – ~90% of all WWWDB queries (HTTP requests/responses)
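"Frequency-based caching" can be sketched as a least-frequently-used (LFU) cache keyed by URL. This is a simplified illustration, not how Akamai or CloudFront actually work: real CDNs also weigh recency, TTLs, and `Cache-Control` headers. `LFUCache` and the `fetch_from_origin` callback are hypothetical names.

```python
from collections import Counter


class LFUCache:
    """Frequency-based cache: when full, evict the least-requested URL.

    Ties go to the entry inserted earliest. Request counts are kept even
    after eviction, so popular URLs regain their rank when they return.
    Assumes capacity >= 1.
    """

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch = fetch_from_origin  # callable(url) -> body
        self.data = {}                  # url -> body (dicts keep insertion order)
        self.hits = Counter()           # url -> number of requests seen
        self.origin_requests = 0

    def get(self, url):
        self.hits[url] += 1
        if url in self.data:            # cache hit: origin is not contacted
            return self.data[url]
        self.origin_requests += 1
        body = self.fetch(url)
        if len(self.data) >= self.capacity:
            victim = min(self.data, key=lambda u: self.hits[u])
            del self.data[victim]
        self.data[url] = body
        return body


cache = LFUCache(capacity=2, fetch_from_origin=lambda url: f"<bytes of {url}>")
for url in ["/logo.png", "/logo.png", "/site.css", "/app.js", "/logo.png"]:
    cache.get(url)
print(sorted(cache.data))     # ['/app.js', '/logo.png']
print(cache.origin_requests)  # 3
```

The frequently requested `/logo.png` survives eviction, which is the behavior that suits heavily shared, transitive assets.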
![Page 10: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/10.jpg)
APIs as Secondary Queries
• Active subqueries
• Usually dynamic
• URIs function as a selection mechanism
• Often user-actuated, intransitive events
• Query results often modify the display
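One way to read "URIs function as a selection mechanism" is that each path segment narrows the selection one level further into the data. The sketch below assumes a hypothetical nested document (`DATA`) and a hypothetical host `api.example.com`; `select` uses only `urllib.parse.urlsplit` from the standard library.

```python
from urllib.parse import urlsplit

# Hypothetical back-end data that an API might expose.
DATA = {
    "users": {
        "42": {"name": "Ann", "orders": {"7": {"total": 19.99}, "8": {"total": 5.0}}},
    }
}


def select(uri, data=DATA):
    """Treat each URI path segment as one selection step into `data`.

    Returns the selected sub-document, or None if a segment does not match.
    """
    node = data
    for segment in urlsplit(uri).path.strip("/").split("/"):
        if not segment:
            continue  # the root URI "/" selects everything
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


print(select("https://api.example.com/users/42/orders/7"))  # {'total': 19.99}
print(select("/users/42/name"))                             # Ann
print(select("/users/99"))                                  # None
```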
![Page 11: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/11.jpg)
REST as a Query Syntax Mechanism
• Common Semantics
  – REST provides a means of specifying the proper query for an object in a specific state
• Demands NoSQL due to state constraints
• Uses query strings for ranged searches
Image courtesy IBM
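A ranged search expressed in a query string might look like `/products?price_min=10&price_max=25`. The `_min`/`_max` naming is an assumed convention, not a standard, and `PRODUCTS` is made-up data; parsing relies only on `urllib.parse.urlsplit` and `parse_qs`.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical collection behind a URI such as /products
PRODUCTS = [
    {"sku": "A1", "price": 5.0},
    {"sku": "B2", "price": 12.5},
    {"sku": "C3", "price": 20.0},
    {"sku": "D4", "price": 31.0},
]


def ranged_search(uri, items=PRODUCTS):
    """Apply `<field>_min` / `<field>_max` query parameters as inclusive bounds."""
    params = parse_qs(urlsplit(uri).query)
    result = items
    for key, values in params.items():
        field, _, bound = key.rpartition("_")
        if not field or bound not in ("min", "max"):
            continue  # ignore parameters that are not range bounds
        limit = float(values[0])
        if bound == "min":
            result = [i for i in result if i.get(field) is not None and i[field] >= limit]
        else:
            result = [i for i in result if i.get(field) is not None and i[field] <= limit]
    return result


print([p["sku"] for p in ranged_search("/products?price_min=10&price_max=25")])  # ['B2', 'C3']
```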
![Page 12: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/12.jpg)
Indexing WWWDB
• Google, Bing, Yahoo! and other ‘index searches’ on WWWDB
  – Inconsistent results are accepted
• Query Cache or a Data Cache?
• Secondary Query Routing
• Alternative query indices: Wolfram Alpha, Index Mundi, and Twitter act as ‘almanacs’
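The core data structure behind an index search is an inverted index mapping terms to the URLs that contain them. The toy version below (hypothetical pages on `example.com`, simple lowercase tokenizing, AND semantics) also shows where the accepted inconsistency comes from: the index is a snapshot, so pages edited after `build_index` runs are not reflected in results until the next rebuild.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> text
PAGES = {
    "https://example.com/a": "NoSQL stores trade consistency for availability",
    "https://example.com/b": "HTTP caching and CDN availability",
    "https://example.com/c": "Consistency models for NoSQL",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each lowercase term to the set of URLs containing it (a snapshot)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return sorted URLs containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


index = build_index(PAGES)
print(search(index, "NoSQL consistency"))  # ['https://example.com/a', 'https://example.com/c']
print(search(index, "availability"))       # ['https://example.com/a', 'https://example.com/b']
```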
![Page 13: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/13.jpg)
Does the CAP Theorem Apply?
Yes, It Does, But Only Partially
• Partition and Availability: 404s, DDoS
• WWWDB Relaxes the Consistency Constraint
• We accept inconsistent queries and broken links as a tradeoff for real-time availability and high-velocity updates
But We Can Do Better!
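The availability-over-consistency tradeoff can be sketched as a cache that serves a stale copy when the origin is unreachable rather than failing. This is loosely in the spirit of the HTTP `stale-if-error` Cache-Control extension, but it does not parse real headers; `AvailabilityFirstCache`, the fake clock, and the fake origin are all hypothetical.

```python
import time


class AvailabilityFirstCache:
    """Prefer a fresh answer, but when the origin fails, return the last good
    copy even if it is stale: availability over consistency."""

    def __init__(self, fetch, max_age=60.0, clock=time.monotonic):
        self.fetch = fetch        # callable(url) -> body; may raise OSError
        self.max_age = max_age    # seconds a copy still counts as fresh
        self.clock = clock
        self.copies = {}          # url -> (body, fetched_at)

    def get(self, url):
        """Return (body, is_stale)."""
        cached = self.copies.get(url)
        now = self.clock()
        if cached and now - cached[1] <= self.max_age:
            return cached[0], False
        try:
            body = self.fetch(url)
        except OSError:
            if cached:
                return cached[0], True  # stale, but available
            raise                       # nothing to fall back on
        self.copies[url] = (body, now)
        return body, False


# Simulated origin and clock so the example is deterministic.
fake_now = [0.0]
origin_up = [True]


def origin(url):
    if not origin_up[0]:
        raise ConnectionError("origin unreachable")  # ConnectionError is an OSError
    return f"v@{fake_now[0]:.0f}"


cache = AvailabilityFirstCache(origin, max_age=60, clock=lambda: fake_now[0])
print(cache.get("/price"))  # ('v@0', False)
fake_now[0] = 120.0
origin_up[0] = False
print(cache.get("/price"))  # ('v@0', True)
```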
![Page 14: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/14.jpg)
Drawbacks of the CAP Model
• Caching
  – Not all data is cached everywhere
  – Some sites are single-location/single-source
  – Hard (static) assets are far more widely cached
• What does CAP mean when data is only partially distributed?
  – Very little: consistency applies to only part of the queries
![Page 15: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/15.jpg)
Improving WWWDB
• Better Data Clients
  – HTML5 provides new query mechanisms via WebSockets, Web Storage, and other means
  – Still mostly presentation-level improvements
• Better Caching, Distribution & Transport
  – Work currently being done at IETF on HTTP 2.0
• Better Queries
  – Very little work being done; more on this later!
![Page 16: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/16.jpg)
RDF and the Semantic Web
• Changes query patterns but not storage
  – Queries based on semantic ID of resource
• Requires content to be semantically labeled
• Work on SPARQL reduces query limitations
  – But may also make things slower (!)
• Cloud computing and query distribution will prove a more powerful force for improving WWWDB than semantic queries
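Querying "based on semantic ID" means selecting resources by what they are labeled as, not where they live. The toy triple store below is not SPARQL and not an RDF library; it only mimics a single triple pattern, with `None` standing in for a variable, and a two-pattern join on the subject. The `ex:` resources are hypothetical; `rdf:type` is the standard RDF typing predicate.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
TRIPLES = {
    ("ex:report-2013", "rdf:type", "ex:Report"),
    ("ex:report-2013", "ex:author", "ex:alice"),
    ("ex:notes-17", "rdf:type", "ex:Note"),
    ("ex:notes-17", "ex:author", "ex:alice"),
    ("ex:report-2012", "rdf:type", "ex:Report"),
}


def match(pattern, triples=TRIPLES):
    """Return sorted triples matching a (s, p, o) pattern; None is a wildcard,
    roughly what a variable does in one SPARQL triple pattern."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


def reports_by(author):
    """Join two patterns on their shared subject: typed as Report AND by author."""
    reports = {s for s, _, _ in match((None, "rdf:type", "ex:Report"))}
    authored = {s for s, _, _ in match((None, "ex:author", author))}
    return sorted(reports & authored)


print([s for s, _, _ in match((None, "rdf:type", "ex:Report"))])
# ['ex:report-2012', 'ex:report-2013']
print(reports_by("ex:alice"))  # ['ex:report-2013']
```

The join step hints at why richer semantic queries can cost more: every extra pattern is another pass over the data.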
![Page 17: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/17.jpg)
Browsers as Data Clients
• Presentation First!
  – Data is treated as secondary
• Designed for Browsing, Not Querying
  – Query patterns are inefficient
  – Semi-stateful nature of Web sessions
• Bedeviled with Legacy Issues
![Page 18: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/18.jpg)
Optimizing Web Queries
• REST doesn’t imply FAST
  – Use a domain model to limit query endpoints
  – May require unnecessary requests
• Query-string semantics allow for joins and arbitrary comparisons
• Recognize that some queries require state, and use it
• Distribute intransitive queries more widely
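The point that fine-grained REST endpoints "may require unnecessary requests," while query-string semantics allow joins, can be illustrated by counting round trips. The sketch simulates a server in-process; the `/orders` and `/customers/{id}` routes, the data, and the `include=customer` parameter are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical resources behind a fine-grained REST API.
ORDERS = {"1": {"customer": "c9", "total": 10}, "2": {"customer": "c7", "total": 4}}
CUSTOMERS = {"c7": {"name": "Bo"}, "c9": {"name": "Ann"}}


class CountingServer:
    """Answers GETs in-process and counts round trips."""

    def __init__(self):
        self.requests = 0

    def get(self, uri):
        self.requests += 1
        parts = urlsplit(uri)
        path = parts.path.strip("/").split("/")
        query = parse_qs(parts.query)
        if path == ["orders"]:
            orders = [dict(o, id=k) for k, o in sorted(ORDERS.items())]  # copies
            # Hypothetical include=customer parameter: a server-side join.
            if "customer" in query.get("include", []):
                for o in orders:
                    o["customer"] = CUSTOMERS[o["customer"]]
            return orders
        if len(path) == 2 and path[0] == "customers":
            return CUSTOMERS[path[1]]
        raise KeyError(uri)


# Client-side join: 1 request for the list, then 1 per order (N + 1 total).
s1 = CountingServer()
names1 = [s1.get(f"/customers/{o['customer']}")["name"] for o in s1.get("/orders")]

# Join expressed in the query string: a single request.
s2 = CountingServer()
names2 = [o["customer"]["name"] for o in s2.get("/orders?include=customer")]

print(names1 == names2, s1.requests, s2.requests)  # True 3 1
```

With N orders the first pattern costs N + 1 round trips while the second stays at one, which is the gap between "REST" and "FAST" on high-latency links.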
![Page 19: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/19.jpg)
Reforming Hypertext for Querying WWWDB
• Enlarge the number of link types
• Distinguish transitive links
• Add bidirectional linking
• Enhance the semantics of the query string
• Make hypertext more useful for mobile and other devices
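Bidirectional linking implies that every link is recorded at both ends, so a resource can answer "what links to me?" as cheaply as "what do I link to?". The sketch keeps a forward and a reverse index in memory. It is an illustrative assumption about how such links could be stored, not a feature of HTML; `LinkGraph` and the paths are hypothetical.

```python
from collections import defaultdict


class LinkGraph:
    """Stores each hyperlink in both directions (forward and reverse index)."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # source -> targets
        self.incoming = defaultdict(set)  # target -> sources

    def add_link(self, source, target):
        self.outgoing[source].add(target)
        self.incoming[target].add(source)

    def remove_link(self, source, target):
        # Both sides are updated together, so neither end holds a dangling link.
        self.outgoing[source].discard(target)
        self.incoming[target].discard(source)

    def links(self, uri):
        return sorted(self.outgoing.get(uri, ()))

    def backlinks(self, uri):
        return sorted(self.incoming.get(uri, ()))


g = LinkGraph()
g.add_link("/blog/post-1", "/docs/api")
g.add_link("/blog/post-2", "/docs/api")
g.add_link("/docs/api", "/docs/auth")
print(g.backlinks("/docs/api"))  # ['/blog/post-1', '/blog/post-2']
g.remove_link("/blog/post-1", "/docs/api")
print(g.backlinks("/docs/api"))  # ['/blog/post-2']
```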
![Page 20: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/20.jpg)
IPv6 and Query Routing for WWWDB
• The IPv6 space is large enough to allow for multiple query addressing schemes:
  – Semantic addressing of objects by type
  – Objects in the Internet of Things
  – Dynamic, context-driven addressing
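As one possible reading of "semantic addressing of objects by type," the sketch below carves a single /48 into per-type /64 subnets, so the address itself says what kind of object it names. The scheme, the type codes, and the prefix `2001:db8:abcd::/48` are assumptions for illustration (`2001:db8::/32` is reserved for documentation); only the standard `ipaddress` module is used.

```python
import ipaddress

# Hypothetical scheme: 16 subnet bits encode the object type,
# the 64-bit interface identifier encodes the object id.
SITE = ipaddress.IPv6Network("2001:db8:abcd::/48")
TYPE_CODES = {"sensor": 1, "camera": 2, "document": 3}  # assumed type registry


def address_for(obj_type, obj_id):
    """Build an address inside SITE from a type name and a numeric object id."""
    if not 0 <= obj_id < 2**64:
        raise ValueError("object id must fit in 64 bits")
    value = int(SITE.network_address) | (TYPE_CODES[obj_type] << 64) | obj_id
    return ipaddress.IPv6Address(value)


def type_of(address):
    """Recover the object type from an address, or None if it is not in SITE."""
    addr = ipaddress.IPv6Address(address)
    if addr not in SITE:
        return None
    code = (int(addr) >> 64) & 0xFFFF
    for name, c in TYPE_CODES.items():
        if c == code:
            return name
    return None


a = address_for("camera", 0x2A)
print(a)           # 2001:db8:abcd:2::2a
print(type_of(a))  # camera
```

Because the type lives in the routing prefix, a router could in principle direct a "query" for all cameras to one subnet, which is the kind of query routing the slide gestures at.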
![Page 21: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/21.jpg)
Scaling the WWWDB
• This may require expanding our notions of URIs and links (queries)
• Semantic mapping of resources requires additional complexity for queries
• Explicit state management for efficiency
Every system has a scaling limit
![Page 22: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/22.jpg)
Final Thoughts
• The Web is the largest NoSQL Distributed Data System
  – URIs address the result set of a NoSQL query
  – Transitive and intransitive hyperlinks
• We can add power and simplicity to our queries by carefully reforming the URI syntax and the current implementations of hypertext
• HTTP and HTML are undergoing significant evolution; now it’s time for URIs!
![Page 23: Reconceiving the Web as a Distributed (NoSQL) Data System](https://reader035.vdocuments.net/reader035/viewer/2022062513/554be432b4c90556328b494e/html5/thumbnails/23.jpg)
Reconceiving the Web as a Distributed Data System
Thank You!
@daniel_b_austin