Post on 26-Jul-2020

Schema Agnostic Indexing with Azure DocumentDB

DEEKSHA SINGH: 2641679

YASH THAKKAR: 2642764

ABSTRACT

Azure DocumentDB is Microsoft’s multi-tenant distributed database service for managing JSON documents at Internet scale.

Automatically indexes documents without requiring a schema or secondary indices.

Operates within an extremely frugal resource budget.

OUTLINE

DocumentDB

DocumentDB Capabilities

Resource Model

System Topology

Design Goals

Schema Agnostic Indexing

Logical Index Organization

When not to use and when to use DocumentDB

INTRODUCTION

DocumentDB supports the JSON data model and the JavaScript language directly within its database engine.

The indexing subsystem needs to support:

Automatic indexing of documents

DocumentDB’s query language

Real time, consistent queries

Multi-tenancy under extremely frugal resource budgets

Predictable performance guarantees

DOCUMENTDB CAPABILITIES

DocumentDB query language supports rich relational & hierarchical queries.

By default, the database engine automatically indexes all documents without requiring schema or secondary indexes from developers.

Transactional execution of application logic.

DocumentDB offers well defined consistency levels for developers.

All machine and resource management is abstracted from users.

RESOURCE MODEL

A tenant of DocumentDB starts by provisioning a database account.

A DocumentDB database manages a set of entities (users, permissions, and collections), referred to as resources.

A collection is a schema-agnostic container of arbitrary user-generated documents.

Developers can interact with resources over HTTP, using the standard HTTP verbs for CRUD operations, queries, and stored procedures.

Tenants can elastically scale a resource by simply creating new resources, which are placed across resource partitions.

SYSTEM TOPOLOGY

Deployed worldwide across multiple Azure regions.

Managed and deployed on clusters of machines, each with dedicated local SSDs (to provide durability and high availability).

The DocumentDB database engine consists of the following components:

Replicated state machine (RSM) for coordination

JavaScript language runtime

Query processor

Storage and indexing subsystems

DESIGN GOALS FOR INDEXING

Automatic Indexing

Configurable storage/performance tradeoffs

Efficient, rich hierarchical and relational queries

Consistent queries in the face of a sustained volume of document writes

Multi-tenancy

SCHEMA AGNOSTIC INDEXING

No Schema, No Problem!

• The indexing subsystem makes no assumptions about the documents and allows documents to vary in schema.

Documents as Trees

• Representing documents as trees blurs the boundary between the schema of JSON documents and their instance values.

Index as a Document

• Every path in a document tree is indexed.

• Each update of a document leads to an update of the structure of the index.

DocumentDB Queries

• Developers can query DocumentDB collections using queries written in SQL and JavaScript.

• Both are translated to the DocumentDB Query IL.
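
The "documents as trees" idea can be illustrated with a small sketch. The Python below is only an assumption about how such a tree might be walked, not DocumentDB's actual implementation: the function name `iter_paths` and the sample document are hypothetical. It treats object keys and array positions as interior labels and scalar values as leaf labels, so structure and values end up in the same tree.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[Any, ...]


def iter_paths(node: Any, prefix: Path = ()) -> Iterator[Path]:
    """Yield every root-to-leaf path of a JSON value as a tuple of labels.

    Object keys and array positions become interior labels; the scalar
    value becomes the final label, so schema and values share one tree.
    (Hypothetical helper for illustration; empty objects and arrays
    contribute no paths in this sketch.)
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for position, child in enumerate(node):
            yield from iter_paths(child, prefix + (position,))
    else:
        yield prefix + (node,)


# Illustrative document, not taken from the slides.
doc = {
    "headquarters": "Belgium",
    "exports": [{"city": "Moscow"}, {"city": "Athens"}],
}

paths = list(iter_paths(doc))
for path in paths:
    print("/".join(str(label) for label in path))
# headquarters/Belgium
# exports/0/city/Moscow
# exports/1/city/Athens
```

Because nothing in the walk depends on field names being known in advance, a document with a completely different shape produces its own set of paths without any schema declaration.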

QUERY IL

Designed to exploit JSON and JavaScript integration

Rooted in JavaScript type system

Follows JavaScript language semantics for expression evaluation & function invocation

Designed to be a target of translation from multiple query language frontends

LOGICAL INDEX ORGANIZATION

The index is the union of all document trees and is itself represented as a tree.

Each node of the index tree contains a list of document ids corresponding to the documents containing the given label.
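
A minimal in-memory sketch of this logical organization follows. It is a toy under stated assumptions, not DocumentDB's physical index format: `IndexNode`, `add_document`, `lookup`, and the sample documents are all hypothetical names chosen for illustration. Each node stands for one label and keeps the set of ids of the documents that contain that label at that position in the tree.

```python
from typing import Any, Dict, Set


class IndexNode:
    """One label in the index tree, with the ids of documents containing it."""

    def __init__(self) -> None:
        self.doc_ids: Set[str] = set()
        self.children: Dict[Any, "IndexNode"] = {}


def add_document(root: IndexNode, doc_id: str, value: Any) -> None:
    """Merge one JSON value into the index tree rooted at `root`."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # A scalar becomes a leaf label under the current node.
        # Labels are compared with Python equality in this toy version.
        leaf = root.children.setdefault(value, IndexNode())
        leaf.doc_ids.add(doc_id)
        return
    for label, child_value in items:
        child = root.children.setdefault(label, IndexNode())
        child.doc_ids.add(doc_id)
        add_document(child, doc_id, child_value)


def lookup(root: IndexNode, *path: Any) -> Set[str]:
    """Return the ids of documents that contain the given root-anchored path."""
    node = root
    for label in path:
        node = node.children.get(label)
        if node is None:
            return set()
    return set(node.doc_ids)


# Illustrative documents with different shapes, merged into one index tree.
index = IndexNode()
add_document(index, "doc1", {"headquarters": "Belgium",
                             "exports": [{"city": "Moscow"}]})
add_document(index, "doc2", {"headquarters": "Italy",
                             "exports": [{"city": "Athens"}, {"city": "Moscow"}]})

print(sorted(lookup(index, "headquarters")))                  # ['doc1', 'doc2']
print(sorted(lookup(index, "headquarters", "Italy")))         # ['doc2']
print(sorted(lookup(index, "exports", 0, "city", "Moscow")))  # ['doc1']
```

Shared labels such as `headquarters` are stored once and simply accumulate document ids, which is what makes the index a union of the individual document trees.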

WHEN NOT TO USE AND WHEN TO USE DOCUMENTDB

Consider Azure DocumentDB when you need:

To build new web and mobile cloud-based applications

Rapid development and high scalability

Query and processing of user- and device-generated data

To move a document store running in virtual machines to a managed service model

QUESTIONS?
