StreamCentral Information System Overview


Uploaded by raheel-retiwalla; posted on 12-Jul-2015


Page 1: StreamCentral Information System Overview

© Virtus IT Ltd 2014 - Confidential

A trusted partner

Business Powered By Data

Overview of the StreamCentral Information System

Raheel Retiwalla [email protected]

Page 2

Goals of the StreamCentral Information System

• Build high-impact business solutions from start to finish in days or weeks

• Move traditional batch-based Business Intelligence to real-time Business Intelligence

• Account for streaming data and the need to understand and analyze streaming data in real time

• Easily connect to the cloud

• Account for traditional static data

• Include events as a core part of real-time analytics

• Require little or no knowledge of Business Intelligence design patterns, practices or terminology

• Introduce design automation that continually infers relationships in data to add value to collected data

• Take advantage of Big Data stores such as NoSQL, MPP and Hadoop with little or no knowledge of the underlying technology

Page 3

Building a Big Data Business Solution WITH StreamCentral

• Collect stream data
• Collect static data
• Stream event processing
• Data transformation
• KPIs & conditions
• Correlation
• Connect data to business entities
• Standardize time & location
• Auto-build data marts that are updated in real time
• Auto-build data marts that are updated periodically

• One product that comes with integrated components built in that work seamlessly together
• High automation
• No coding required
• Focus on the solution and not on integration
• Can be used by data scientists and developers
• Does not require extensive specialist skills

Diagram (time axis): latest innovation in Big Data technologies – Hadoop, MPP, NoSQL, Caching

Page 4

StreamCentral

Rapidly industrialize the use of data by designing, building and running real-time business intelligence and Big Data solutions with StreamCentral

Applications: Functional Application, Industry Application, Event Driven Predictive Analytics, BI / Reporting, Data Exploration / Visualization, Analytic Applications

StreamCentral – Big Data Plumbing:

• Workbench – Easy to Design: Security Designer, Systems Management, Solution Designer (Data Consumption, Data Transformations, Conditions, Event, Correlation), API Designer

• BI Server – Run with Scale: Association Analysis, Business Event Detection, Data Publishing (SQL Server, Vertica, MongoDB), Data Export, Data Collection, Data Processing, Caching

• Information Warehouse Manager – Auto Build (focus of this presentation): de-normalized schema generation for data marts, normalized schema generation for facts and dimensions, security schema generation, Meta Data Manager. Auto-generates the database design, the database and the application code, and infers relationships in data

Page 5

The StreamCentral Information Warehouse

A dynamic warehouse automatically built and continually managed by StreamCentral

Page 6

The SC Information Warehouse

• Facts – two types of fact tables:
  • Regular fact tables
    • Integrate the dimension keys internally and automatically
    • Have their own surrogate key
  • Security factless fact tables

• Dimensions:
  • Custom dimensions
  • Entities
  • System dimensions

• Environmental facts – treated as both facts and dimensions

Page 7

The StreamCentral Fact

Definition: A fact is a measurable value that represents an actual fact from a specific business process or system. Examples include sales data, video performance, contract data, health records and sensor readings

Page 8

Facts Overview

• Facts are created as data sources in StreamCentral:
  • One-time load
  • Periodic load – assign a frequency

• Pull – supports Oracle, SQL Server, flat files, Excel and MySQL

• Push – an API is available to push data into a StreamCentral data source in real time (JSON and XML). This is useful for pushing real-time “facts” directly to StreamCentral

• Dimension surrogate keys are appended to the fact in real time, in parallel with the dimension update process. Key Feature: allows the system to be updated and available in real time
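A minimal Python sketch of the push path and the real-time key append, under stated assumptions: the payload shapes, the field names (store_id, amount) and the in-memory dimension_cache dictionary are hypothetical stand-ins, not StreamCentral's push API. It normalizes a pushed JSON or XML record, gives the fact its own surrogate key and appends the dimension surrogate key looked up from the cache.

```python
import itertools
import json
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the dimension cache:
# Store business key -> Store surrogate key.
dimension_cache = {"S-001": 1, "S-002": 2}

# Each fact row receives its own surrogate key, separate from dimension keys.
_fact_keys = itertools.count(1)


def parse_payload(payload: str, fmt: str) -> dict:
    """Normalize a pushed JSON or XML record into a flat dict of strings."""
    if fmt == "json":
        return {k: str(v) for k, v in json.loads(payload).items()}
    if fmt == "xml":
        root = ET.fromstring(payload)
        return {child.tag: (child.text or "") for child in root}
    raise ValueError(f"unsupported format: {fmt}")


def ingest_fact(payload: str, fmt: str) -> dict:
    """Turn one pushed record into a fact row with the dimension key appended."""
    record = parse_payload(payload, fmt)
    business_key = record.pop("store_id")  # hypothetical business key field
    return {
        "fact_key": next(_fact_keys),
        "store_key": dimension_cache[business_key],  # real-time cache lookup
        **record,
    }
```

In a real deployment the lookup would go to the distributed cache and an unknown business key would need handling (for example, creating the dimension member first); the sketch simply raises KeyError.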

Page 9

Security Fact Table

• Users can specify security rules for data access at the attribute level for a given security role using the Workbench

• StreamCentral automatically creates a Factless Fact Table – Key Feature

• The fact table records the role and the sequence of a record in a data source

• As soon as a new record is entered into a source, the record's security sequence, along with the roles that have access to that sequence, is maintained in this table
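Because a factless fact table holds only keys, the idea can be sketched in a few lines of Python. This is an illustrative sketch with hypothetical class, function and role names, not the schema StreamCentral generates: each arriving record's sequence is stored with the roles allowed to see it, and reads are filtered against that table.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityFactTable:
    """Hypothetical factless fact table: one (record sequence, role) row per grant, no measures."""

    rows: set = field(default_factory=set)

    def record_access(self, sequence: int, roles: list) -> None:
        # Called as soon as a new record is entered into the source.
        for role in roles:
            self.rows.add((sequence, role))

    def visible_sequences(self, role: str) -> set:
        return {seq for seq, r in self.rows if r == role}


def filter_for_role(records: dict, table: SecurityFactTable, role: str) -> list:
    """Return the source records the given role may read, in sequence order."""
    allowed = table.visible_sequences(role)
    return [records[seq] for seq in sorted(records) if seq in allowed]
```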

Page 10

The StreamCentral Dimension

Page 11

Definitions

• Dimension – A dimension is a category of information that can be used to further understand a fact. Dimensions are usually connected to fact tables to give them additional context and meaning. For example, the Time dimension

• Dimension Attribute: A unique level within a dimension. For example, Month is an attribute in the Time dimension.

• Hierarchy: The specification of levels that represents the relationships between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.

• Conformed Dimension – A dimension that has the same meaning and content when connected to different fact tables. Examples of conformed dimensions include customer, product, employee, partner

• Role Playing Dimension - Dimensions are often recycled for multiple applications within the same database. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a "role-playing dimension".
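To make the hierarchy and role-playing definitions concrete, here is a small Python sketch with hypothetical keys and column names (sale_date_key, delivery_date_key). A single Date dimension is referenced twice by one fact row, and each referenced date is rolled up the Year → Quarter → Month → Day hierarchy.

```python
from datetime import date

# Hypothetical Date dimension: surrogate key -> calendar date.
date_dim = {1: date(2014, 3, 30), 2: date(2014, 4, 2)}

# Hypothetical fact row referencing the same dimension in two roles.
sale = {"amount": 250.0, "sale_date_key": 1, "delivery_date_key": 2}


def hierarchy(d: date) -> tuple:
    """Year -> Quarter -> Month -> Day levels for a calendar date."""
    return d.year, (d.month - 1) // 3 + 1, d.month, d.day


def resolve_roles(fact: dict) -> dict:
    """Resolve each role-playing key against the single Date dimension."""
    return {
        "date_of_sale": hierarchy(date_dim[fact["sale_date_key"]]),
        "date_of_delivery": hierarchy(date_dim[fact["delivery_date_key"]]),
    }
```

Only one physical Date table exists; the roles live in the fact's column names.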

Page 12

StreamCentral Dimension Overview

• Dimensions are created as data sources:
  • One-time load
  • Periodic load – assign a frequency

• Dimensions can be built on multiple data sources

• A dimension is updated as soon as its data source receives new data

• Pull – supports Oracle, SQL Server, flat files, Excel and MySQL

• Push – an API is available to push data into a data source in real time

• Supports conformed dimensions, degenerate dimensions and role-playing dimensions

• Dimension data is available in a distributed cache so that context can be added to real-time data during ingestion – Key feature

• A dimension generates a surrogate key based on the business key selected from the fact table
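The surrogate key generation and cached lookup described above might look like the following minimal Python sketch. A dictionary stands in for the distributed cache and a list for the dimension table; the Dimension class and its upsert and lookup methods are hypothetical names used only for illustration.

```python
import itertools
from typing import Optional


class Dimension:
    """Hypothetical dimension that issues integer surrogate keys for business keys."""

    def __init__(self, name: str):
        self.name = name
        self.rows = []   # stands in for the dimension table
        self.cache = {}  # stands in for the distributed cache: business key -> surrogate key
        self._next_key = itertools.count(1)

    def upsert(self, business_key: str, **attributes) -> int:
        """Insert a new member, or update an existing one, when its source changes."""
        key = self.cache.get(business_key)
        if key is None:
            key = next(self._next_key)
            self.rows.append({"sk": key, "bk": business_key, **attributes})
            self.cache[business_key] = key
        else:
            # Keys start at 1 and follow insertion order, so key - 1 is the row index.
            self.rows[key - 1].update(attributes)
        return key

    def lookup(self, business_key: str) -> Optional[int]:
        """Real-time lookup used to add dimension context to incoming facts."""
        return self.cache.get(business_key)
```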

Page 13

Custom Dimensions

• Conformed dimensions use the same business key across multiple fact tables

• Build dimensions from multiple sources:
  • Build a dimension with attributes from multiple sources
  • Benefit: no need to standardize data before it reaches SC; SC transforms and standardizes data for a dimension automatically

• Handling changing attributes – Key Feature:
  • Fixed attribute – the attribute always has a fixed value and does not change. This is the default characteristic of an attribute
  • Historical attribute – the attribute can change. When it changes, a new record is created so that both the latest and the historical representations are kept, each with a start date, an end date and its own surrogate key.
    Benefit: a historical record of every change is automatically created and maintained by SC
  • Changing attribute – the attribute value can change, but the change does not create a new record. If historical records exist, an option is provided to apply the change to the historical records as well.
    Benefit: values of existing attributes can be changed without creating historical records
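These three behaviours closely resemble what dimensional modelling usually calls slowly changing dimension types 0 (fixed), 2 (historical) and 1 (changing). The Python sketch below illustrates that reading with an in-memory list of rows; the CustomDimension class, the attribute names and the apply_to_history flag are hypothetical, and this is not StreamCentral's implementation.

```python
from datetime import date
from typing import Optional

FIXED, HISTORICAL, CHANGING = "fixed", "historical", "changing"


class CustomDimension:
    """Hypothetical in-memory dimension with per-attribute change handling."""

    def __init__(self, kinds: dict):
        self.kinds = kinds  # attribute name -> FIXED / HISTORICAL / CHANGING
        self.rows = []      # each row: sk, bk, start, end, plus attribute values

    def current(self, bk: str) -> Optional[dict]:
        """The open (end date None) record for a business key, if any."""
        return next((r for r in self.rows if r["bk"] == bk and r["end"] is None), None)

    def apply(self, bk: str, changes: dict, on: date, apply_to_history: bool = False) -> None:
        row = self.current(bk)
        if row is None:  # new member
            self.rows.append({"sk": len(self.rows) + 1, "bk": bk, "start": on, "end": None, **changes})
            return
        for attr, value in changes.items():
            if row.get(attr) == value:
                continue
            kind = self.kinds.get(attr, FIXED)  # fixed is the default characteristic
            if kind == FIXED:
                continue  # fixed attributes never change
            if kind == CHANGING:
                # Overwrite in place; optionally also on historical records.
                targets = [r for r in self.rows if r["bk"] == bk] if apply_to_history else [row]
                for r in targets:
                    r[attr] = value
            else:
                # HISTORICAL: close the current record, open a new one with a new key.
                row["end"] = on
                row = {**row, "sk": len(self.rows) + 1, "start": on, "end": None, attr: value}
                self.rows.append(row)
```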

Page 14

System Dimensions

• Time:
  • Built as a role-playing dimension
  • A timestamp of any granularity in the fact is linked to the time dimension – Key Feature
  • Year, Quarter (default Jan), Month Number, Month Name, Day Name, Date
  • Hours, Minutes, Seconds

• Location:
  • Built as a role-playing dimension
  • Location data of any granularity in the fact is linked to the location dimension
  • Supports three different representations of latitude/longitude
  • Supports reverse geocoding with the Microsoft Bing Maps or Google Maps APIs
  • Country, State, Region, City, Area, Street Name, Zip Code/Postal Code, Lat/Long – Key Feature
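The Time dimension attributes listed above can be derived directly from a timestamp, as in this Python sketch. It assumes that "Quarter (default Jan)" means quarters are counted from January unless another start month is configured; the year_start_month parameter is hypothetical. Month and day names come from strftime, so they follow the process locale (English in the default C locale).

```python
from datetime import datetime


def time_attributes(ts: datetime, year_start_month: int = 1) -> dict:
    """Derive Time dimension attributes from a timestamp.

    year_start_month is a hypothetical setting: quarters are counted from
    this month (January by default). "year" stays the calendar year.
    """
    months_into_year = (ts.month - year_start_month) % 12
    return {
        "year": ts.year,
        "quarter": months_into_year // 3 + 1,
        "month_number": ts.month,
        "month_name": ts.strftime("%B"),
        "day_name": ts.strftime("%A"),
        "date": ts.date().isoformat(),
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
    }
```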

Page 15

A note on time and location data

• StreamCentral auto-creates time and location dimensions without the need to define them explicitly. As data sources whose fact tables contain time or location data are added, StreamCentral conforms them to the time and location dimensions

• Extended data types allow very specific association of a variety of time- and location-based attributes in fact tables

• Time and location data types can be assigned to attributes in entities, regular data sources and environmental data sources

• For every incoming attribute that is associated with one of the special time or location data types, StreamCentral checks whether a record for that value already exists in the dimension. If not, it creates a new record for that value; if it does, the key of that record is substituted in the data source. This happens in real time

• Time and location data is stored both in the database and in a distributed cache. Real-time lookups are done against the data stored in the cache

• StreamCentral can dynamically feed time or location data from these dimensions to REST- or SOAP-based web services

• StreamCentral supports standardizing location data for any geographic level, and supports standardizing within a specified radius
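The lookup-or-create step and the radius-based standardization can be sketched together in Python. The sketch assumes that standardizing "for a specific radius" means reusing the nearest existing location member within that radius and creating a new member otherwise; it uses the haversine great-circle distance, and the LocationDimension class and its names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class LocationDimension:
    """Hypothetical location dimension that snaps points to existing members."""

    def __init__(self, radius_km: float):
        self.radius_km = radius_km
        self.members = []  # (surrogate key, lat, lon)

    def standardize(self, lat: float, lon: float) -> int:
        """Return the key of the nearest member within the radius, else create one."""
        best = None
        for key, mlat, mlon in self.members:
            d = haversine_km(lat, lon, mlat, mlon)
            if d <= self.radius_km and (best is None or d < best[1]):
                best = (key, d)
        if best is not None:
            return best[0]  # existing member: substitute its key
        key = len(self.members) + 1
        self.members.append((key, lat, lon))
        return key
```

A linear scan keeps the example short; a large dimension would need a spatial index.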

Page 16

Entities

• Definition: An entity represents a group of people or things that incoming data is directly connected to. Examples include departments, customers, sites, products, etc. By defining entities, you tell StreamCentral how distributed data is connected to the things core to your business. This is an important and unique differentiator within StreamCentral: with this capability, raw data is immediately enriched with your business context at the time of ingestion

• Treated like custom dimensions, with additional capabilities

• Entity data can be edited in StreamCentral

• Multiple attributes in an entity can map to identifier attributes from different data sources. Benefit: a data source carrying an identifier that maps to an entity attribute instantly has access to all the entity data during source data ingestion

• Ability to specify multiple locations for an entity
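Mapping several entity attributes to identifiers from different sources can be pictured as one index per identifier attribute. In this Python sketch the Entity class, the attributes (site_code, meter_id) and the source fields are hypothetical; a record from any source that carries a mapped identifier is enriched with the full entity member during ingestion.

```python
class Entity:
    """Hypothetical entity whose members are reachable through several identifiers."""

    def __init__(self, name: str, identifier_attributes: list):
        self.name = name
        self.members = []
        # One index per identifier attribute: identifier value -> member dict.
        self.indexes = {attr: {} for attr in identifier_attributes}

    def add(self, member: dict) -> None:
        self.members.append(member)
        for attr, index in self.indexes.items():
            if attr in member:
                index[member[attr]] = member

    def enrich(self, record: dict, source_field: str, entity_attribute: str) -> dict:
        """Attach the entity member to a source record via a mapped identifier."""
        member = self.indexes[entity_attribute].get(record.get(source_field))
        return {**record, self.name: member}
```

For example, a billing record carrying site_ref and a meter reading carrying meter would both resolve to the same site member, one through site_code and the other through meter_id.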

Page 17

Environmental Fact

Page 18

Types of data sources: Environmental

• This source of data is used both to add context (as a dimension) and to measure performance (as a fact table). These are also called environmental data sources:

  • Examples typically include external data that adds context about the external factors in play

  • Environmental data does not have to be connected to entities directly. StreamCentral uses implicit relationships through the time and location dimensions to tie environmental data to other enterprise data. For example, consider an environmental data source called Weather, which has location information associated with it, and two entities, “Customer” and “Tower”, which also have location information. StreamCentral standardizes all three to the location dimension and, because Weather was created as an environmental data source, also implicitly connects Customer to Weather and Tower to Weather. When analyzing data, StreamCentral can then provide real-time or historical context about the weather where the customer is and the weather where the tower is

  • Great for use in data marts to analyze associations with other data

  • Can be used in event detection as part of condition sets and to evaluate events
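The Weather example can be sketched in Python under two assumptions: weather readings and entities have already been standardized to the same location keys, and readings are matched to the hour of the timestamp being analyzed. The data, keys and the weather_context function are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Hypothetical environmental source: (location key, hour) -> weather reading.
weather = {
    (10, datetime(2014, 5, 1, 14)): {"condition": "Rain", "temp_c": 11},
    (20, datetime(2014, 5, 1, 14)): {"condition": "Sun", "temp_c": 17},
}

# Hypothetical entities already standardized to the same location dimension.
entities = {
    "Customer:C-1": {"location_key": 10},
    "Tower:T-9": {"location_key": 20},
}


def weather_context(entity_id: str, at: datetime) -> Optional[dict]:
    """Weather at the entity's location during the hour containing `at`."""
    hour = at.replace(minute=0, second=0, microsecond=0)
    return weather.get((entities[entity_id]["location_key"], hour))
```

No explicit link between Weather and either entity is stored; the shared location key and time are enough to connect them.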

Page 19

Environmental Facts

• Same as regular StreamCentral fact tables, with additional functionality:
  • As a fact: attributes can be selected as measures to be included in data marts
  • As a dimension: it connects to all fact tables on time and location attributes automatically

Page 20

StreamCentral 360° Data Marts

Page 21

StreamCentral 360° Data Mart Overview

• A fully automated system for building data marts, modifying them and continually keeping them refreshed with new and updated data

• StreamCentral intelligence brings together all related and relevant pieces of information for the user while the data mart is being created in the StreamCentral Workbench

• Creating and modifying a data mart involves no coding. StreamCentral intelligence alerts the user to possible relationships that can be included in the data mart for wider association analysis

Page 22

360° Data Mart Overview

• StreamCentral Real-time 360° Data Mart
  • Data marts updated in real time – data is updated in the data mart in parallel with data being updated in the information warehouse (fact table)
  • Data mart connected to an event – auto-builds a 360° view of the event
  • Data mart connected to the entire data warehouse
  • Storage types:
    • Custom start date with no end date
    • Rolling window (a fixed amount of time that keeps moving)

• StreamCentral Historical 360° Data Mart
  • Data marts updated on demand and at a defined frequency
  • Two types:
    • Built using LiveJoin (inherits the relationship definitions used to detect events)
    • Built using OnDemand Join (define custom relationships in data)
  • Storage types:
    • Custom start date with no end date
    • Rolling window (a fixed amount of time that keeps moving)
    • Snapshot – custom start date with custom end date
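The storage types differ only in how the mart's time range is bounded. The Python sketch below expresses each one as a predicate over fact timestamps; the function names and the explicit now argument are illustrative assumptions rather than StreamCentral settings.

```python
from datetime import datetime, timedelta


def custom_start(start: datetime):
    """Custom start date with no end date."""
    return lambda ts: ts >= start


def rolling_window(length: timedelta, now: datetime):
    """Fixed amount of time ending at `now`; a later `now` moves the window."""
    return lambda ts: now - length <= ts <= now


def snapshot(start: datetime, end: datetime):
    """Custom start date with custom end date."""
    return lambda ts: start <= ts <= end


def select(timestamps: list, keep) -> list:
    """Keep only the fact timestamps that fall inside the mart's range."""
    return [ts for ts in timestamps if keep(ts)]
```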

Page 23

Types of Data Mart structures

• Pivot – keeps KPIs and alerts as columns

• Header Detail:
  • Fact data is maintained as a single record in a header table
  • Any KPIs and alerts that would duplicate the fact data are maintained in a detail table

• Flattened – fact data is duplicated in a flattened structure: all facts, KPIs and alerts in one flattened structure
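The three structures are easiest to compare on one fact record that carries several KPI and alert results. In this Python sketch the fact, the KPI names and their statuses are hypothetical; each function shapes the same input as a pivot row, a header/detail pair or flattened rows.

```python
# Hypothetical fact record and its KPI/alert results.
fact = {"fact_id": 7, "site": "LON-1", "kwh": 42.0}
kpis = [("peak_load", "breached"), ("daily_budget", "ok")]


def pivot(fact: dict, kpis: list) -> dict:
    """One row; each KPI/alert becomes a column."""
    return {**fact, **dict(kpis)}


def header_detail(fact: dict, kpis: list) -> tuple:
    """Fact stored once in the header; KPIs/alerts go to a detail table."""
    details = [{"fact_id": fact["fact_id"], "kpi": k, "status": s} for k, s in kpis]
    return fact, details


def flattened(fact: dict, kpis: list) -> list:
    """Fact data repeated on every row alongside each KPI/alert."""
    return [{**fact, "kpi": k, "status": s} for k, s in kpis]
```

Pivot keeps one row per fact but adds a column per KPI; header/detail avoids duplication at the cost of a join; flattened is the simplest to query but repeats the fact data.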

Page 24

Logical and Physical Model

• StreamCentral builds both the logical and physical models for the information system

• The physical model is supported for:
  • Microsoft SQL Server
  • HP Vertica
  • MongoDB (Q3 2014)
  • Hadoop (Q3 2014)

Page 25

Thank you

For more information please contact:

USA: Raheel Retiwalla, CTO – Virtus IT Ltd
e: [email protected]
Tel: +1 617 901 8370

UK: Stephen Wells, CEO – Virtus IT Ltd
e: [email protected]
Tel: +44 771 113 0879
