© Virtus IT Ltd 2014 - Confidential
A trusted partner
Business Powered By Data
Overview of the StreamCentral Information System
Raheel [email protected]
Goals of the StreamCentral Information System
• Build high-impact business solutions from zero to finish in days or weeks
• Move traditional batch based Business Intelligence to real-time Business Intelligence
• Account for streaming data and the need to understand and analyze streaming data in real-time
• Easily connect to the cloud
• Account for traditional static data
• Include events as a core part of real-time analytics
• Require little or no knowledge of Business Intelligence design patterns, practices or terminology
• Introduce design automation that continually infers relationships in data to add additional value to collected data
• Take advantage of Big Data data stores like NoSQL, MPP and Hadoop with little or no knowledge of the underlying technology
Collect stream data
Collect static data
Stream event processing
Data transformation
KPI & Conditions
Correlation
Connect data to business entities
Standardize time & location
Auto build data marts that are updated in real-time
Auto build data marts that are updated periodically
• One product that already comes built in with integrated components that work seamlessly together
• High Automation
• No coding required
• Focus on the solution and not on integration
• Can be used by data scientists and developers
• Does not require extensive specialist skill
Building a Big Data Business Solution WITH StreamCentral
Time
Latest innovation in Big Data technologies
Hadoop
MPP NoSQL Caching
StreamCentral
BI Server – Run with Scale
Information Warehouse Manager – Auto Build
Workbench – Easy to Design
Rapidly industrialize the use of data by designing, building and running real-time business intelligence and Big Data solutions with StreamCentral
Functional Application
Event Driven Predictive Analytics
Industry Application
BI / Reporting
Data Exploration / Visualization
Analytic Applications
Association Analysis
Business Event Detection
Data Publishing: SQL Server, Vertica, MongoDB
Data Export
Data Collection
Data Processing
Caching
Security Designer Systems Management
Solution Designer (Data Consumption, Data
Transformations, Conditions, Event, Correlation)
API Designer
De-normalized schema generation for data marts
Security schema generation
Meta Data Manager
Normalized schema generation for Fact and Dimensions
Auto generate database design, auto generate database and application code, infer relationships in data
StreamCentral - Big Data Plumbing
Focus of this presentation
The StreamCentral Information Warehouse
A dynamic warehouse automatically built and continually managed by StreamCentral
The SC Information Warehouse
• Facts – Two types of fact tables
• Regular Fact Tables
• Fact tables integrate the dimension key internally automatically
• Each fact table has its own surrogate key
• Security Factless Fact tables
• Dimensions
• Custom Dimension
• Entities
• System Dimensions
• Environmental Facts
• Treated as facts and dimensions
The StreamCentral Fact
Definition: A fact is a measurable value that represents an actual occurrence in a specific business process or system. Examples include sales data, video performance, contract data, health records and sensor readings
Facts Overview
• Facts are created as data sources in StreamCentral
• One-time load
• Periodic load – assign frequency
• Pull – Supports Oracle, SQL Server, Flat Files, Excel, MySQL
• Push – API available to push data into a StreamCentral data source in real-time (JSON and XML). Useful for pushing real-time “facts” directly to StreamCentral
• Dimension surrogate keys are appended to the fact in real-time, in parallel with the dimension update process. Key Feature: Allows the system to be updated and available in real-time
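A minimal sketch of how a pushed fact might pick up dimension surrogate keys at ingestion. The function name, the record fields and the dictionaries standing in for dimension data are assumptions made for illustration; this is not StreamCentral's API.

```python
import json

# Hypothetical dimension lookups: business key -> surrogate key.
# Plain dicts stand in for the dimension data in this sketch.
customer_dim = {"C-1001": 1, "C-1002": 2}
product_dim = {"SKU-9": 10}


def append_surrogate_keys(fact, lookups):
    """Return a copy of the fact with one '<dimension>_sk' column per lookup.

    `lookups` maps a business-key field in the fact to a tuple of
    (dimension name, business-key -> surrogate-key dict).
    Unknown business keys map to None here; a real system would
    more likely create the missing dimension record.
    """
    enriched = dict(fact)
    for field, (dim_name, dim) in lookups.items():
        enriched[f"{dim_name}_sk"] = dim.get(fact.get(field))
    return enriched


# A fact pushed as JSON (field names are made up for this example).
payload = '{"customer_id": "C-1002", "sku": "SKU-9", "amount": 49.5}'
row = append_surrogate_keys(
    json.loads(payload),
    {"customer_id": ("customer", customer_dim), "sku": ("product", product_dim)},
)
print(row)
```

The original business keys stay on the row, so the fact remains traceable to its source while joins can use the surrogate keys.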
Security Fact Table
• Users can specify security rules for data access at the attribute level for a given security role using the Workbench
• StreamCentral automatically creates a Factless Fact Table – Key Feature
• The Fact table records the role and the sequence of a record in a data source
• As soon as a new record is entered into a source, its security sequence and the roles that have access to that sequence are recorded in this table
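A minimal sketch of the factless fact idea: a table holding only (sequence, role) pairs and no measures. The rule format and function names are hypothetical, and for brevity the rules here test whole records rather than individual attributes.

```python
# Hypothetical security rules: role -> predicate deciding record access.
role_rules = {
    "uk_sales": lambda rec: rec["region"] == "UK",
    "auditor": lambda rec: True,
}

# Factless fact table: rows of (sequence, role), with no measures.
security_fact = []


def on_new_record(sequence, record):
    """Record which roles may see the source record with this sequence."""
    for role, allowed in role_rules.items():
        if allowed(record):
            security_fact.append((sequence, role))


def visible_sequences(role):
    """Sequences a role may read, derived only from the factless table."""
    return [seq for seq, r in security_fact if r == role]


on_new_record(1, {"region": "UK", "amount": 10})
on_new_record(2, {"region": "US", "amount": 20})
print(visible_sequences("uk_sales"))
print(visible_sequences("auditor"))
```

Because access is resolved when the record arrives, a query only needs to join to the (sequence, role) table instead of re-evaluating the rules.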
The StreamCentral Dimension
Definitions
• Dimension – A dimension is a category of information that can be used to further understand a fact. Dimensions are usually connected to fact tables to give them additional context and meaning. For example, the time dimension
• Dimension Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.
• Hierarchy: The specification of levels that represents the relationships between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.
• Conformed Dimension – A dimension that has the same meaning and content when connected to different fact tables. Examples of conformed dimensions include customer, product, employee, partner
• Role Playing Dimension - Dimensions are often recycled for multiple applications within the same database. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a "role-playing dimension".
StreamCentral Dimension Overview
• Dimensions are created as data sources
• One-time load
• Periodic load – assign frequency
• Dimensions can be built on multiple data sources
• Dimension gets updated as soon as data source is updated with new data
• Pull - Supports Oracle, SQL Server, Flat Files, Excel, MySQL
• Push – API available to push data into data source in real-time
• Supports conformed dimensions, degenerate dimensions and role-playing dimensions
• Dimension data is available in a distributed cache to allow adding of context to real-time data during ingestion – Key feature
• Each dimension generates a surrogate key based on the business key selected from the fact table
Custom Dimensions
• In conformed dimensions, the same business key is used across multiple fact tables
• Build a dimension from multiple sources
• Build a dimension with attributes from multiple sources
• Benefit: No need to standardize data before it gets to SC. SC transforms and standardizes data automatically for a dimension
• Handling changing attributes – Key Feature
• Fixed Attribute – The attribute always has a fixed value and does not change. This is the default characteristic of an attribute
• Historical Attribute – The attribute can change. Once it changes, a new record is created so that both the latest and the historical representations are kept. Each record carries a Start Date, End Date and Surrogate Key.
Benefit: Historical record of every change automatically created and maintained by SC
• Changing Attribute – The attribute value can change. The change does not create a new record. If historical records exist, an option is provided to apply the change to them as well
Benefit: Changing values of existing attributes without having to create historical records
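The three attribute behaviours can be sketched as follows. This is an illustrative model only: the class, method and column names are assumptions, and the product's actual handling is not shown in these slides.

```python
from datetime import date

FIXED, HISTORICAL, CHANGING = "fixed", "historical", "changing"


class Dimension:
    def __init__(self, attribute_types):
        self.attribute_types = attribute_types  # attribute name -> behaviour
        self.rows = []                          # every version of every member
        self._next_sk = 1

    def _current(self, business_key):
        for row in self.rows:
            if row["business_key"] == business_key and row["end_date"] is None:
                return row
        return None

    def _insert(self, business_key, attrs, today):
        row = {"surrogate_key": self._next_sk, "business_key": business_key,
               "start_date": today, "end_date": None, **attrs}
        self._next_sk += 1
        self.rows.append(row)
        return row["surrogate_key"]

    def upsert(self, business_key, attrs, today, apply_to_history=False):
        current = self._current(business_key)
        if current is None:
            return self._insert(business_key, attrs, today)
        # Ignore unchanged values and any attempt to change a fixed attribute.
        changed = {k: v for k, v in attrs.items()
                   if current.get(k) != v and self.attribute_types[k] != FIXED}
        in_place = {k: v for k, v in changed.items()
                    if self.attribute_types[k] == CHANGING}
        versioned = {k: v for k, v in changed.items()
                     if self.attribute_types[k] == HISTORICAL}
        # Changing attributes: overwrite, optionally on older versions too.
        for row in self.rows:
            if row["business_key"] == business_key and (
                    row is current or apply_to_history):
                row.update(in_place)
        # Historical attributes: close the current row and open a new one.
        if versioned:
            current["end_date"] = today
            new_attrs = {k: current.get(k) for k in self.attribute_types}
            new_attrs.update(versioned)
            return self._insert(business_key, new_attrs, today)
        return current["surrogate_key"]


dim = Dimension({"name": FIXED, "city": HISTORICAL, "phone": CHANGING})
dim.upsert("C-1", {"name": "Ann", "city": "Leeds", "phone": "111"}, date(2014, 1, 1))
dim.upsert("C-1", {"city": "York"}, date(2014, 6, 1))   # creates a new version
dim.upsert("C-1", {"phone": "222"}, date(2014, 7, 1))   # overwrites in place
dim.upsert("C-1", {"name": "Anne"}, date(2014, 8, 1))   # ignored: fixed
for row in dim.rows:
    print(row)
```

Passing `apply_to_history=True` on the phone update would also overwrite the closed Leeds version, which corresponds to the option described for changing attributes.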
System Dimensions
• Time
• Built as a role-playing dimension
• Any granularity of the timestamp in the fact will be linked to the time dimension – Key Feature
• Year, Quarter (default Jan), Month Number, Month Name, Day Name, Date
• Hours, Minutes, Seconds
• Location
• Built as a role-playing dimension
• Any granularity of the location in the fact will be linked to the location dimension
• Support three different representations of Latitude/Longitude
• Support reverse geo-coding with Microsoft Bing or Google Maps API
• Country, State, Region, City, Area, Street Name, Zip Code/Postal Code, Lat/Long – Key Feature
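A minimal sketch of deriving the listed time attributes from a fact timestamp. The function name and column names are hypothetical, and reading "default Jan" as the month in which the first quarter starts is an assumption.

```python
from datetime import datetime


def time_dimension_row(ts, first_quarter_month=1):
    """Build time-dimension attributes for one timestamp.

    `first_quarter_month` is the month in which Q1 starts
    (assumed meaning of "default Jan" on the slide).
    """
    months_into_year = (ts.month - first_quarter_month) % 12
    return {
        "year": ts.year,
        "quarter": months_into_year // 3 + 1,
        "month_number": ts.month,
        "month_name": ts.strftime("%B"),
        "day_name": ts.strftime("%A"),
        "date": ts.date().isoformat(),
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
    }


# The same row can serve several roles (for example order time and
# delivery time), which is what makes the dimension role-playing.
row = time_dimension_row(datetime(2014, 8, 15, 13, 5, 9))
print(row)
```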
• StreamCentral auto creates time and location dimensions without the need to explicitly define them. As data sources have fact tables that contain time or location data, StreamCentral starts conforming time and location dimensions
• Extended data types allow very specific association of a variety of time and location based attributes in fact tables
• Time and Location data types can be assigned to attributes in entities, regular data sources and environmental data sources
• For every incoming attribute that is associated with one of the special time or location data types, StreamCentral looks to see if a specific record for that data already exists in the dimension. If not, it creates a new record for that value. If it exists already, then the key value of that data is substituted in the data source. This happens in real-time
• Time and location data is stored in the database and in a distributed cache. Real-time lookups are done against the data stored in the cache
• StreamCentral can dynamically feed time or location data to REST or SOAP based web services from these dimensions
• StreamCentral supports standardizing location data at any geographic level and can standardize within a specific radius
A note on time and location data
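The lookup-or-create step described above can be sketched as follows. The class and function names are hypothetical, and an in-memory list and dict stand in for the database table and the distributed cache.

```python
class CachedDimension:
    """A dimension with a table and a cache in front of it (both in memory)."""

    def __init__(self):
        self.table = []   # stand-in for the dimension table in the database
        self.cache = {}   # stand-in for the distributed cache: value -> key

    def key_for(self, value):
        key = self.cache.get(value)       # real-time lookup goes to the cache
        if key is None:                   # value not seen yet: create a record
            key = len(self.table) + 1
            self.table.append({"key": key, "value": value})
            self.cache[value] = key
        return key


def substitute_keys(record, typed_fields, dimensions):
    """Replace each time- or location-typed value with its dimension key."""
    out = dict(record)
    for field, dim_name in typed_fields.items():
        out[field] = dimensions[dim_name].key_for(record[field])
    return out


dims = {"location": CachedDimension(), "time": CachedDimension()}
typed = {"city": "location", "read_at": "time"}  # field -> special data type
first = substitute_keys(
    {"city": "Leeds", "read_at": "2014-05-01T10:00", "value": 3}, typed, dims)
second = substitute_keys(
    {"city": "Leeds", "read_at": "2014-05-01T10:05", "value": 4}, typed, dims)
print(first)
print(second)
```

Both readings end up with the same location key, while the second timestamp gets a new time key.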
Entities
• Definition: An entity represents a group of people or things that incoming data is directly connected to. Examples include departments, customers, sites and products. By defining entities you tell StreamCentral how distributed data is connected to things core to your business. This is an important and unique differentiator within StreamCentral. With this capability raw data is immediately enriched with your business context at the time of ingestion
• Treated the same as custom dimensions, with additional capabilities
• Entity data can be edited in StreamCentral
• Multiple attributes in an entity can map to identifier attributes from different data sources. Benefit: A data source carrying an identifier that maps to an entity attribute will instantly have access to all the entity data during source data ingestion
• Ability to specify multiple locations for an entity
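A minimal sketch of mapping identifiers from different data sources onto one entity. The entity attributes, source field names and sample values are all made up for illustration.

```python
class Entity:
    """Entity rows indexed by each attribute that can act as an identifier."""

    def __init__(self, name, rows, identifier_attrs):
        self.name = name
        self.index = {
            attr: {row[attr]: row for row in rows} for attr in identifier_attrs
        }

    def enrich(self, record, source_field, entity_attr):
        """Attach the matching entity row to an incoming source record."""
        match = self.index[entity_attr].get(record.get(source_field))
        return {**record, self.name: match}


customer = Entity(
    "customer",
    [{"customer_id": 1, "account_no": "A-7", "phone": "447700900001",
      "segment": "SME"}],
    identifier_attrs=["account_no", "phone"],
)

# Two sources carry different identifiers for the same customer.
billing = customer.enrich({"acct": "A-7", "amount": 12.0}, "acct", "account_no")
network = customer.enrich({"number": "447700900001", "dropped_calls": 2},
                          "number", "phone")
print(billing)
print(network)
```

Each source only needs one identifier that maps to an entity attribute in order to reach the whole entity row during ingestion.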
Environmental Fact
• This source of data is used to add context (a Dimension) and measure performance (Fact Table). These are also called environmental data sources
• Examples typically include external data that adds context about external factors in play
• Does not have to be connected to the entities directly. StreamCentral will use implicit relations with the time and location dimensions to tie environmental data to other enterprise data. For example, consider an environmental data source called weather. Weather has location information associated with it. There are two entities, namely “Customer” and “Tower”. Both also have location information associated with them. StreamCentral standardizes all three to the location dimension, but StreamCentral also implicitly connects Customer to weather and Tower to weather because weather was created as an environmental data source. Now when analyzing data, StreamCentral will be able to provide real-time or historical context as to what the weather is where the customer is and what the weather is where the tower is
• Great to use in data marts for analyzing associations with other data
• Can be used in event detection as part of conditions set and to evaluate events
Types of data sources: Environmental
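The weather example can be sketched as a join on a standardized location key. The data, field names and function name are illustrative assumptions; only location is used here, although time would work the same way.

```python
# Environmental data keyed by a standardized location (here: city name).
weather = {
    "Leeds": {"condition": "rain", "temp_c": 11},
    "York": {"condition": "clear", "temp_c": 14},
}

customers = [{"customer": "Ann", "location": "Leeds"}]
towers = [{"tower_id": "T1", "location": "York"}]


def with_environment(rows, environment, name, on="location"):
    """Attach environmental data to rows that share the same location key."""
    return [{**row, name: environment.get(row[on])} for row in rows]


# No explicit relationship between Customer/Tower and weather is defined;
# the shared location key is enough to connect them.
customer_view = with_environment(customers, weather, "weather")
tower_view = with_environment(towers, weather, "weather")
print(customer_view)
print(tower_view)
```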
Environmental Facts
• Same as regular StreamCentral fact tables with additional functionality
• As a Fact
• Attributes can be selected as measures to be included in data marts
• As a dimension
• It connects to all fact tables on time and location attributes automatically
StreamCentral 360° Data Marts
StreamCentral 360° Data Mart Overview
• Fully automated system for building data marts, modifying them and continually keeping them refreshed with new and updated data
• The StreamCentral intelligence brings together all related and relevant pieces of information for the user during creation of the data mart using the StreamCentral Workbench
• Data Mart creation and modification involves no coding. StreamCentral intelligence alerts the user to possible relationships that can be included in the data mart for wider association analysis
360° Data Mart Overview
• StreamCentral Real-time 360° Data Mart
• Data Marts updated in real-time – Data is updated in the data mart in parallel with data being updated in the information warehouse (fact table)
• Data Mart connected to an event – Auto builds a 360° view of the event
• Data Mart connected to the entire data warehouse
• Storage Types:
• Custom Start date with no end date
• Rolling Window (fixed amount of time that keeps moving)
• StreamCentral Historical 360° Data Mart
• Data Marts updated on demand and at a defined frequency
• Two types:
• Built using LiveJoin (inherits the relationship definition defined to detect events)
• Built using OnDemand Join (define custom relationships in data)
• Storage Types:
• Custom Start date with no end date
• Rolling Window (fixed amount of time that keeps moving)
• Snapshot – Custom Start Date with Custom End Date
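The three storage types differ only in which time range a data mart keeps. A minimal sketch, with hypothetical type names and configuration keys:

```python
from datetime import datetime, timedelta


def in_scope(ts, now, storage):
    """Decide whether a record timestamp belongs in the data mart."""
    kind = storage["type"]
    if kind == "custom_start":        # custom start date, no end date
        return ts >= storage["start"]
    if kind == "rolling_window":      # fixed span that moves with `now`
        return now - storage["window"] <= ts <= now
    if kind == "snapshot":            # custom start and custom end date
        return storage["start"] <= ts <= storage["end"]
    raise ValueError(f"unknown storage type: {kind}")


now = datetime(2014, 6, 30)
rolling = {"type": "rolling_window", "window": timedelta(days=7)}
open_ended = {"type": "custom_start", "start": datetime(2014, 5, 1)}
snapshot = {"type": "snapshot", "start": datetime(2014, 1, 1),
            "end": datetime(2014, 3, 31)}

print(in_scope(datetime(2014, 6, 25), now, rolling))
print(in_scope(datetime(2014, 6, 1), now, open_ended))
print(in_scope(datetime(2014, 6, 1), now, snapshot))
```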
Types of Data Mart structures
• Pivot
• Keeps KPIs and alerts as columns
• Header Detail
• Fact data is maintained as a single record in the header table
• Any KPIs and alerts that would duplicate the fact data are maintained in a detail table
• Flattened
• Fact data is duplicated in a flattened structure – all facts, KPIs and alerts in one flattened structure
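One way to picture the three structures is to shape the same fact and its KPIs/alerts three ways. The column names ("indicator", "value") and the sample data are assumptions; the slides do not give the generated schemas.

```python
fact = {"fact_id": 1, "amount": 120}
indicators = {"margin_kpi": 0.2, "late_alert": True}  # KPIs and alerts


def pivot(fact, indicators):
    """One row per fact, with each KPI or alert as its own column."""
    return [{**fact, **indicators}]


def header_detail(fact, indicators, key="fact_id"):
    """Fact stored once in the header; KPIs and alerts as detail rows."""
    header = [dict(fact)]
    detail = [{key: fact[key], "indicator": name, "value": value}
              for name, value in indicators.items()]
    return header, detail


def flattened(fact, indicators):
    """Fact columns repeated on every KPI or alert row."""
    return [{**fact, "indicator": name, "value": value}
            for name, value in indicators.items()]


print(pivot(fact, indicators))
print(header_detail(fact, indicators))
print(flattened(fact, indicators))
```

Pivot keeps one wide row, header/detail avoids repeating fact data, and flattened trades duplication for a single table that is simple to query.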
Logical and Physical Model
• StreamCentral builds both the logical and physical models for the information system
• The physical model is supported for:
• Microsoft SQL Server
• HP Vertica
• MongoDB (Q3 2014)
• Hadoop (Q3 2014)
Thank you
For more information please contact:
USA: Raheel Retiwalla, CTO - Virtus IT Ltd, e: [email protected]: +1 617 901 8370
UK: Stephen Wells, CEO - Virtus IT Ltd, e: [email protected]: +44 771 113 0879