End User Informatics
Informatics
Ambareesh Kulkarni
Informatics defined
• Informatics is the application of technology to bring Data, People and Systems together
• Bioinformatics is a very Complex representation of Simple data
• Cheminformatics is a very Simple representation of Complex data
Current State
Problem Statement….
“There's too much data and it's duplicated hundreds of times. The mistake companies make is that they start from the data they have. They need to ask what data do their users need and what are the questions they are asking. Understand the questions, how they can be answered and what kind of data is needed.”
Quote by CIO of Major Corporation
Integrated Solutions - Business Case: IDC White Paper
• Information Tasks
– Email – 14.5 hours a week
– Create documents – 13.3 hours a week
– Search – 9.5 hours a week
– Gather information for documents – 8.3 hours a week
– Find and organize documents – 6.8 hours a week
• Gartner: “Organizations spend an estimated $750 Billion annually seeking information necessary to do their job.”
• Time Wasted (per year)
– Reformat information - $57 million per 10,000 users
– Not finding information - $53 million per 10,000 users
– Recreating content - $45 million per 10,000 users
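To make the "Time Wasted" figures easier to relate to, the short sketch below converts the quoted per-10,000-user costs into a per-user annual cost. The dollar figures come from the slide; the variable names are illustrative only.

```python
# Annual cost figures quoted on the slide, each per 10,000 users.
USERS = 10_000
annual_cost_per_10k = {
    "Reformatting information": 57_000_000,
    "Not finding information": 53_000_000,
    "Recreating content": 45_000_000,
}

# Divide each figure by the user count to get a per-user annual cost.
per_user = {task: cost // USERS for task, cost in annual_cost_per_10k.items()}
total_per_user = sum(per_user.values())

for task, cost in per_user.items():
    print(f"{task}: ${cost:,} per user per year")
print(f"Total: ${total_per_user:,} per user per year")
```

On these numbers the three categories together come to $15,500 per user per year.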
Data Integration - Business Case: IDC White Paper
• Reduce development costs, cycle times
– Increase employee efficiency
– Less time looking, more time doing
• Enhance communication
– Capture and reuse knowledge
– Innovate better & faster
• Cost of not finding right information
– Business – lost money, opportunities
Data Integration - Business Case: General ROI issues IDC White Paper
Key Takeaways
• Data Integration is not easy and represents ~80% of effort for a typical data integration project.
• Incompatible data are the largest, most expensive, and most time-consuming portion of IT projects.
• Most data is in an unstructured format (Outlook, Word, PDF, images, etc.)
Evolution of data integration technologies
Evolution of Integration Architectures
Point to Point → Hub + Spoke → Hub + EII
Defining EII, EAI, ETL Data Integration
EII (Enterprise Information Integration) vs. EAI (Enterprise Application Integration)
• EII: Reports from multiple apps/data sources, e.g. real-time access to product silos for customers, employees
• EAI: Transactions to multiple apps, e.g. compound name change in one application propagated to other products
EII vs. ETL
• EII: Real-time. Extract, Transform, Report in real time, e.g. report data from operational applications
• ETL: Batch. Extract, Transform, Load; later report on data warehouse, e.g. build duplicate reporting data mart and/or redesign data warehouse
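The difference between the EII and ETL styles can be sketched in a few lines. This is a minimal illustration, not how any particular product works: two in-memory SQLite databases stand in for operational systems, and the table names, function names and sample rows are all hypothetical.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory database standing in for one operational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compounds (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO compounds VALUES (?, ?)", rows)
    return conn

def eii_report(sources):
    """EII style: query every source at request time and merge the results."""
    merged = []
    for conn in sources:
        merged.extend(conn.execute("SELECT name, amount FROM compounds"))
    return sorted(merged)

def etl_load(sources, warehouse):
    """ETL style: batch-copy rows into a warehouse table for later reporting."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS compounds_dw (name TEXT, amount REAL)")
    warehouse.execute("DELETE FROM compounds_dw")
    for conn in sources:
        rows = conn.execute("SELECT name, amount FROM compounds").fetchall()
        # Transform step: normalise names to upper case before loading.
        warehouse.executemany("INSERT INTO compounds_dw VALUES (?, ?)",
                              [(n.upper(), a) for n, a in rows])
    warehouse.commit()

def warehouse_report(warehouse):
    """Report from the warehouse copy, which is only as fresh as the last load."""
    return warehouse.execute(
        "SELECT name, amount FROM compounds_dw ORDER BY name").fetchall()

site_a = make_source([("acetone", 2.0)])
site_b = make_source([("benzene", 1.5)])
warehouse = sqlite3.connect(":memory:")
etl_load([site_a, site_b], warehouse)

# A new row arrives in an operational system after the batch load.
site_a.execute("INSERT INTO compounds VALUES ('toluene', 4.0)")

live = eii_report([site_a, site_b])      # sees the new row
stale = warehouse_report(warehouse)      # does not, until the next batch
print(len(live), len(stale))
```

The real-time report reflects the late-arriving row immediately, while the warehouse report shows it only after the next batch load. That is the trade-off summarized above.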
Enterprise Application Requirements
Tools vs. Development Platform
Tools
Development Platform
What do end users really care about?
• The Internet has raised the bar for Informatics expectations
• Complex query? Millions of rows? Full table scan?
• Users don’t really care. If they can view stock prices in real time, why not corporate data?
• In an ideal world, data analysis needs to be at the speed of thought.
• Bigger, better, faster, cheaper
Business users’ view
Data
Pipeline Pilot
Reports
IT perspective
Key Takeaways
• Provide an integrated view of data across multiple systems: flat files, data warehouses, data marts.
• Avoid “boiling the ocean”: jump-start data integration efforts with Pipeline Pilot (PP) to quickly meet an important user requirement, and then decide whether the data should be persisted in a data warehouse or data mart.
Use Pipeline Pilot to:
Action from Insight
Data is a New form of Energy
Why is data integration so important?
• Data in any organization is distributed across various disconnected and disparate systems
• There is always a need to combine the most current data with historical values
• The success of the internet has created data sources outside the internal network
• Data has informational value only when combined with other, related data
WARNING SIGNS: Of Poor Data Integration
• Incomplete data foundation
• Inability to consolidate data from multiple sources
• No single version of the truth
• Poor audit trail and data lineage
• Historical values not retained in a data warehouse or data mart
• Lack of integrated 360-degree view
• High cost of maintaining “one-time” in-house code
• Inability to comply with regulatory requirements
Presentations or discussions that are prefaced with statements like “Most of our analysis would have been accurate, except for the missing data from….” or “Due to the discovery of data not included in the last analysis, we are reversing our decision to……”
As a result of an out-of-order condition for a critical chemical, a scientist must expedite the order and pay a premium price. When the chemical arrives, the scientist (or worse, her boss) discovers that another division had an excess quantity of the same chemical and was looking to sell it at a discount.
Scientists argue about why their analysis results differ, even though the data came from the same operational data source.
A technician alerts his management team of scientists to a potential problem discovered while running a query against a database. The technician cannot, however, answer the follow-up question, “How long has the problem existed?”
A scientist runs a report every week against a LIMS. However, to see a period-to-period comparison, the scientist maintains a spreadsheet in which he creates a new column every week and enters the data manually.
A customer calls tech support to enquire about a pending case. While the customer support engineer has access to the case details, the engineer has no information on whether the customer is current on maintenance, how many end users they are licensed for, or what options the customer has purchased.
Minor change requests take weeks to be implemented; any modifications have to be thoroughly tested for accuracy and integrity.
CEO and CFO are uncomfortable signing off on the quarterly numbers as there is no way to trace the numbers back to the source systems.
Case Study (closer to home): Services Order Report
• Poor data quality
• Redundant information
• Duplicate entries
• Hard to read
• Huge amount of time required to clean it up
Information-sensitivity
Information Quality is a Direct Function of……
• Data Availability and Accessibility
• Data Quality
– DQ = Completeness × Validity
– E.g. Measure of Completeness = # of null values in a column
– E.g. Measure of Validity = “We have 4 regions, but there are 18 distinct values in the region column”
– Pitfall: Don’t take accountability for DQ on the source system
– Push accountability where it belongs, in the source system(s)
• Timeliness of Data, relevant to the questions being asked by the user
• SQL and programming accuracy
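One way to turn DQ = Completeness × Validity into a number is sketched below. The slide only names null counts and distinct-value checks as example measures; expressing both as fractions between 0 and 1 is an assumption made here so that the product is meaningful, and the region column is invented sample data.

```python
def completeness(values):
    """Fraction of entries that are not null (None or empty string)."""
    if not values:
        return 0.0
    filled = [v for v in values if v not in (None, "")]
    return len(filled) / len(values)

def validity(values, allowed):
    """Fraction of non-null entries whose value is in the allowed set."""
    filled = [v for v in values if v not in (None, "")]
    if not filled:
        return 0.0
    return sum(1 for v in filled if v in allowed) / len(filled)

def data_quality(values, allowed):
    """DQ score as the product of completeness and validity (assumed scaling)."""
    return completeness(values) * validity(values, allowed)

# Hypothetical region column: four legitimate regions, but the data also
# contains variants and a null, as in the "4 regions, 18 values" example.
regions = ["North", "South", "East", "West", "N.", "south", None, "North"]
allowed_regions = {"North", "South", "East", "West"}

distinct = {v for v in regions if v is not None}
score = data_quality(regions, allowed_regions)
print(f"{len(distinct)} distinct values for {len(allowed_regions)} regions")
print(f"DQ score: {score:.3f}")
```

A check like this is best run against the source system and reported back to its owners, in line with the point about pushing accountability to where the data originates.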
Case Study (closer to home): Internal Revenue Forecasting process
Orders QTD Pipeline Delivered Forecast
• Run the Services Products and Orders report in RSVPP ……; export the results, filter for product services (Column AM) and sum the Total Sale Price USD column
• Run the Services Opportunities report in SFDC; export the result……
• Assuming Access is up to date………..; export to Excel; filter by product services and sum the USD Amount column
• Assuming Access is up to date, run the Total Forecast report; export to Access; …………
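The manual "export, filter for product services, sum a column" step is the kind of task that is easy to script. The sketch below does it for a CSV export using only the standard library. The `Product` column name and the sample rows are hypothetical (the slide only refers to "Column AM"); `Total Sale Price USD` is the column named on the slide.

```python
import csv
import io

def sum_services(report_csv, product_col, amount_col):
    """Sum amount_col over the rows whose product_col is 'services'."""
    reader = csv.DictReader(io.StringIO(report_csv))
    total = 0.0
    for row in reader:
        if row[product_col].strip().lower() == "services":
            total += float(row[amount_col])
    return total

# Hypothetical export standing in for the orders report.
sample = """Product,Total Sale Price USD
Services,1200.50
Software,9000
Services,799.50
"""

orders_qtd = sum_services(sample, "Product", "Total Sale Price USD")
print(f"Services orders: {orders_qtd:,.2f} USD")
```

Scripting the step removes the manual filter-and-sum from the process, and with it the dependency on someone remembering which column to filter.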
Near real-time data access
Extract, Transformation & Load = Push big data
• Batch extract from transaction systems
• Bulk transformation
• Push load into data warehouse
[Figure labels: Extract, Transformation, Load, Data Warehouse, Real Time]
Pipeline Pilot and Real time Data access
Data Access: Data Adapters (Relational, Flat Files, ERP, Legacy, EJB, XML)
Data Transformation: Transform, Calculate, Security
Information Access: Web Services, ODBC, JDBC
• Flexible Data Access capabilities
• Single access point to data
• Consumer sees only the end result
• Shared platform service
• Available to all technologies
• Reusable building blocks
• Targeted to specific needs
• Reduces costs and time to market
• Supports incremental development
Case Study: PI Historian
• PI Historian, a product provided by OSI, captures data in real time from the research test rigs
• Data capture in PI is triggered by events
• PP allows scientists to read the data from PI Historian as it becomes available and also combine it with other information (e.g. associate real-time test data with historical characteristics of a catalyst)
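The "combine real-time test data with historical characteristics" idea amounts to a lookup join on a shared key. The sketch below shows that join in plain Python. It does not use any PI Historian or Pipeline Pilot API; the `Reading` type, the field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    """One event-triggered measurement from a test rig (hypothetical shape)."""
    catalyst_id: str
    timestamp: str
    temperature_c: float

# Hypothetical historical characteristics, keyed by catalyst identifier.
catalyst_history = {
    "CAT-001": {"support": "alumina", "surface_area_m2_g": 180.0},
}

def enrich(readings, history):
    """Attach known historical characteristics to each incoming reading."""
    for r in readings:
        traits = history.get(r.catalyst_id, {})  # empty if catalyst unknown
        yield {
            "catalyst_id": r.catalyst_id,
            "timestamp": r.timestamp,
            "temperature_c": r.temperature_c,
            **traits,
        }

events = [
    Reading("CAT-001", "2024-01-01T10:00:00", 351.2),
    Reading("CAT-999", "2024-01-01T10:00:05", 348.7),
]
combined = list(enrich(events, catalyst_history))
for row in combined:
    print(row)
```

Readings for a catalyst with no history pass through unchanged, so missing reference data shows up as missing fields instead of dropped events.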
Data provisioning pros and cons
Options compared: OLTP, Replication, Data Marts, Enterprise Data Warehouse, Pipeline Pilot
Criteria: Data Quality, Ease of enquiry, System Performance, History, Scalability, Speed to information, Data Integration, Total Cost of Ownership
Really Matters
Evolution Of an Informatics System
1 “Just give me a list of compounds from the database, sorted by compound name”
2 “We also need to see the related toxicology information and for the list to be grouped by compound”
3 “We’d like to get a list of some of the related compound information, too, grouped by the first letter of the compound’s name.”
4 “Actually, we’d like to be able to produce a completely separate report for compound and related toxicology information.”
5 “We don’t like running the reports manually. Can they be scheduled?”
6 “We have quite a few users using this system now and there’s some fairly sensitive data in there.”
7 “We need to be able to drill down into more detail”
8 “We need to track which users have used what Protocols”
9 “We need to be able to easily search the information we need.”
Evolution Of an Informatics System
“We need these reports linked to our business process”
“We need to be able to approve or reject the reports”
“We need a single version of the truth”
“We don’t want to be waiting around for the results”
“We don’t want to be re-typing information from these reports into our other application”
“We need to be able to see the underlying detail”
“We need to print the reports out to take into meetings”
“We need the output as Excel”
“We need charts”
“We need to know who’s looked at the reports”
“We need a simple way to see the entire contents of the report”
“We need a report that looks like an existing flow chart”
Hidden Costs
• Organizations that believe they can build a data integration solution at a fraction of the cost of a COTS solution…
• …discover that any savings in up-front costs are very quickly spent multiple times over the lifetime of the solution
• Typical effort to build a custom data integration solution can be upwards of 5,000-5,500 man-days
• Some of the tasks that need to be undertaken to provide a functioning solution:
– Application Architecture
– Data cleansing & enrichment services
– Integration framework
– User Interface design
– Common field matching
– Security
– Batch processing capabilities
– Application Integration
– Audit & Logging capabilities
Build versus Buy Decision Criteria
Data Integration Considerations | Build your own | Buy
Initial Start-up cost | Lower | Higher
Ongoing Operating cost | Higher | Lower
Ongoing Support & Maintenance | In-house responsibility | Vendor
One-time “quick and dirty” task | Consider | Maybe overkill unless one-time task becomes ongoing request
IT Staff requirements | Higher | Lower
IT Productivity | Detracts from | Contributes to
Data sources/data targets | Single/single | Multiple/multiple, Multiple/single, Single/multiple
Complex transformations | Limited: IT must write complex code | Comprehensive
Integration | Usually overlooked | Industry standards
Industry Trends
End-user Informatics
Web 2.0
What’s Setting Expectations Today
Next-Generation Enabling Technologies & New User Demands Are Emerging
• Rich Internet Experience
• Web 2.0
• Portlet components
• XML and derivatives
• Dynamic, Ajax-based UI
SOA Infrastructure
Leverage existing systems and components
Standardization
Data-driven environment
Open APIs to customize apps
Personal Dashboards
Integrate data from multiple sources
Multi-account views
Cross-account planning
Web 2.0 features on our projects
Advanced Reporting/Visualization Collection
Scientific Business Process Management and PP
• Fuse scientific and analytical data with process data
• Use Pipeline Pilot in automated process decisions
• Display reports and data at appropriate points in the process
• Use data to modify process execution
Consolidated Informatics Platform
[Figure: current vs. future state. Labels: Many Databases, Many Tools, Spreadsheets, Analytics, Scorecards, Dashboards, Self-service Reports, Data Mining, Portals, Web Reports]
Key Takeaways
• Provide Accurate, Integrated & Seamless Informatics Solutions
• Reduce redundant and replicated databases
• Rationalize existing reporting tools and technologies
• Build agile, flexible and reusable solutions
• Empower the end users: “Shift Right”
Shift Right