Education Data Warehousing
Analyzing trends and uncovering value in education data
White Paper
This document contains Confidential, Proprietary and Trade Secret Information (“Confidential information”) of QC Technology
Decisions Inc. and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written
consent of QC Technology Decisions Inc.
While every attempt has been made to ensure that the information in this document is accurate and complete, QC Technology
Decisions Inc. does not accept responsibility for any kind of loss resulting from the use of information contained in this
document. The information contained in this document is subject to change without notice.
Published August 2013
Copyright © 2013 QC Technology Decisions Inc. All rights reserved.
Accelerating Education Data Warehousing with Informatica Data Quality
Overview
State education agencies face many challenges in sustaining the benefits now
being realized as a result of significant federal investments in the US education
system. As of March 2013, the American Recovery and Reinvestment Act
(ARRA) has awarded $6.5 billion [i] for Race to the Top, a competitive grant
program designed to encourage and reward states that are creating the
conditions for education innovation and reform, achieving significant
improvement in student outcomes, and ensuring student preparation for
success in college and careers. Additionally, the Statewide Longitudinal Data
Systems (SLDS) Grant Program has invested $610 million [ii] (of which ARRA
contributed $360 million [iii]) since 2005 through five rounds of funding to 47
states, the District of Columbia, Puerto Rico, and the Virgin Islands.
Despite this significant federal investment, at least 34 states and the
District of Columbia have, for a variety of reasons, cut funding for K-12
education, and more than 43 states have cut funding for public colleges and
universities and/or made large increases in college tuition to make up for
insufficient state funding. [iv] In light of this budget-cutting trend, it is
important that states act now and focus on developing a sustainable model
for their longitudinal data systems.
The SLDS program provides grants to design, develop, and implement
statewide P-20 longitudinal data systems to capture, analyze, and use student
data from preschool to high school, college, and the workforce. Although
many states have begun to see value from their investments in these data
systems, the industry believes that there is significantly more value in such
data when it is linked with other “Big Data”.
Optimizing the value derived from these systems requires skills that
most IT departments lack today. Historically, analytics were built
without looking at very large and complex data sets. Stakeholders
were tasked only with analyzing subsets of structured data and never
considered the other (sometimes unstructured) data within the
organization. As a result, only a limited number of questions
could be asked. As the data grows, so do the questions that stakeholders
will want to ask.
The Data Quality Campaign is a strong advocate for empowering stakeholders
with education data. It has defined ten actions that, once implemented,
will:
- Increase efficiency: Ensure that our education systems produce the
  greatest return on investment possible
- Improve system performance: Inform system-wide management
  and evaluation decisions
- Increase transparency: Shine a light onto the education system to
  see what works and what does not
- Improve student achievement: Inform all education stakeholders’
  decisions with quality data to help them make the best decisions to
  improve every student’s achievement
Figure 1 Total Investment in SLDS Programs (Millions)
“When schools fail, our children and our neighborhoods suffer.” U.S. Secretary of Education, Mr. Arne Duncan, March 11, 2013
Figure 2 Map of Actions Implemented
Although the significant investment in SLDS programs has produced great
progress, few states have implemented all ten of the actions required to
meet their goals, which further demonstrates the need for additional
investment in the SLDS program.
QC Technology Decisions recognizes that most states are actively working to
define and implement these (or similar) actions under numerous budget,
resource, and time constraints. QC has the skills, partners, and
solutions required to fill any gaps that states may have in reaching their
goals or sustaining their SLDS solutions.
Education Data Warehousing Solutions
Data warehouses have been around for a long time, and many organizations
are great at creating them. The data warehouse is admittedly a critical
component of the SLDS solution; however, a number of other components must
work together with it to deliver a functional data warehousing solution
that meets the stakeholders’ goals.
Each project is unique, so the solution for answering questions with data
can be unique as well. As a result, there is a strong desire to conform existing
systems, merge data from new systems, and link data to other systems in
order to make sense of the readily available information. As states evolve and
mature their data warehousing practices, more and more data is conformed
using a variety of methods to meet their business objectives. There is a
“tipping point”, however, where data becomes unmanageable; for many, this
point was crossed the moment they began to link their K-12 education data
with data from other state systems such as workforce and higher education.
Figure 2 Education Data Warehousing Solution
“Insystech has provided excellent IT
consulting services to my team at
Fairfax County Public Schools. The
IT consultants provided to us were
high quality professionals who had
the experience and skills that we
needed. FCPS has enjoyed a long
term relationship with Insystech.
The company has provided quality IT
services to our school district since
1997.”
Ken Rice, Coordinator, Instructional Systems, Fairfax County Public Schools, VA
A modern approach to Education Data Warehousing makes managing data
possible through:
- Collecting and matching data stored in the current systems while
  providing capability to support data exchange to and from other
  sources as the system evolves
- Organizing a comprehensive longitudinal data repository based on
  Data Quality Campaign (DQC) standards consisting of several
  integrated data marts that store information for all aspects of the
  school system, including student enrollment, student test scores,
  student grades, transcripts, program participation, teacher
  education/professional development, and school infrastructure
- Presenting student trends, performance, progress, and program
  effectiveness longitudinally through a common and easy to use web
  portal using ad-hoc and interactive reports, charts, and dashboards
- Controlling access to information and system capabilities for various
  groups including the general public, parents, teachers, principals,
  school administrators, analysts, and system
  administrators/developers
Data Collection
When developing the complex data collection and matching processes, data
structures, and reports that make up the data warehouse solution, QC follows
the Kimball approach to data warehousing (considered industry best practice).
The solution must be designed to integrate data from multiple source systems
and multiple agencies. Data systems that would act as sources for the data
warehouse are analyzed to help develop both the business and technical
teams’ understanding of the availability and quality of data in the source
systems. This information is also used to complete a functional gap analysis
that identifies potential gaps in the availability of data for the reports and
dashboards planned for the system.
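As a rough illustration of the gap analysis described above, the following sketch compares the fields each planned report needs against the fields the profiled source systems can supply. The system names, report names, and field names are all hypothetical; a real inventory would come from source-system profiling.

```python
# Hypothetical field inventories gathered while profiling source systems.
source_fields = {
    "student_information_system": {"student_id", "birth_date", "enrollment_date", "grade_level"},
    "assessment_system": {"student_id", "test_code", "scale_score"},
}

# Hypothetical data needs for each planned report or dashboard.
report_requirements = {
    "enrollment_trends": {"student_id", "enrollment_date", "grade_level"},
    "college_readiness": {"student_id", "scale_score", "act_composite"},
}


def find_gaps(sources, requirements):
    """Return, per report, the sorted list of required fields no source provides."""
    available = set().union(*sources.values())
    gaps = {}
    for report, needed in requirements.items():
        missing = needed - available
        if missing:
            gaps[report] = sorted(missing)
    return gaps


gaps = find_gaps(source_fields, report_requirements)
print(gaps)  # {'college_readiness': ['act_composite']}
```

Reports that do not appear in the result have every required field available in at least one source system.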
Data Matching
A component essential to the sustainability of the solution is the controlled
vocabulary, for which business rules are established to interpret data in the
data warehouse. It is natural for each source system to have its own set of
terms to describe the data elements it provides to the SLDS; however,
a number of attributes, such as demographic attributes, must be conformed
to establish proper links between the source data and to support longitudinal
analysis.
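One simple way to picture a controlled vocabulary is as a lookup that translates each source system's local codes into a single conformed value. The sketch below assumes a hypothetical "gender" attribute and hypothetical source system names and codes; it is not taken from any particular SLDS implementation.

```python
# Hypothetical controlled vocabulary: each source system's local codes
# mapped to one conformed value for the "gender" attribute.
CONTROLLED_VOCABULARY = {
    "gender": {
        "k12_sis": {"M": "Male", "F": "Female"},
        "higher_ed": {"1": "Male", "2": "Female"},
    }
}


def conform(attribute, source_system, raw_value, default="Unknown"):
    """Translate a source-specific code to its conformed value."""
    mapping = CONTROLLED_VOCABULARY.get(attribute, {}).get(source_system, {})
    # Normalize whitespace and case before the lookup.
    return mapping.get(str(raw_value).strip().upper(), default)


k12_value = conform("gender", "k12_sis", " f ")
higher_ed_value = conform("gender", "higher_ed", 2)
unmapped_value = conform("gender", "workforce", "X")
```

Values with no mapping fall back to a default rather than raising an error, so unmapped codes can be counted and reviewed as a data quality issue.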
Data transformation (ETL) processes are developed to load data from the
source systems in a form that meets the requirements of the data warehouse.
Records are conformed and linked using record matching technology, which
overcomes issues such as misspelling, transposition of data elements,
internationalization of names, and numerous other data quality issues.
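The matching issues listed above can be illustrated with a minimal sketch built on Python's standard library. It is not the record matching technology used in the solution; the record layout (`first`, `last`, `birth_date`) and the 0.85 threshold are assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip accents, and drop non-letter characters."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in unaccented.lower() if ch.isalpha())


def name_similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 that tolerates small misspellings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_match(rec_a, rec_b, threshold=0.85):
    """Compare two records with 'first', 'last', and 'birth_date' keys."""
    if rec_a["birth_date"] != rec_b["birth_date"]:
        return False
    # Compare names as given, then with first and last swapped, to catch
    # transposed data elements.
    straight = min(name_similarity(rec_a["first"], rec_b["first"]),
                   name_similarity(rec_a["last"], rec_b["last"]))
    swapped = min(name_similarity(rec_a["first"], rec_b["last"]),
                  name_similarity(rec_a["last"], rec_b["first"]))
    return max(straight, swapped) >= threshold


k12 = {"first": "José", "last": "García", "birth_date": "1995-04-12"}
college = {"first": "Garcia", "last": "Jose", "birth_date": "1995-04-12"}
print(is_probable_match(k12, college))  # True
```

A production matcher would typically weigh more attributes and route borderline scores to manual review; this sketch only shows the general shape of the comparison.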
Distribution reports can help decision makers quickly identify best and worst performers (example: What K-12 courses, high school career clusters, programs and other experiences are preparing students well for STEM postsecondary degrees?)
Data Organization
Data is loaded into a comprehensive, multi-dimensional, statewide
longitudinal education data warehouse. ETL processes keep the data updated
and organized so that it may be managed consistently over time and
presented to end-users as reports.
For each facet of information managed by the system, additional
characteristics that describe the facet (many of which are shared across
facets) are linked to facilitate data management, reporting, and analysis.
These dimensions help describe what kinds of events and characteristics are
measured over an individual’s (and, by extension, a group’s) learning lifetime.
As educators use this information to make decisions that improve student
performance and/or education policies, an improvement cycle is created;
thus, users of the system can see trends in key performance indicators over
time. This information helps all stakeholders develop a better understanding
of their students, which can help institutions better prepare students for the
21st century workforce.
Figure 3 Preparing the 21st Century Workforce
Data Presentation
End users will usually access the system via a single web-based business
intelligence portal, through which unstructured information such as
documents, web pages, and blogs may be shared alongside interactive
dashboards and reports.
Since the SLDS program serves as an authoritative data repository for all
education programs, it creates an opportunity for a variety of different
stakeholders to access a common set of data at different levels of aggregation
and scope:
- Student data, such as performance data for each student in his/her
  class, might be made available to teachers. They can aggregate that
  data to analyze trends, determine what content needs to be
  reinforced, and decide how to alter teaching methods to ensure
  students acquire the content knowledge and skills in the state’s
  college and career-ready or internationally benchmarked standards.
- School data, such as teacher assignment, evaluation, and
  professional development; student course assignments and targeted
  supports; and interim and summative testing, might be available to
  school administrators to help guide staff time and resources.
  Administrators will also have access to data that helps identify
  which students from which classrooms are off track to score
  proficient on end-of-grade or end-of-course exams in key subjects,
  and why.
- School district administrators may have access to data that helps
  them improve curriculum and practices, both for their institution(s)
  and in under-performing schools, allocate teacher and staff
  resources, and provide professional development opportunities.
- Policy makers may have access to valuable data on a robust set of
  key performance indicators that measure and report school and
  district progress towards college and career readiness goals,
  including measures of progress such as Adequate Yearly Progress.
  They may be able to identify schools and districts in need of targeted
  supports and interventions and those deserving of recognition for
  outstanding achievements. Policy makers can also use the data to
  analyze trends across schools and districts, helping them evaluate
  policies.
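The idea of one common data set viewed at different levels of aggregation can be sketched as follows. The records, the role names, and the mapping from role to grouping level are hypothetical, and the sketch covers only aggregation, not access control.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student-level records:
# (district, school, classroom, student_id, score)
RECORDS = [
    ("D1", "S1", "C1", "stu-001", 72),
    ("D1", "S1", "C1", "stu-002", 88),
    ("D1", "S1", "C2", "stu-003", 65),
    ("D1", "S2", "C3", "stu-004", 90),
    ("D2", "S3", "C4", "stu-005", 80),
]

# Number of leading columns that define the grouping for each hypothetical role.
ROLE_GRAIN = {"teacher": 3, "school_admin": 2, "district_admin": 1, "policy_maker": 0}


def average_scores(records, role):
    """Average the score at the level of aggregation assigned to a role."""
    depth = ROLE_GRAIN[role]
    groups = defaultdict(list)
    for record in records:
        groups[record[:depth]].append(record[-1])
    return {key: mean(scores) for key, scores in groups.items()}


by_classroom = average_scores(RECORDS, "teacher")
by_school = average_scores(RECORDS, "school_admin")
statewide = average_scores(RECORDS, "policy_maker")
```

Each role reads the same underlying records; only the grouping key changes, which is what lets the repository stay authoritative while serving different audiences.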
Report Annual Trends in KPIs alongside other measures to help discover/predict why and when changes occur (example: What K-12 courses, programs and other experiences are predictive of college readiness?)
Interactive dashboards to display frequently requested information on KPIs at a glance (example: How much student loan debt do students leave with compared to their employment rates and earnings?)
Comparing multiple student cohorts over time can help answer specific questions to help support changes to policies (example: What is the profile of high school students (across time) entering and not entering some type of postsecondary education immediately following high school graduation?)
Education Data Warehousing in Practice
In 2012, QC Technology Decisions Inc., as a sub-contractor to Insystech Inc.,
completed the design, development, and implementation of a P-20 Statewide
Longitudinal Education Data (SLED) System for the West Virginia Higher
Education Policy Commission. The solution included a data warehouse,
dashboards, and governance, and it now provides answers to questions from the
state legislature, higher education research staff, K-12 staff (including
administrators and teachers), and workforce research staff.
Governance
Governance describes the overriding authority and legislative requirements
for all architectural, design, development, and business decisions, including all
policies and procedures created on behalf of the agencies responsible for the
system. The West Virginia P-20 SLED implementation involved the effective
interaction of multiple groups, individuals, and disciplines. To provide a
unified, centrally governed approach throughout the lifecycle of the solution,
a governance model was implemented in conjunction with the technical
solution.
Data Model
At a very high level, the solution created for West Virginia structured the data
warehouse around the processes by which education programs are managed and
delivered to students. The data model was developed based on data
warehousing best practices, input from industry experts, and lessons learned.
The P-20 model is the evolution of EDSL, an award-winning K-12 data warehouse
created in partnership with Fairfax County Public Schools, which Insystech
has supported and enhanced since 2000 and which serves over
170,000 students and 25,000 educators. The current model stores outcomes
and other information at point-in-time events such as graduating, taking a
test, enrolling in a school, getting approval for a student loan, or completing
a grade. Each event in the student’s academic history may have one or
more outcomes, which are recorded as “facts” that are aggregated at the level
at which they are collected (typically the student level). This method of
classifying events in the data model enables policy makers to develop a
framework for exploring the data warehouse in terms that are easy to
understand from a business perspective.
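The event-and-fact structure described above might be sketched like this. The `Fact` class, its field names, and the sample values are hypothetical and are not the West Virginia schema; they only show how student-level facts tied to point-in-time events can be rolled up for reporting.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One measured outcome of an event, recorded at the student level."""
    student_id: str
    event_type: str   # e.g. "test", "enrollment", "graduation"
    event_date: date
    measure: str      # e.g. "scale_score", "diplomas_awarded"
    value: float


# Hypothetical sample facts.
FACTS = [
    Fact("stu-001", "test", date(2011, 5, 10), "scale_score", 410.0),
    Fact("stu-001", "test", date(2012, 5, 8), "scale_score", 450.0),
    Fact("stu-002", "test", date(2012, 5, 8), "scale_score", 430.0),
    Fact("stu-001", "graduation", date(2013, 6, 1), "diplomas_awarded", 1.0),
]


def yearly_average(facts, event_type, measure):
    """Roll student-level facts up to an average per calendar year."""
    by_year = defaultdict(list)
    for fact in facts:
        if fact.event_type == event_type and fact.measure == measure:
            by_year[fact.event_date.year].append(fact.value)
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


test_trend = yearly_average(FACTS, "test", "scale_score")
print(test_trend)  # {2011: 410.0, 2012: 440.0}
```

Because every fact carries its event type and date, the same rollup works for any event in a student's history by changing the two filter arguments.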
Functionality
- Ability to easily connect to any contributing systems to extract, transform,
  and load data into the data warehouse
- Data exchange capabilities to provide for the easy submission of data
  from districts and other partners
- Unique record matching technology with a “master person index”,
  which overcomes issues such as misspelling, transposition of data
  elements, internationalization of names, and numerous other data
  quality issues
- Comprehensive, multi-dimensional statewide longitudinal education data
  warehouse organizing student records from pre-school through age 20
  years including enrollment, assessments, earnings, financial aid, debt,
  and much more
- Subject-specific data marts for cohort comparisons and analysis of various
  types of high school graduates and degree seekers in areas such as high
  school feedback, college performance, college indebtedness and income
  earnings, remedial development, workforce feedback, STEM pipeline,
  dual enrollment, teacher education, and proficiency testing predictors
- Public and private web portals with interactive dashboards and
  scorecards
Technical Environment
The following off-the-shelf products were used to implement the P-20 SLDS
solution for West Virginia:
- Network/Operating System: Windows 2008 R2, Secure Public Access (SSL 256bit),
  Internet Information Server 7.0, N-Tier Server Architecture, and VMWare
  Virtualization
- Database: Oracle 11g R2 Enterprise Edition with Partitioning Option
- ETL: IBM Cognos Data Manager 10.1, IBM Cognos Adaptive Warehouse 10.1, and
  Oracle PL/SQL stored procedures
- Portal/Reporting: IBM Cognos 10.1, SharePoint 2010
“Insystech team worked well with our
research staff to define standard taxonomy
for the data, gather and process data, design
and develop dashboards and reports.
Insystech team of experts also assisted in the
setup of governance procedures to help
sustain the system.”
Rob Anderson
Executive Vice Chancellor for Administration
West Virginia Higher Education Policy
Commission
Summary
QC Technology Decisions Inc. has contributed to the evolution of Education Data Warehousing through the design and development of statewide data warehouse solutions such as the West Virginia P-20 SLED (Statewide Longitudinal Education Data) system and local education data warehouse solutions for school districts in the United States. Our team has practical experience integrating data from student information systems, state extracts, and testing solutions using leading integration platforms such as Informatica and SQL Server Integration Services. We can architect and implement solutions using leading business intelligence platforms like Oracle OBIEE, IBM Cognos, Tableau, and Microsoft SharePoint. Our data warehousing professional services include:
- Enterprise Architecture
- Stakeholder outreach and change management
- Requirements analysis
- Data integration strategy, planning and implementation
- Data quality improvement strategy, planning and execution
- Enterprise Data Warehouse design, development and maintenance
- Extract, transform and load (ETL) process design and development
- Custom web application development
- Systems Integration
- Web Services development
- Portal development
- Training and documentation
QC Technology Decisions Inc. 26 Crestline Drive
Fredericton, NB E3G 6B1 (506) 470-8938
www.qctechnology.com
© 2013 QC Technology Decisions Inc. All rights reserved. Printed in Canada. QC Technology Decisions Inc. and the QC logo are trademarks or registered trademarks of QC Technology Decisions Inc. in Canada and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.
[i] Recovery.gov, http://www.recovery.gov/espsearch/Pages/advanced.aspx?data=recipientAwardsList&AwardType=CGL&CFDA_CODE=257 (March 20, 2013)
[ii] Institute of Education Sciences, National Center for Education Statistics, http://nces.ed.gov/programs/slds/stateinfo.asp (March 20, 2013)
[iii] Recovery.gov, http://www.recovery.gov/espsearch/Pages/advanced.aspx?data=recipientAwardsList&Agency=91&AwardType=CG&CFDA_CODE=256 (March 20, 2013)
[iv] An Update on State Budget Cuts, http://www.cbpp.org/cms/index.cfm?fa=view&id=1214 (February 9, 2011)
About QC Technology Decisions
QC Technology Decisions Inc. is best known as a K-12 educational technology firm specializing in business intelligence and enterprise information management. Since 2005, our team has been providing professional IT services in both the United States and Canada to Local and State Education Agencies. Our team has extensive experience and knowledge in working with the development of educational policy, evaluation of educational programs as well as extensive design, deployment and management of information technology solutions including the design and deployment of data warehouse systems.