
Co-Directors:

Yigal Arens

USC / Information Sciences Institute

Judith Klavans

Columbia University

2

The purpose of DGRC

To Make Digital Government Happen
• Advance information systems research
• Bring the benefits of cutting-edge IS research to government systems
• Help educate government and the community
• Learn needs from government partners to drive next-stage system development
• Build pilot systems as part of new infrastructure

3

The problem and the solution

Problem: FedStats has thousands of databases in over seventy Government agencies. Data is duplicated and near-duplicated, and even Government officials and specialists cannot find it.

Solution: Create a system that provides easy, standardized access. This needs a multi-database access engine, a powerful user interface, and a terminology standardization mechanism.
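The solution pairs a multi-database access engine with terminology standardization. The Python sketch below illustrates only that idea, under stated assumptions: each source (`census`, `bls`, `eia`) is a hypothetical in-memory list of rows, and the hand-written `TERM_MAP` stands in for the ontology that would map each agency's own column names onto shared terms. It is not the DGRC system's design.

```python
from dataclasses import dataclass

# Hypothetical mapping from each source's own column names to one shared
# vocabulary; in a real system this role would be played by the ontology.
TERM_MAP = {
    "census": {"TOT_POP": "population"},
    "bls": {"unemp_rate": "unemployment_rate"},
    "eia": {"gas_price_usd": "gasoline_price"},
}

@dataclass
class Source:
    name: str
    rows: list  # each row is a dict with an "area" key plus source-specific columns

def standardize(source):
    """Rename source-specific columns to the shared terms; drop unmapped ones."""
    mapping = TERM_MAP.get(source.name, {})
    for row in source.rows:
        std = {"area": row["area"]}
        for col, value in row.items():
            if col in mapping:
                std[mapping[col]] = value
        yield std

def ask(sources, area, concept):
    """One query, phrased in shared terms, answered from every source that has it."""
    return [
        (src.name, row[concept])
        for src in sources
        for row in standardize(src)
        if row["area"] == area and concept in row
    ]
```

With placeholder rows (not real statistics) such as `Source("bls", [{"area": "Denver", "unemp_rate": 5.0}])`, calling `ask(sources, "Denver", "unemployment_rate")` returns `[("bls", 5.0)]`; the user never needs to know that this source names the column `unemp_rate`.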

4

The Vision: Ask the Government...

How have property values in the area changed over the past decade?

How many people had breast cancer in the area over the past 30 years?

Is there an orchestra? An art gallery? How far are the nightclubs?

We’re thinking of moving to Denver...What are the schools like there?

[Figure: Census, Labor Stats]

5

Research challenges

Scale to incorporate many databases… build data models automatically

Process large and disparate data efficiently… develop fast processing techniques… create aggregation and substitution operators

Integrate data models across sources and agencies… take a large ontology and link the models into it automatically… develop ways to automatically harvest glossary data for building ontologies

Develop new ways to interact with data… use language processing tools for question-answering

Display complex information from distributed sources…develop and evaluate new presentation techniques
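The challenge list names aggregation and substitution operators without defining them. One plausible reading, assumed here rather than taken from the slides: aggregation rolls fine-grained rows up into coarser groups, and substitution fills a gap in the preferred source with the value from an alternative source. The hypothetical functions below sketch both over plain Python dictionaries.

```python
from collections import defaultdict

def aggregate(rows, group_key, value_key):
    """Aggregation operator: sum `value_key` over all rows sharing `group_key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

def substitute(primary, fallback, keys):
    """Substitution operator: prefer the primary source's value for each key,
    fall back to the alternative source, and omit keys neither source has."""
    result = {}
    for key in keys:
        if primary.get(key) is not None:
            result[key] = primary[key]
        elif fallback.get(key) is not None:
            result[key] = fallback[key]
    return result
```

Keeping the two operators separate means a query plan could, for example, substitute missing state values first and aggregate afterwards; whether the actual system composes them this way is not stated in the slides.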

6

The Energy Data Consortium (EDC)

EDC members: Information Sciences Institute, USC; Columbia University

Government partners: Energy Information Admin. (EIA); Bureau of Labor Statistics (BLS); Census Bureau

Research challenge: Make the contents of thousands of data sets, represented in many different ways (web pages, PDF, MS Access, text…), accessible in a standardized way


7

The Vision: Ask the Government...

Are alternative energy sources any cheaper to use?

Which state has the highest oil production?

How long has the nuclear plant been in service?

We’re thinking of moving to Cambridge…How much does gas cost there?

[Figure: Census, Labor Stats]

8

Data Integration

[Figure: heterogeneous data sources (Labor, EPA, EIA, Census); definition ontology; user interface / information access; query]

9

From Phase I to Phase II

Phase One
• Terminology/ontology
• Information integration and in-memory data analysis
• New interfaces for complex human-computer interaction

Phase Two
• Question-answering
• Usability testing and evaluation
• Privacy portal

10

Data Integration

[Figure: heterogeneous data sources (Labor, EPA, EIA, Census, Trade); definition ontology; main memory query processing; user interface / information access; question-answer access; user evaluation; task-based evaluation; query]

12

Data Integration

[Figure: heterogeneous data & meta-data sources (???, EPA, EIA, Census); Labor definitions; data definitions (ontology); interface; query; user interface / information access. Caption: Metadata mediates]

13

http://www.eia.doe.gov/emeu/states/main_ca.html

Recent example

EIA problem: Data cleared for publication is grouped together across states

Also need data gathered by state separately

Need general ability to ungroup and reaggregate data
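The slide does not say how ungrouping would be done. A minimal sketch, assuming (hypothetically) that a published group total is available together with separately gathered values for all but one state in the group: the missing state is recovered as a residual, and per-state values can then be reaggregated into any other grouping. The function names and state codes are illustrative only.

```python
def ungroup_residual(group_total, known, members):
    """Recover the one member of a published group whose value is missing.

    group_total: the published total for the whole group
    known: separately gathered per-member values (may include other members)
    members: the members that make up the group
    Returns {missing_member: value}, or None if zero or several are missing.
    """
    missing = [m for m in members if m not in known]
    if len(missing) != 1:
        return None  # cannot be ungrouped uniquely from this information
    residual = group_total - sum(known[m] for m in members if m in known)
    return {missing[0]: residual}

def reaggregate(values, new_groups):
    """Sum per-member values into a different grouping."""
    return {name: sum(values[m] for m in group) for name, group in new_groups.items()}
```

If the grouping exists for confidentiality reasons, recovering a member by subtraction could defeat it, so any real use would need checking against the agency's disclosure rules.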

14

Main Memory

Achievements in large-scale data manipulation – optimization for efficiency and speed

New input for visualization, with dials that the user can manipulate

Applications to electoral boundaries
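One way to picture dials on top of main-memory processing is a range query that reruns every time a dial moves. The sketch below is an assumption, not the DGRC implementation: rows are dicts held in memory, one hypothetical `index_field` is kept sorted so `bisect` can narrow the candidate rows for its dial, and the remaining dials are applied as simple filters.

```python
import bisect

class InMemoryTable:
    """Rows held in memory, sorted on one field so that field's dial is a binary search."""

    def __init__(self, rows, index_field):
        self.index_field = index_field
        self.rows = sorted(rows, key=lambda r: r[index_field])
        self.keys = [r[index_field] for r in self.rows]

    def query(self, dials):
        """dials maps field name -> (low, high), inclusive; returns matching rows."""
        low, high = dials.get(self.index_field, (float("-inf"), float("inf")))
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        others = {f: rng for f, rng in dials.items() if f != self.index_field}
        return [
            r for r in self.rows[start:end]
            if all(lo <= r[f] <= hi for f, (lo, hi) in others.items())
        ]
```

For an electoral-boundary use, each row might be a district with population and income fields; moving the population dial re-slices the sorted list instead of rescanning every row.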

15

GetGloss: The Identification of Glossaries in High Fan-out Websites
• Large sites with many links
• Glossaries hidden all over
• No coherent view within and across sites
• No way to determine who is defining what and how

16

Glossary Finding Function
• Function to compute a best-guess score
• Ranked list: higher is better
• Evaluation to determine how likely it is that a high score will be associated with a (large) glossary
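The slide describes a scoring function that produces a ranked list, higher being better, plus an evaluation of how often high scores really indicate glossaries. The features below (the word "gloss" in the URL or "glossary" in the title, and a count of `Term: definition` lines) are hypothetical choices for illustration, not the features GetGloss actually uses; precision at k is likewise one assumed way to run the evaluation.

```python
import re

# Hypothetical cue: a line starting with a short term, a colon, then definition text.
ENTRY_LINE = re.compile(r"^[A-Za-z][\w ()/-]{0,59}:[ \t]+\S", re.MULTILINE)

def glossary_score(url, title, text):
    """Best-guess score that a page is a glossary; higher is better."""
    score = 0.0
    if "gloss" in url.lower() or "glossary" in title.lower():
        score += 5.0  # naming cue in the URL or page title
    score += len(ENTRY_LINE.findall(text))  # one point per entry-like line
    return score

def rank_pages(pages):
    """pages: iterable of (url, title, text); returns (score, url) pairs, best first."""
    return sorted(
        ((glossary_score(u, t, x), u) for u, t, x in pages),
        key=lambda pair: pair[0],
        reverse=True,
    )

def precision_at_k(ranked_urls, true_glossaries, k):
    """Fraction of the top-k ranked URLs that are confirmed glossaries."""
    return sum(1 for u in ranked_urls[:k] if u in true_glossaries) / k
```

Counting entry-like lines rewards large glossaries, which matches the slide's interest in whether a high score signals a (large) glossary.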

17

ParseGloss
• Once a glossary is found, how can individual definitions be analyzed?
• Once analyzed into components, how can they then be loaded into the ontology?

GetGloss → ParseGloss → Ontology
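ParseGloss is posed as a pair of open questions. The sketch below assumes one simple answer for illustration: split each `Term: definition` line into the term, a head phrase (genus) ending at the first marker such as "that", "which", or "equal to", and the remainder; then file each parsed definition in a dictionary that stands in for the ontology, keeping the source URL so that different sites' definitions of the same term stay side by side. The marker list, class, and function names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical phrases that often end the head (genus) of a definition.
MARKERS = (" that ", " which ", " equal to ", ", ")

@dataclass
class Definition:
    term: str
    genus: str        # head phrase, e.g. "A unit of volume"
    differentia: str  # what distinguishes it, e.g. "equal to 42 U.S. gallons"
    source_url: str   # which site supplied the definition

def parse_entry(line, source_url):
    """Split one 'Term: definition' line; return None if it is not one."""
    term, sep, body = line.partition(":")
    body = body.strip().rstrip(".")
    if not sep or not term.strip() or not body:
        return None
    positions = [i for i in (body.find(m) for m in MARKERS) if i != -1]
    cut = min(positions, default=len(body))
    return Definition(term.strip(), body[:cut].strip(), body[cut:].strip(" ,"), source_url)

def load_into_ontology(ontology, definitions):
    """File definitions under a normalized term, keeping every source's version."""
    for d in definitions:
        ontology.setdefault(d.term.lower(), []).append(d)
    return ontology
```

Keeping `source_url` on every entry is one small step toward the GetGloss goal of knowing who is defining what and how.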

18

Evaluation: New Effort

Peter Sommer, Director of Education, Center for New Media Teaching and Learning

Focus on purposeful use of emerging technologies for researchers, students, teachers, analysts…

Funded by NSF and BLS

19

Privacy Portal
• Increasing access to multiple databases creates a security problem
• The original DGRC proposal included a component on privacy
• Newly funded NSF SGER proposal: Columbia – Computer Science and School of Business (Stolfo and Johnson)

20

Privacy and Government Websites
• What are user fears?
• What are their preferences?
• What are their perceptions of privacy issues?
• What are the implications for design of systems and interfaces?

21

Social Science Research

Exploring a “dial manipulation” application for dynamic querying of health databases
• Useful for interactive mapping for redistricting
• Use statistics on neighborhoods, e.g. CPS (long and wide)
• Census summary data is another source: tables compiled for various levels
• Joint with ISERP Social Science Research Center

22

Proposals

SGER proposal funded
• Topic: Urban transportation study (new methods for freight tracking in LA by comparing across databases)
• Grant awarded to USC, shared by ISI and USC’s Dept. of Policy and Planning

White paper to DoT
• Topic: Searching for patterns in freight traffic
• Submitted by USC campus people and Jose Luis Ambite

ITR proposal submitted
• Topic: Semi-automated topic hierarchy creation
• Partners: Eduard Hovy communicated with EPA group
• If funded, will use EPA’s CARAT ontology as starting point and evaluation standard

23

Digital Government is Here!
• An increasing quantity and variety of information is available in digital form
• Government agencies already collect much digital information
• Government is a holder and provider of often unique data and services
• Access to information/services by industry and citizen-users must be facilitated, while limiting cost and risk

24

Well – Not Quite...
• Expectations are very high due to the pervasiveness of Web/Internet information technology
• Government IT/IS is behind best practices
• Legacy, stovepipe systems designed for trusted staff
• Failed very large modernization efforts
• A disconnect exists between the research community and government IS

26

Thank you! Any questions?