


Introduction to Big Data: Harnessing the Benefits of Data Powered Governance

Board of Directors Item 10 | January 24, 2020

Data Science: Connecting the dots in emerging science, society & living

Rajesh K. Gupta, Director

DRAFT

Data have always been with us.

What has changed?


1. Ubiquity and scale of data that can be collected


2. Development of sophisticated analytics tools


3. Use of data in ways not previously imagined


http://firemap.sdsc.edu

Santa Rosa fire, October 2017

http://wifire.ucsd.edu

New algorithms for locality-sensitive hashing, approximate computing


Similar structures have similar effects

Toxicology tested by matching against a massive DB of molecular structures and safety data
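
The idea of matching a structure against a large database by similarity can be sketched with MinHash signatures and banded hashing, one common form of locality-sensitive hashing. This is a minimal illustration, not the method used by the project on the slide: the feature tokens, compound names, and parameter values are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def minhash_signature(features, num_hashes=32):
    """MinHash signature: for each seeded hash, keep the minimum hash over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha256(f"{seed}:{feature}".encode()).digest()[:8], "big"
            )
            for feature in features
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands=8):
    """Split signatures into bands; items sharing any band bucket become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical structural features; a real system would use chemical fingerprints.
structures = {
    "compound_a": {"ring:benzene", "group:hydroxyl", "group:methyl", "bond:double"},
    "compound_b": {"ring:benzene", "group:hydroxyl", "group:methyl", "bond:triple"},
    "compound_c": {"chain:alkane", "group:amine", "group:chloro", "bond:single"},
}
signatures = {name: minhash_signature(feats) for name, feats in structures.items()}
candidates = lsh_candidate_pairs(signatures)
print(candidates)
```

Only items that land in the same bucket are compared in full, which is what keeps lookups against a massive database cheap. Fewer rows per band makes the filter more permissive, more rows makes it stricter.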

Machine predictions that can reduce crime or the amount of jailing


Materials, Society, Biology


Data being used everywhere in daily lives

[Faster R-CNN: Ren, He, Girshick, Sun 2015]

Detection Segmentation

[Farabet et al., 2012]

Lots of training data:
• Vision systems trained on millions of images
• Speech systems trained on tens of thousands of hours of annotated data


FB data

Psychographic data

Tweets

Browsing data

Voter registration

Public databases

. . . and more

Personality traits

Race & ethnicity

Political views

Religious views

Sexuality

Profession


HDSI: A Convergence of Multiple Long‐term Initiatives Launched in March 2018


188 Faculty Drawn from All Areas of University 

Environment Technology Health

Microbiome Science Big Data

Understanding and Protecting the Planet

HDSI

Enriching Human Life and Society

Exploring the Basis of Human Knowledge, Learning and Creativity

Understanding Cultures and Addressing Disparities in Society


Questions and Answers


Role of Data at SANDAG

Speaker: Ray Major

SANDAG is creating an environment where economic and social benefits are recognized and achieved using empirical information.

Big Data

Regional Plan


SANDAG’s evolution as a data-driven organization

Timeline: 1999, 2005, 2006, 2009, 2010, 2012, 2013, 2016, 2017, 2018, 2019, 2020

Milestones:
• INRIX Travel Time Data Incorporated in ABM
• Work begins on Activity Based Model
• Intermodal Transportation Management System
• Integrated Corridor Management
• Data Governance (2018)
• Grant to Develop Solution for Analytics
• Regional Data Hub
• Intelligent Transportation System Architecture
• Regional Arterials Management System
• Streetlight Origin-Destination Data Acquired for ABM
• ATRI Data Commercial Vehicles
• AirSage Data for Military Base Trip Validation
• Data used to drive Regional Plan

Where (BIG) data comes from

External Data

• Navigation apps
• Google
• Weather
• Uber/Lyft
• Bird/Lime
• Construction
• INRIX/Streetlight
• Telematics

Internal Data

• Sensors
• Signals
• Cameras
• Public transit
• Traffic incidents
• Crime
• Detectors


Benefits

Future Opportunities

Why big data is essential in making policy decisions for the region

Analytic Continuum

Axes: Complexity (Low to High), Value of Information (up to High)

• Reporting: What happened
• Analysis: Why did it happen
• Monitoring: What is happening now
• Forecasting: What may happen (Based on History)
• Predictive: What possibly will happen (Based in Real Time)
• Prescriptive: What actions should be

Anonymization/privacy of the economic, demographic, and transportation data

Anonymization

Data Governance

Privacy

Security

Smart Cities & Open Data

01/24/20

Andrell Bower, Chief Data Officer


Background

Open data – an initiative launched in 2015

Engaged with residents and increased transparency

Data is a valuable asset that helps us meet strategic goals

PUBLIC FACING … … WHILE SHIFTING CULTURE


Inform policy

We can decide the next category to add to Get it Done by conducting content analysis and text mining on reports submitted to the "Other" category.
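
A minimal sketch of what such text mining could look like: count how many free-text reports mention each term and surface the most frequent ones as candidate categories. The sample reports and stopword list are hypothetical, and this is one simple approach rather than the city's actual pipeline.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "the", "on", "in", "at", "of", "is", "and", "to", "near", "by"}


def top_terms(reports, n=3):
    """Return the n terms that appear in the most reports."""
    counts = Counter()
    for text in reports:
        # Count each term once per report so one long report cannot dominate.
        words = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        counts.update(words)
    return counts.most_common(n)


# Hypothetical reports filed under "Other".
reports = [
    "Abandoned scooter blocking the sidewalk",
    "Scooter left on sidewalk near bus stop",
    "Broken bench in the park",
    "Scooter parked in bike lane",
]
print(top_terms(reports, 2))  # [('scooter', 3), ('sidewalk', 2)]
```

A term that keeps rising to the top of the "Other" pile, such as scooters here, is a natural candidate for its own reporting category.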


Improve service

We used 911 call volumes and ambulance time on task to calculate the optimal number of ambulances for each of our response zones.
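
One simple way to turn those two inputs into a unit count is an offered-load calculation: calls per hour times hours on task gives the average number of busy ambulances, which is then divided by a target utilization. The formula, the 50% utilization target, and the zone figures below are assumptions for illustration, not the city's actual model.

```python
import math


def ambulances_needed(calls_per_hour, hours_on_task, target_utilization=0.5):
    """Estimate units needed so that average utilization stays at or below the target."""
    offered_load = calls_per_hour * hours_on_task  # average number of busy units
    return max(1, math.ceil(offered_load / target_utilization))


# Hypothetical response zones: (calls per hour, mean hours on task per call).
zones = {"zone_1": (2.0, 1.0), "zone_2": (0.5, 1.5)}
for zone, (calls, hours) in zones.items():
    print(zone, ambulances_needed(calls, hours))
# zone_1 4
# zone_2 2
```

A production model would also account for peak hours and response-time targets, which a single average cannot capture.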


Evaluate programs

A tool that calculates parking meter utilization by block using daily transaction data will help program managers pilot changes and make timely adjustments.
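
A utilization calculation of this kind can be sketched as paid minutes divided by available meter-minutes for each block. The field names, block names, and the 600-minute operating day are hypothetical, and the real tool may define utilization differently.

```python
from collections import defaultdict


def utilization_by_block(transactions, meters_per_block, operating_minutes=600):
    """Share of available meter-minutes that were paid for, per block, for one day."""
    paid = defaultdict(int)
    for t in transactions:
        paid[t["block"]] += t["minutes_paid"]
    return {
        block: paid[block] / (count * operating_minutes)
        for block, count in meters_per_block.items()
    }


# Hypothetical daily transaction data.
transactions = [
    {"block": "100 Main St", "minutes_paid": 120},
    {"block": "100 Main St", "minutes_paid": 180},
    {"block": "200 Main St", "minutes_paid": 600},
]
meters_per_block = {"100 Main St": 2, "200 Main St": 1, "300 Main St": 4}
print(utilization_by_block(transactions, meters_per_block))
# {'100 Main St': 0.25, '200 Main St': 1.0, '300 Main St': 0.0}
```

Running this daily gives program managers a per-block trend they can compare before and after a pilot change.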


Thank you

Andrell Bower, [email protected] our Open Data Portal
View our open-source data projects on GitHub

Central Ohio Case Studies in Civic Data Privacy & Governance

Christina Drummond, M.A. ISTP
Program Manager and Senior Analyst

Moritz College of Law, The Ohio State [email protected]

Presentation to SANDAG
January 24th, 2020


Professional Networks

Efforts

Regional Efforts

Regional Data Advisory Committee (RDAC)

https://www.morpc.org/committees/regional-data-advisory-committee/

Contact Aaron Schill, Director, Data & Mapping. [email protected]


MORPC Regional Data Agenda

https://www.morpc.org/wordpress/wp-content/uploads/2019/03/DATA-POLICY-AGENDA_FINAL_WEB.pdf


Active Working Groups


MORPC RDAC Data Policy Needs Survey & Toolkit WG

Surveying and resourcing civic data governance needs

Data topics: e.g. cybersecurity, open data, privacy & ethics, contracting, sourcing and provisioning

Operations: Policy, procedures, staffing and training


MORPC Data User Personas

Guiding data delivery & engagement

Personas span:

• Mid-sized City Planner

• Township Administrator

• Engaged Resident

• Elected Village Council Member

• Nonprofit Employee

• Consultancy Project Manager

• Civic Tech Enthusiast

https://www.morpc.org/tool-resource/data-user-personas/

$40 MILLION


VISION

To empower our residents to live their best lives through responsive, innovative and safe mobility solutions.

MISSION

To demonstrate how an intelligent transportation system and equitable access to transportation can have positive impacts on everyday challenges faced by cities.


USDOT PORTFOLIO


TECHNICAL WORKING GROUP (TWG)

Data

• Identify use cases for data to solve community challenges  & continue to make the city smarter 

Technical

• Architecture & Design Recommendations

• Best Practices 

• Tech Strategies

• Licensing / IP 

Policy

• Data Management

• Data Privacy 

• Policy Roadmap


Data: Privacy, Management and Policies

SMART COLUMBUS DATA PRIVACY AND MANAGEMENT PLANS


WHAT IS THE SMART COLUMBUS DMP?

Data Management Plan

• Governs data within Smart Columbus OS

• Addresses metadata management, data size, data acquisition, data access and data use.

https://smart.columbus.gov/programs/smart‐city‐demonstration


DATA INVENTORY

Data Inventory for each dataset:

• Description of Data
• Data Type
• Dataset
• Source of the Data
• Responsible Party
• Collection Approach
• Frequency
• Needed for Performance Management
• Period of Collection
• Users of the Data
• Value of the Data to the Users
• Does it Contain PII
• Relevant Data Standards
• Access Policies for the Data
• Where is the data located?
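
The inventory fields above map naturally onto a structured record. The sketch below is one possible representation, assuming Python dataclasses. The field names are paraphrased from the slide, and the sample entries and the helper function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataInventoryEntry:
    """One inventory record per dataset, mirroring the fields listed on the slide."""
    description: str
    data_type: str
    dataset: str
    source: str
    responsible_party: str
    collection_approach: str
    frequency: str
    needed_for_performance_management: bool
    period_of_collection: str
    users: list
    value_to_users: str
    contains_pii: bool
    relevant_standards: list
    access_policy: str
    location: str


def datasets_needing_privacy_review(entries):
    """Hypothetical helper: flag datasets whose inventory entry says they contain PII."""
    return [entry.dataset for entry in entries if entry.contains_pii]


# Hypothetical entries for illustration only.
inventory = [
    DataInventoryEntry(
        "Trip requests from a mobility app", "tabular", "trip_requests",
        "App provider", "Program office", "API feed", "daily", True,
        "2019-2020", ["planners"], "Demand analysis", True,
        [], "restricted", "operating system data store",
    ),
    DataInventoryEntry(
        "Traffic signal timing plans", "tabular", "signal_timing",
        "City traffic division", "Traffic engineer", "manual upload", "monthly",
        False, "2019-2020", ["operators"], "Corridor tuning", False,
        [], "public", "operating system data store",
    ),
]
print(datasets_needing_privacy_review(inventory))  # ['trip_requests']
```

Keeping the PII flag and access policy on every record lets governance checks run over the whole inventory rather than dataset by dataset.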


SMART COLUMBUS DATA PRIVACY PLAN

• High-level plan to protect privacy and secure data

• Privacy controls
• Security controls
• De-identification
• Ongoing governance

• A commitment to respect and to be a good steward of personal information
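
One widely used check when de-identifying a dataset is k-anonymity: every combination of quasi-identifier values must be shared by at least k records. The slides do not say which technique the plan specifies, so the sketch below is only an illustration, with hypothetical records and column names.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical trip records with direct identifiers already removed.
records = [
    {"zip": "92101", "age_band": "30-39", "trips": 4},
    {"zip": "92101", "age_band": "30-39", "trips": 7},
    {"zip": "92103", "age_band": "40-49", "trips": 2},
]
print(is_k_anonymous(records, ["zip", "age_band"], k=2))  # False
```

The third record is unique on zip code and age band, so it would need further generalization or suppression before release under a k of 2.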

https://smart.columbus.gov/programs/smart‐city‐demonstration


HOW WERE THEY DEVELOPED?

• Guiding Light Documents

• Iterative Drafts with Special Reviews

• Diverse Drafting Team

https://www.smartcolumbusos.com/share‐your‐data


Types of Guiding Documents

Laws, including:
• ORC: Personal Information Systems, Public Records
• Federal: Privacy Act of 1974
• International: GDPR

NIST Standards/Guides
• Security categorization standards, defining breach impacts (FIPS 199)
• Minimum security requirement specs, risk-based process control selection (FIPS 200)
• Information type to security category mapping (800-60v1)
• Protection of PII confidentiality (800-122)
• Security and privacy controls (800-53) *Control defining
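
FIPS 199 categorizes each information type by potential impact (LOW, MODERATE, HIGH) on confidentiality, integrity, and availability, and a system takes the highest value per objective across the information it holds (the "high water mark"). The sketch below illustrates that rule. The information types and their ratings are hypothetical, and the NOT APPLICABLE value that FIPS 199 permits for confidentiality is left out for brevity.

```python
LEVELS = {"LOW": 1, "MODERATE": 2, "HIGH": 3}
OBJECTIVES = ("confidentiality", "integrity", "availability")


def system_category(information_types):
    """Per-objective high water mark across all information types on a system."""
    return {
        objective: max(
            (info[objective] for info in information_types),
            key=LEVELS.__getitem__,
        )
        for objective in OBJECTIVES
    }


def overall_impact(category):
    """Overall impact level: the highest of the three objective ratings."""
    return max(category.values(), key=LEVELS.__getitem__)


# Hypothetical ratings for two information types held by one system.
trip_records = {"confidentiality": "MODERATE", "integrity": "LOW", "availability": "LOW"}
signal_timing = {"confidentiality": "LOW", "integrity": "HIGH", "availability": "MODERATE"}

category = system_category([trip_records, signal_timing])
print(category)
# {'confidentiality': 'MODERATE', 'integrity': 'HIGH', 'availability': 'MODERATE'}
print(overall_impact(category))  # HIGH
```

The overall level is what drives the choice of a minimum control baseline under FIPS 200 and SP 800-53.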


Guiding documents continued…

Policy Reports/Playbooks

• Open Data Privacy Playbook (Green et al/Berkman)

• De-identification protocol for Open Data (IAPP)

Policy from other cities

• Open Data Release Toolkit – Data SF/San Francisco

• Open Data Risk Assessment - Seattle

• Data Privacy Plan – Connected Vehicle Pilot / Tampa


Iterative Plan Development Process

Guiding Documents

Technical

Legal

Privacy


Drafting Dream Team

• Over two years
• 38 people from across 22 organizations
• Facilitated by Moritz Program on Data and Governance
• Diverse perspectives:
  • Public Sector IT
  • Lawyers
  • Information Privacy Professionals
  • Corporate Chief Privacy Officers
  • Unaffiliated national privacy experts
  • OSU students


Transparency

User Notice/Consent

PII Use Limitations

Data Minimization

Secure Environment

Identity Shielding

Accountability

PRINCIPLES | SMART COLUMBUS DPP


Data Privacy Plan Elements

Principles and Legal Protections for Projects using PII

• Data Stewardship

• Applicable Law Compliance

• PII Definitions

• Administrative/Legal Safeguards

• Scope of plan related demonstration data


Data Privacy Plan Elements

Specified Privacy Controls

1. Notice and Consent

2. Data minimization

3. PII Use/Sharing

4. Data retention

5. Access, correction, deletion

6. Transparency

7. Accountability

8. De-identification


Data Privacy Plan Elements

Specified Privacy Controls

9. Data Curation, incl. Data Inventory spec., Benefit/Risk Analysis process

10. Privacy Testing

11. Data compartmentalization

12. IRB

13. Privacy and Security Board

14. Independent Evaluator Access

15. Contractor / 3rd party compliance

16. Privacy Impact Assessments


Data Privacy Plan Elements

Specified Security Controls

1. Encryption

2. Physical controls

3. Access controls, with logs

4. Role/ID-based authorization

5. Penetration Testing

6. Secure software development

7. System monitoring

8. Data loss prevention


Data Privacy Plan Elements

Specified Security Controls

9. Patching, Antivirus, Malware Checks

10. DPP training

11. Accountability and event review
• Incident response plan spec.


Data Privacy Plan Elements

Privacy Impact Assessment

• PIA question prompts
• Data description, use, retention
• Internal and external sharing/disclosure
• Data collection notice, consent
• Data access, redress, correction
• Project security controls


Additional Resources


Civic Data Privacy Leaders Network

Contact: Kelsey Finch, Senior Counsel, Future of Privacy Forum. [email protected]

Future of Privacy Forum

• Resources and reports related to civic data privacy and governance

https://fpf.org/2018/10/31/nothing-to-hide-tools-for-talking-and-listening-about-data-privacy-for-integrated-data-systems/

https://fpf.org/2016/04/25/a-visual-guide-to-practical-data-de-identification/


MetroLab


Next OS: Clearinghouse for Transportation Data

Speaker: Sanjiv Nanda

What is Next OS

Who: Residents and Visitors, Goods and Freight
What: access transportation services that are cheaper, faster, and safer
How: A suite of integrated applications to browse, book, and pay for services

Who: Transportation Operators & Transportation Service Providers
What: manage services or assets as efficiently as possible
How: A suite of dashboards with advanced analytics and dynamic control tools

Who: Planners and Policymakers
What: make informed decisions to promote economic growth and quality of life
How: A platform with public and private data that better informs decisions

Next OS: Data Clearinghouse, Suite of Data Driven Tools and Applications


Next OS In Operation

Flexible Fleets
• Shared Mobility
• Microtransit
• Micromobility

Mobility Hubs
• Trip Planning Kiosks
• Curb Access Management

Transit Leap
• Real Time Tracking
• Dynamic Scheduling

Complete Corridors
• Managed Lanes
• Signal Priorities
• Dynamic Pricing

• Operational Efficiency: Coordination, Scheduling and Pricing of Resources and Assets

• Transportation User Engagement: Trip Planning and Payments at your fingertips