socializing big data: collaborative opportunities in computer science, the social sciences, and the...

Richard Marciano UNC Chapel Hill [email protected] http://salt.unc.edu http://digitalinnovation.unc.edu "Socializing 'Big Data': Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanities"

Upload: hastacsheryl

Post on 30-Oct-2014

Category:

Technology


DESCRIPTION

Harnessing the “data deluge” is promoting new conversations between disciplines. Prof. Marciano and his collaborators have been pursuing research in a number of areas, including big cultural data, access to big heterogeneous data, records in the cloud, federated grid/cloud storage, visual interfaces to large collections, policy-based frameworks to automate content management, and distributed cyberinfrastructure to enable data sharing. But more importantly, innovative technical approaches require the convergence of creative insights across computer science, the social sciences, and the humanities. This talk touches on these topics and highlights a new collaboration with partners at Duke.

Richard Marciano is a professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill, Director of the Sustainable Archives and Leveraging Technologies (SALT) lab, and co-director of the Digital Innovation Lab (DIL). He leads development of "big data" projects funded by Mellon, NSF, NARA, NHPRC, IMLS, DHS, NIEHS, and UNC. Grants awarded in 2012 include a JISC Digging into Data award with UC Berkeley and the U. of Liverpool, called "Integrating Data Mining and Data Management Technologies for Scholarly Inquiry," a Mellon / UNC award called "Carolina Digital Humanities Initiative," which involves translating big data challenges into curricular opportunities, and an NSF award on big heterogeneous data integration. He holds a B.S. in Avionics and Electrical Engineering, and an M.S. and Ph.D. in Computer Science, and has worked as a postdoc in Computational Geography. He conducted interdisciplinary research at the San Diego Supercomputer Center at UC San Diego, working with teams of scholars in the sciences, social sciences, and humanities.

TRANSCRIPT

Page 1: Socializing Big Data: Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanities

Richard Marciano

UNC Chapel Hill

[email protected]

http://salt.unc.edu

http://digitalinnovation.unc.edu

"Socializing 'Big Data': Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanities"

Page 2:

Current Research Areas

• records in the cloud
• big cultural data
• access to big heterogeneous data
• federated grid/cloud storage
• visual interfaces to large collections
• policy-based frameworks to automate content management
• distributed cyberinfrastructure to enable data sharing

Page 3:

Records in the Cloud

Kickoff meeting on Feb. 5, 2013

• UBC iSchool, Faculty of Law, School of Bus.
• UW iSchool
• Mid-Sweden Info. Tech and Media

Delegating to cloud providers the responsibility for security, accessibility, disposition and preservation.

Page 4:

• Grids in Context: Larry Smarr
• Computational Grids: Ian Foster and Carl Kesselman
• Distributed Supercomputing Applications: Paul Messina
• Realtime Widely Distributed Instrumentation: William E. Johnston
• Data-Intensive Computing: Reagan Moore, … Richard Marciano, …
• Teleimmersion: Tom DeFanti and Rick Stevens
• Application-Specific Tools: Henri Casanova, Jack Dongarra, …
• Compilers, Languages, and Libraries: Ken Kennedy
• Object-Based Approaches: Dennis Gannon, Andrew Grimshaw
• High-Performance Commodity Computing: Geoffrey Fox, Wojtek Furmanski
• The Globus Toolkit: Ian Foster, Carl Kesselman
• High-Performance Schedulers: Francine Berman
• High-Throughput Resource Management: Miron Livny, Rajesh Raman
• Instrumentation and Measurement: Jeffrey Hollingsworth, Bart Miller
• Performance Analysis and Visualization: Daniel Reed, Randy Ribler
• Security, Accounting, and Assurance: Clifford Neuman
• Computing Platforms: Andrew Chien
• Network Protocols: P.M. Melliar-Smith, Louise Moser
• Network Quality of Service: Roch Guerin, Henning Schulzrinne
• Operating Systems and Network Interfaces: Peter Druschel, Larry Peterson
• Network Infrastructure: Jon Postel, Joe Touch
• Testbeds: Bridges from Research to Infrastructure: Charlie Catlett, John Toole

1998

2003 Tony Hey: “The Data Deluge: An e-Science Perspective”

2004 Collaborative Science

Page 5:

Big Data is a Big Deal

White House announcement: http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal

Big Data Across the Federal Government: http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final_1.pdf

More than $200M in new commitments (NSF, HHS/NIH, DOE, DOD, DARPA, USGS)

Goal: “improve the ability to extract knowledge and insights from large and complex collections of digital data”.

• DataNet: Long-term preservation and access of data
• Software Infrastructure for Sustained Innovation (SI2)
• Digging Into Data Challenge (NSF/NEH/IMLS & JISC): Computational Humanities
• Cyber-Enabled Discovery and Innovation (CDI): Data-enabled science and engineering
• Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA)
• Data Infrastructure Building Blocks (DIBBs)
• DataWay: National Infrastructure for Heterogeneous Data

Page 6:

DataEdge, UC Berkeley May 31, 2012

Geoffrey Nunberg Panel:

“Something seems to happen, people feel, when you get to that 13th zero, or 15th zero, or 18th zero, or 21st zero, wherever it is, and bingo it’s the petabyte age, it’s the age of big data. It’s like combing your hair, you just comb, and comb, and comb, and all of a sudden it’s like big hair.”

12/31/2012 Forbes article by Edd Dumbill: “Big Data, Big Hype: Big Deal”

“Big data is an imprecise term. As such it’s a huge boon to marketers… not everyone is pleased with the ‘bigger is better’ argument. ‘Big data’ really means ‘smart use of data’.”

“Size Matters: Big Data, New Vistas in the Humanities and Social Sciences”:

“The question is whether the advent of big data changes the way we do social science and also what role social scientists will play…”

Page 7:

“Personalization” is another word for discrimination. We’re not discriminating if we tailor things to you based on what we know about you — right? That’s just better service.

Alistair Croll: “Big Data is our generation’s civil rights issue, and we don’t know it.”

Publicly available last name information can be used to generate racial boundary maps.

From the Mapping London project

When bank managers tried to restrict loans to residents of certain areas (known as redlining) Congress stepped in to stop it (with the Fair Housing Act of 1968). They were able to legislate against discrimination, making it illegal to change loan policy based on someone’s race.

Home Owners’ Loan Corporation map showing redlining of “hazardous” districts in 1936. See: DURHAM MAPS for the T-RACES project

Music selections and sharing with friends could make it possible to guess a person’s racial background and deny them a loan.

Page 8:

Big Data = Big Collaborations

Page 9:
Page 10:

May 2007

Socializing CI: Networking the Humanities, Arts, and Social Sciences

Page 11:

• Project Lead: Richard Marciano (UNC/SALT)

• Project Manager: Amy Shoop (UNC ITS)

• Oversight Council

– CIOs and Head Librarians

• Tracy Futhey -- Duke CIO; Deborah Jakubs -- Duke Librarian

• Marc Hoit -- NCSU CIO; Susan Nutter -- NCSU Librarian

• Larry Conrad -- UNC CIO; Sarah Michalak -- UNC Librarian

– RENCI

• Alan Blatecky -- RENCI; Stan Ahalt -- RENCI

– DICE Center

• Reagan Moore -- DICE

– SALT Lab

• Richard Marciano -- SALT

TUCASI data-Infrastructure Project (TIP): Managing Digital Research Data in Federated Storage Clouds

Page 12:

Focus Group Membership

University teams (Duke, Chapel Hill, NC State) by focus group

Classroom Capture

• Duke: Samantha Earp (CC lead) (OIT-Academic Services)
• Chapel Hill: Suzanne Cadwell (ITS-Academic Outreach & Engagement), Charlie Greene (ITS-Teaching & Learning), Pam Sessoms (Lib-e-Reference)
• NC State: Lou Harrison (DELTA), Hal Meeks (OIT-Outreach, Communications and Consulting)

Storage

• Duke: Amy Brooks (OIT-Systems), Klara Jelinkova (OIT-Shared Services & Infrastructure), David Kennedy (Lib-Info. Sys. Support), Molly Tamarkin (Lib-Systems), Jim Tuttle (Lib-Systems)
• Chapel Hill: Reagan Moore (S lead) (DICE), Leesa Brieger (RENCI-Data), Brent Caison (ITS-Storage), Dave Pcolar (Lib-Systems), Bill Schulz (Lib-Systems), Lisa Stillwell (RENCI-Data)
• NC State: Steve Morris (Lib-Systems), Eric Sills (OIT-Research Computing)

Future Data & Policy

• Duke: Paolo Mangiafico (Provost-Dig. Info. Strategy), Tim Pyatt (Lib-Archives)
• Chapel Hill: Ruth Marinshaw (ITS-Research Computing), Will Owen (Lib-Systems), Rich Szary (Lib-Special Collections)
• NC State: Kristin Antelman (FD&P lead) (Lib), Susan Nutter (Lib-Head Librarian)

Page 13:

30 funded, 57 total

Page 14:

“Public Scholarship”

Kathy Woodward, UW Simpson Center for the Humanities

• University of North Carolina Asheville (UNCA): staff (provost, head librarian, head of special collections, library staff, departments of computer science / history / political science), centers (National Environmental Modeling and Analysis Center / Center for Diversity Education), and students

• community-based development organizations (Green Opportunities Corps, Asheville Design Center)

• neighborhood community group leaders and residents (Southside, Burton Street, East End)

• city of Asheville officials (Housing Authority of the City of Asheville, Planning & Development Department, West Asheville Public Library, Chamber of Commerce)

• county (head of Buncombe County Register of Deeds, Land-Of-Sky Regional Council)

• other groups including the North Carolina Humanities Council, Mountain Housing Opportunities Inc.

• “Twilight of a Neighborhood: Asheville’s East End, 1970” project. This project examined the process and aftermath of urban renewal and collected voices of residents, after the 2007 transfer of records to UNC Asheville. We have secured support and commitment from the community groups relevant to tackling this project.

• Asheville’s African-American Community Historical Bus Tour, June 19, 2012 (35 people)

UNC, Duke, Asheville collaboration

Page 15:
Page 16:
Page 17:
Page 18:

UNCA & Asheville Partners:

• Dwight Mullen, UNCA Political Science

• Priscilla Ndiaye, chair of Asheville's Southside Advisory Committee

Page 19:

Big Heterogeneous Data (with Duke)

Researching the cyberinfrastructure implications of supporting large-scale, content-based indexing of highly heterogeneous digital collections potentially embodying non-uniform or sparse metadata architectures…

Intellectual Merit: Demonstrating the creation of national collections through automation and citizen-scientist crowdsourcing efforts is the focus of this task.

Broader Impacts: This case study will bring together heterogeneous content from a variety of sources: census, economic, historic, planning, insurance, financial, and scientific.

Outcomes: Workflows & Visual prototype
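
One way to picture content-based indexing over collections with sparse or non-uniform metadata is an inverted index that treats whatever text and metadata a record happens to carry as searchable tokens. The sketch below is illustrative only and is not the project's implementation; the record layout, the `build_index` and `search` helpers, and the sample records are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_index(records):
    """Map each token to the ids of records whose content or metadata contain it.

    Each record is a dict with an 'id' plus an optional 'content' string and an
    optional 'metadata' dict. Missing fields are skipped, so sparse records
    still get indexed on whatever they do have.
    """
    index = defaultdict(set)
    for rec in records:
        texts = [rec.get("content", "")]
        texts.extend(str(value) for value in rec.get("metadata", {}).values())
        for text in texts:
            for tok in tokenize(text):
                index[tok].add(rec["id"])
    return index


def search(index, query):
    """Return the ids of records that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for tok in tokens[1:]:
        result &= index.get(tok, set())
    return result


# Hypothetical records standing in for heterogeneous sources
# (census, maps, insurance); none of these come from the talk.
RECORDS = [
    {"id": "census-1", "content": "Population schedule for Durham",
     "metadata": {"year": 1940, "type": "census"}},
    {"id": "map-7", "content": "Residential security map of Durham"},
    {"id": "ins-3", "metadata": {"type": "insurance", "place": "Asheville"}},
]

INDEX = build_index(RECORDS)
print(sorted(search(INDEX, "durham")))
```

A record with no metadata (`map-7`) and a record with no content (`ins-3`) are both reachable through the same query path, which is the point of indexing on content rather than on a fixed metadata schema.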

Mapping historical residential segregation in the US

Page 20:

From Crowdsourcing to Citizen-led Sourcing

• Neighborhood community group leaders and residents (Southside, Burton Street, East End)

• University of North Carolina Asheville staff (provost, head of special collections, library staff, departments of computer science / history / political science), centers (Renaissance Computing Institute / Center for Diversity Education), and students

• Community-based development organizations (Green Opportunities Corps, Asheville Design Center)

• City of Asheville officials (Housing Authority of the City of Asheville, Register of Deeds, GIS, Planning & Development Department, West Asheville Public Library, Chamber of Commerce, Regional Council)

• Other groups including the North Carolina Humanities Council, Mountain Housing Opportunities Inc., Twilight of a Neighborhood.

Page 21:

Vectors – Annenberg Center for Communication SDSC: SALT

Policy

Content

Governance

Infrastructure

Evolution

“We define the ‘discipline of data curation’ as the practice of collection, annotation, conditioning, and preservation of data for both current and future use” – Helen Tibbo & Bryan Heidorn

SALT

annotation

conditioning

preservation

collection

current & future use