data literacy
DESCRIPTION
INDIAN STATISTICAL INSTITUTE
Documentation Research & Training Centre
8th Mile, Mysore Road, RVCE Post, Bangalore-560 059
DRTC Seminar 5, 2014
Data Literacy
ABSTRACT
In our increasingly data-driven society, data literacy is an important civic skill that we should be developing. Data are slowly but steadily forcing their way into every part of society. Data literacy may seem less technical than computer science or other fields, yet it still calls for a wide variety of tools for accessing, converting and manipulating data. Using these tools requires an understanding of relational databases (like MS Access), data manipulation techniques, statistical software tools (like Minitab, SPSS, STATA and MS Excel) and data representation software tools (like MS PowerPoint and MS Excel). This seminar includes an introduction to data literacy and its inter-relationship with information literacy and statistical literacy. It also covers the steps for working with data, followed by a short demonstration of data analysis techniques using the software STATA 11.
Speaker: Jayanta Kr. Nayek
Date: 29.10.2014
Time: 2 p.m.
Venue: DRTC, ISI Bangalore
All are cordially invited.
Seminar Coordinator: Biswanath Dutta
TRANSCRIPT
5th Seminar on
Data Literacy
Jayanta Kr. Nayek
DRTC, ISIBC
3rd Semester, 2013-2015
Contents
• Introduction
• What is data?
• Data Life Cycle
• Definitions of DL, IL & SL
• Relation between DL, IL & SL
• Conceptions of Data Literacy
• Why data literacy?
• Basic steps for working with Data
– Data Visualization
– Data Interpretation
– Data Documentation
– Data Transformation
• Data visualization and wrangling tools
• Becoming data literate: Basic Skills of Data Literacy
• Data Literacy in Libraries
• Big Data management
• SAS as a solution for Big Data
• Conclusion
• References
Introduction
• The evaluation of information is a key element in information literacy, statistical literacy and data literacy. As such, all three literacies are inter-related. It is difficult to promote information literacy or data literacy without promoting statistical literacy.
• All librarians are interested in information literacy; archivists and data librarians are interested in data literacy. Both groups should consider teaching statistical literacy as a service to users who need to critically evaluate information in arguments.
What is Data?
• Webster's definition:
“facts or information used usually to calculate, analyze, or plan something”.
• Anything can be data: text, images, numbers, …
• For a computer to process it, data needs to be in a structured, machine-readable form.
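As a minimal sketch of what "structured and machine-readable" can mean in practice, the Python snippet below turns a free-text note into records with named fields and writes them out as JSON and CSV. The sentence, the field names and the numbers are invented for illustration only.

```python
import csv
import io
import json

# Hypothetical observations written as free text, the way a person might note them.
free_text = "Bangalore had 12 visitors on 2014-10-29; Mysore had 7 visitors on 2014-10-30"

# The same facts as structured records: every record has the same named fields.
records = []
for part in free_text.split(";"):
    words = part.split()
    records.append({"city": words[0], "visitors": int(words[2]), "date": words[-1]})

# Serialize to two common machine-readable formats.
as_json = json.dumps(records)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "visitors", "date"])
writer.writeheader()
writer.writerows(records)
as_csv = buffer.getvalue()

# A program can now answer questions without re-reading the sentence.
total_visitors = sum(r["visitors"] for r in json.loads(as_json))
print(total_visitors)
```

Once the data has named fields and typed values, any tool that reads JSON or CSV can reuse it; the free-text sentence could only be used by a human reader.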
Data Life Cycle
Data Literacy
“Data-literacy is the ability to consume for knowledge, produce coherently and think critically about data.”
“Data literacy” includes:
• Information Literacy
• Statistical Literacy
• Understanding how to work with large data sets,
• How they were produced,
• How to connect various data sets, and
• How to interpret them.
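One way to picture "connecting various data sets" is a join on a shared key. The sketch below is only an illustration in Python: the two datasets, the `district` key and all values are invented.

```python
# Hypothetical datasets that share a "district" key; all values are invented.
population = [
    {"district": "D1", "population": 120000},
    {"district": "D2", "population": 45000},
    {"district": "D3", "population": 80000},
]
libraries = [
    {"district": "D1", "libraries": 6},
    {"district": "D2", "libraries": 3},
]

# Index one dataset by the shared key, then look up each row of the other.
libraries_by_district = {row["district"]: row["libraries"] for row in libraries}

joined = []
unmatched = []
for row in population:
    count = libraries_by_district.get(row["district"])
    if count is None:
        unmatched.append(row["district"])  # no library record for this district
        continue
    per_100k = count * 100000 / row["population"]
    joined.append({"district": row["district"], "libraries_per_100k": round(per_100k, 1)})

print(joined)
print("unmatched:", unmatched)
```

Keeping track of the unmatched keys matters as much as the join itself: a district missing from one dataset is a question about how the data were produced, not a zero.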
Information Literacy
An information literate individual is able to:
(1) Determine the extent of information needed,
(2) Access the needed information effectively and efficiently,
(3) Evaluate information and its sources critically,
(4) Incorporate selected information into one’s knowledge base,
(5) Use information effectively to accomplish a specific purpose, and
(6) Understand the economic, legal, and social issues surrounding the use of information, and access and use information ethically and legally.
Statistical Literacy
• Statistical literacy studies the use of statistics as evidence in arguments (Schield, 1998, 1999).
• A key element of statistical literacy is assembly: how the statistics are defined, selected and presented.
• A second key element of statistical literacy is the importance of context and confounding.
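A small worked example may help show why confounding matters. The Python sketch below uses invented counts for two hypothetical programs, X and Y, split by case difficulty (the confounder). It is an illustration of the general idea, not data from any study.

```python
# Hypothetical counts (successes, trials) for two programs, split by a
# confounding variable: how difficult each case was. All numbers are invented.
outcomes = {
    "X": {"easy": (80, 100), "hard": (150, 500)},
    "Y": {"easy": (350, 500), "hard": (20, 100)},
}


def rate(successes, trials):
    """Success rate within one group."""
    return successes / trials


def overall_rate(groups):
    """Success rate after pooling all groups of one program."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials


for program, groups in outcomes.items():
    by_group = {name: round(rate(*counts), 2) for name, counts in groups.items()}
    print(program, by_group, "overall:", round(overall_rate(groups), 2))
```

In these invented counts X does better than Y in both the easy and the hard group, yet Y looks better overall, because X handled mostly hard cases. A statistic quoted without that context would support the wrong conclusion.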
Relationship between IL, DL & SL
Discussion
According to Schield (2004), data literacy is the part of statistical literacy that involves training individuals to access, assess, manipulate, summarize and present data, whereas statistical literacy aims to teach how to “think critically about descriptive statistics.”
Data literacy can also be seen as a complement to, or a form of, information literacy, which suggests that data literacy is the umbrella concept covering statistical literacy.
Statistical literacy is envisaged as the component of data literacy involved in the critical appraisal, interpretation, processing, and statistical analysis of data.
Data literacy can be defined as the component of information literacy that enables individuals to access, interpret, critically assess, manage, handle and ethically use data.
Conceptions of Data Literacy (1)
A social science perspective :
Data literacy is almost synonymous with statistical literacy, quantitative literacy and numeracy, but involves more than basic statistics and mathematical functions:
• understanding data and its tabular and graphical representations, including statistical concepts and terms
• finding, evaluating and using statistical information effectively and ethically as evidence for social inquiries
• reading, interpreting and thinking critically about statistics
Conceptions of Data Literacy (2)
Conceptions of Data Literacy (3)
A science (STEM/information science) perspective :
Science data literacy shares aspects of social science conceptions, but requires awareness of the data life cycle, metadata issues, data tools and collaboration mechanisms:
• managing the data generated from experiments, surveys and observations by using sensors and other devices
• understanding the attributes, quality and history of data to produce valid, reliable answers to scientific inquiries
• accessing, collecting, processing, manipulating, converting, transforming, evaluating and using data
Why data literacy?
• Slowly but steadily, data are forcing their way into every nook and cranny of every industry, company and job.
• Data literacy is the ability to ask and answer meaningful questions by collecting, analyzing and making sense of data encountered in our everyday lives.
• In our increasingly data-driven society, data literacy is an important civic skill which we should be developing in our society.
Basic Steps in Working with Data
There are at least three key concepts we need to understand when starting a data project:
• Data requests should begin with a list of questions you want to answer.
• Data is often messy and needs to be cleaned.
• Data may have undocumented features.
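To make "messy data needs to be cleaned" concrete, here is a minimal Python sketch. The rows, the field names `region` and `rate`, and the cleaning rules are assumptions chosen for illustration; real projects need rules that fit their own data.

```python
# Hypothetical raw rows: inconsistent case, stray spaces, a missing value,
# a non-numeric entry and a duplicate. All values are invented.
raw_rows = [
    {"region": " North ", "rate": "75.4"},
    {"region": "north", "rate": "75.4"},
    {"region": "South", "rate": "94.0"},
    {"region": "East", "rate": ""},
    {"region": "West", "rate": "n/a"},
]


def clean(rows):
    """Normalize names, parse numbers, drop duplicates, and set aside bad rows."""
    cleaned, rejected, seen = [], [], set()
    for row in rows:
        region = row["region"].strip().title()
        try:
            rate = float(row["rate"])
        except ValueError:
            rejected.append(row)  # keep rejected rows so they can be reviewed
            continue
        key = (region, rate)
        if key in seen:
            continue  # same record after normalization
        seen.add(key)
        cleaned.append({"region": region, "rate": rate})
    return cleaned, rejected


cleaned_rows, rejected_rows = clean(raw_rows)
print(cleaned_rows)
print(len(rejected_rows), "rows need manual review")
```

Rejected rows are returned rather than silently dropped, since an entry such as "n/a" may be one of the undocumented features of the dataset that is worth asking the data provider about.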
Data Visualization
Visualization provides a unique perspective on the dataset.
Visualization is critical to data analysis. It provides a front line of attack, revealing intricate structure in data that cannot be absorbed in any other way. We discover unimagined effects, and we challenge imagined ones.
-- William S. Cleveland, Visualizing Data
Data insights: a visualization (Gregor Aisch)
How to Visualize Data
Tables: very powerful when you are dealing with a relatively small number of data points.
Charts: allow you to map dimensions in your data to visual properties of geometric shapes.
Maps: the power of a map is to reconnect the data to our physical world.
Graphs: all about showing the interconnections (edges) between your data points (nodes).
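The nodes-and-edges idea behind graphs can be sketched in a few lines of Python. The names and connections below are invented; the point is only how an edge list becomes a structure that can be queried before it is drawn.

```python
from collections import defaultdict

# Hypothetical co-authorship edges: each pair connects two nodes (people).
edges = [("Asha", "Bela"), ("Asha", "Chetan"), ("Bela", "Chetan"), ("Chetan", "Dev")]

# Build an adjacency list: for every node, the set of nodes it is connected to.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# The degree of a node is the number of edges that touch it.
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
most_connected = max(degree, key=degree.get)

print(degree)
print("most connected node:", most_connected)
```

A drawing tool would place these nodes and edges on screen; the adjacency list is the data underneath that drawing.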
Analyze and Interpret
Once you have visualized your data, the next step is to learn something from the picture you created. You could ask yourself:
• What can I see in this image? Is it what I expected?
• Are there any interesting patterns?
• What does this mean in the context of the data?
• Sometimes you might end up with a visualization that, in spite of its beauty, seems to tell you nothing of interest about your data. But there is almost always something that you can learn from any visualization, however trivial.
Document Your Insights and Steps
Documentation is the most important step of the process, and it is also the one we’re most likely to skip.
Help yourself
Help others
Transform data
Data transformation is the process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system.
Data transformation can be divided into two steps:
• data mapping, which maps data elements from the source data system to the destination data system and captures any transformation that must occur
• code generation, which creates the actual transformation program
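The two steps can be sketched in Python as follows. Here the mapping is a small table, and a generic `transform` function stands in for the generated transformation program. The field names, the date and unit conversions, and the sample record are all hypothetical.

```python
# Step 1, data mapping: source field -> (destination field, conversion to apply).
# All field names and conversions here are illustrative assumptions.
FIELD_MAP = {
    "Name": ("full_name", str.strip),
    # DD/MM/YYYY in the source system becomes YYYY-MM-DD in the destination.
    "Joined": ("join_date", lambda s: "-".join(reversed(s.split("/")))),
    # Centimetres in the source system become metres in the destination.
    "Height_cm": ("height_m", lambda s: float(s) / 100),
}


# Step 2, the transformation program: here a generic function driven by the map.
def transform(record, field_map=FIELD_MAP):
    """Return a destination-format record built from a source-format record."""
    return {dest: convert(record[src]) for src, (dest, convert) in field_map.items()}


source_record = {"Name": "  A. Reader ", "Joined": "29/10/2014", "Height_cm": "172"}
destination_record = transform(source_record)
print(destination_record)
```

Keeping the mapping as data, separate from the code that applies it, also documents the transformation: anyone can read the table to see what happened to each field.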
Data visualization and wrangling tools
Names | Examples
Spreadsheets | LibreOffice, Excel or Google Docs
Statistical programming frameworks | R (r-project.org), Pandas (pandas.pydata.org), STATA, SPSS etc.
Geographic Information Systems (GIS) | Quantum GIS, ArcGIS, GRASS
Visualization libraries | d3.js (mbostock.github.com/d3), Prefuse (prefuse.org), Flare (flare.prefuse.org)
Data wrangling tools | Google Refine, Datawrangler
Non-programming visualization software | ManyEyes, Tableau Public (tableausoftware.com/products/public)
Basic Skills of Data Literacy
• “Learning key statistical terms, like the difference between mean and median; or why a standard deviation or margin of error might matter.
• “Knowing what questions to ask about data or a statistic to gauge its potential relevance, quality or reliability.
• “Performing basic statistical calculations -- nothing fancy, just enough to do a quick reality-check whether you're understanding the story that a dataset might be telling.
• “Putting data in context, such as considering the local unemployment rate in the context of Census data for your community, or local vs. state/national crime statistics.”
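The terms in the list above can be tried out with Python's standard library. This is a minimal sketch on invented numbers: the income list, the 52% survey result and the sample size of 1000 are assumptions, and the margin-of-error formula is the usual normal approximation for a proportion at about 95% confidence.

```python
import math
import statistics

# Hypothetical monthly incomes (in thousands); one very large value skews the data.
incomes = [18, 20, 21, 22, 24, 25, 250]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # the middle value, robust to the outlier
spread = statistics.stdev(incomes)          # sample standard deviation


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical survey: 52% of 1000 respondents agree.
moe = margin_of_error(0.52, 1000)

print("mean:", round(mean_income, 1), "median:", median_income)
print("standard deviation:", round(spread, 1))
print("margin of error: +/-", round(moe * 100, 1), "percentage points")
```

With these invented incomes the mean is more than twice the median, which is the kind of quick reality-check the list describes: a single extreme value can make an "average" unrepresentative. Similarly, a margin of error of about three percentage points means a 52% result is hard to tell apart from an even split.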
Data Literacy in Libraries’ Instructional Programs and Services
Academic libraries are deploying a four-fold response to the growing need to use research data:
1) hiring specialized staff (data librarians or data specialists) or furthering data management and analysis training for (generally reference) librarians;
2) intensifying the collection or compilation of and providing access to data sources;
3) participating in the development of institutional data repositories to preserve and share original research data and
4) incorporating data literacy in their instructional programs and services, whose design should follow today’s inescapable reference framework: the ACRL (2011b) recommendations on information literacy, as well as its “Guidelines for Instruction Programs in Academic Libraries.”
Data literacy instruction can be delivered:
• via the Web, with the publication of self-training resources (open to the public or for in-house use only);
• in the library itself, through reference service, one-on-one and on-demand or scheduled user training sessions;
• through face-to-face and online instruction, forming part of credit courses, either as specialized stand-alone instruction or, with instructors’ cooperation, as instruction embedded in other subjects.
Libraries Providing Data Literacy Instruction
The Massachusetts Institute of Technology’s (MIT) Data Management and Publishing tutorial,
The EDINA Research Data Management Training (MANTRA),
The University of Edinburgh’s Data Library and
The University of Minnesota libraries’ Data Management Course for Structural Engineers.
Data Library
A data library is a collection of numeric and/or geospatial data sets for secondary use in research.
A data library is normally part of a larger institution (academic, corporate, scientific, medical, governmental, etc.) established to serve the data users of that organisation.
The data library tends to house local data collections and provides access to them through various means (CD-/DVD-ROMs or central server for download).
A data library may also maintain subscriptions to licensed data resources for its users to access.
Data Libraries & Data Librarians: Services
• Reference Assistance
• User Instruction
• Technical Assistance
• Collection Development & Management
• Preservation and Data Sharing Services
Big Data: What It Means
Handling Big Data
Three key technologies can help you get a handle on big data and, even more importantly, extract meaningful business value from it:
• Information management for big data.
• High-performance analytics for big data.
• Flexible deployment options for big data.
1. SAS Information Management
Unified data management capabilities
including data governance, data integration, data quality and metadata management.
Complete analytics management
including model management, model deployment, monitoring and governance of the analytics information asset.
Effective decision management
capabilities to easily embed information and analytical results directly into business processes while managing the necessary business rules, workflow and event logic
2. High-performance Analytics
Grid computing: A centrally managed grid infrastructure provides dynamic workload balancing, high availability and parallel processing for data management, analytics and reporting.
In-database processing: Using the scalable architecture, in-database processing reduces the time needed to prepare data and build, deploy and update analytical models.
In-memory analytics: Quickly create and deploy analytical models. Solve dedicated, industry-specific business challenges by processing detailed data in-memory within a distributed environment, rather than on a disk.
Support for Hadoop: With SAS Information Management, you can effectively manage data and processing in the Hadoop environment (which stores and processes large volumes of data on commodity hardware).
3. Flexible Deployment
For some organizations, it won’t make sense to build the IT infrastructure to support big data, especially if data demands are highly variable or unpredictable.
Those organizations can benefit from cloud computing, where big data analytics is delivered as a service and IT resources can be quickly adjusted to meet changing business demands.
Conclusion
Both data literacy and information literacy should be expanded to include critical thinking and statistical literacy.
Expanding data literacy to include statistical literacy will help to deal with inferring causation from associations (in social science).
Expanding information literacy to include statistical literacy will help to deal with information that involves statistics.
As such, including statistical literacy with information literacy and with data literacy will provide more opportunities for librarians to be of service in helping users think critically.
References
Association of College and Research Libraries (ACRL). 2011a. “Information Literacy Competency Standards for Journalism Students and Professionals.” Accessed September 24, 2014. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/standards/il_journalism.pdf
Schield, Milo. 2004. “Information Literacy, Statistical Literacy and Data Literacy.” Accessed October 21, 2014. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=2CCC9BFA3C0B3E0F05CABBD154E1DA15?doi=10.1.1.144.6309&rep=rep1&type=pdf
International Association for Social Science Information Services and Technology (IASSIST). Available at: www.iassistdata.org and http://datalib.library.ualberta.ca/
Linden, Julie. 2002. “Finding, Evaluating and Using Numeric Data.” Presented at the IASSIST 2002 conference, Storrs, Connecticut. Available at: http://ropercenter.uconn.edu/iassist2002/program.html
www.datajournalismhandbook.org
www.datalib.edina.ac.uk/Mantra/
www.ed.ac.uk
www.knightdigitalmediacenter.org
www.sas.com/resources/whitepaper/wp_46345.pdf
www.dataone.org