ETHICAL & LEGAL ISSUES IN COMPUTATIONAL SOCIAL SCIENCE LECTURE 7, 4.5.2015 INTRODUCTION TO COMPUTATIONAL SOCIAL SCIENCE (CSS01) LAURI ELORANTA

Upload: lauri-eloranta

Post on 25-Jul-2015


Page 1: Ethical and Legal Issues in Computational Social Science - Lecture 7 in Introduction to Computational Social Science

ETHICAL & LEGAL ISSUES IN COMPUTATIONAL SOCIAL SCIENCE

LECTURE 7, 4.5.2015 INTRODUCTION TO COMPUTATIONAL SOCIAL SCIENCE (CSS01)

LAURI ELORANTA

Page 2

LECTURES SCHEDULE

• 16.3.2015 – Lecture 1, U35 room 114, Introduction to CSS [DONE]
• 23.3.2015 – Lecture 2, U35 room 114, Basics of Computation and Modeling [DONE]
• 30.3.2015 – Lecture 3, U35 room 114, Big Data & Data Mining [DONE]
• 6.4.2015 – No lecture – Easter Monday [DONE]
• 13.4.2015 – Lecture 4, U35 room 114, Social Network Analysis [DONE]
• 20.4.2015 – Lecture 5, Snellmaninkatu 10, room 1, Complex Social Systems [DONE]
• 27.4.2015 – Lecture 6, U35 room 114, Simulation in Social Sciences [DONE]
• 4.5.2015 – Lecture 7, U35 room 114, Ethics and Legal Issues in CSS [TODAY]
• 11.5.2015 – Lecture 8, U35 room 114, Summary and Retrospective

Page 3

LECTURE 7 OVERVIEW

• PART 1: BIG DATA IS PROBLEMATIC
  • Ethics
  • Access
  • Privacy
• PART 2: LEGAL ISSUES IN CSS

Page 4

BIG DATA & BIG PROBLEMS

Page 5

COMPUTATIONAL SOCIAL SCIENCE IS PROBLEMATIC

• Big promises vs. big challenges
• The research subjects are humans
• The massive amounts of data are gathered from human-based interactions
• This underlines challenges in:
  • Research ethics
    • Privacy
    • Transparency
    • Trust
  • Research method: how to design and conduct research in an ethical manner?
  • Access to data
    • Who owns the research data?
    • Do you have access to research data?
  • Purpose of research (agenda)

Page 6

King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science. 11 February 2011: Vol. 331 no. 6018 pp. 719-721.

Page 7

[Diagram labels: Big Data; Understanding of Knowledge; Privacy (vs. surveillance); Democratic Society]

Page 8

CRITICAL QUESTIONS FOR BIG DATA (BOYD & CRAWFORD 2012)

1. Big data changes the definition of knowledge
2. Claims to big data objectivity and accuracy are misleading
3. Bigger data are not always better data
4. Taken out of context big data loses its meaning
5. Accessibility does not make big data research ethical
6. Limited access to big data creates new digital divides

Page 9

1. BIG DATA CHANGES THE DEFINITION OF KNOWLEDGE

• “Big Data has emerged a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community. ‘Change the instruments, and you will change the entire social theory that goes with them’, Latour (2009) reminds us.” (Boyd & Crawford 2012)
• “Rather, it is a profound change at the levels of epistemology and ethics. Big Data reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality.”
• Do numbers really speak for themselves?
• The inherent bias of the tools and technologies!

(Boyd & Crawford 2012)

Page 10

2. BIG DATA IS NOT THAT OBJECTIVE

• “In reality, working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth”
• There’s a risk that big data widens the division between “subjective” qualitative research and “objective” quantitative research
• Processing and analyzing big data involves many subjective steps that are not always recognized as subjective
  • How data is cleaned
  • What methods of analysis are used and how
  • How results are interpreted
• The reliability of data sets?
  • Errors in data sets
  • Transparency on how the data set is collected is typically very limited!
  • Biases and limitations of the data set

(Boyd & Crawford 2012)

Page 11

3. BIGGER DATA ARE NOT ALWAYS BETTER DATA

• Just because big data presents us with large quantities of data does not mean that methodological issues are no longer relevant. Understanding the sample, for example, is more important now than ever.
  • Validity
  • Reliability
  • Fit for research question?
• A good example of sample limitations and bias is Twitter data
  • Does not represent “all people” even though millions of people might be included in the data set
  • No visibility on the sample selection of the data set
  • Size does not equal representativeness
  • Restricted access to Twitter firehose, gardenhose etc.

(Boyd & Crawford 2012)
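The point that size does not equal representativeness can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the 30% share of platform users and the opinion rates (60% among users, 40% among non-users) are hypothetical numbers chosen for illustration, not figures from Boyd & Crawford or from any real platform.

```python
import random

def share(sample):
    """Fraction of the sample that holds the opinion (coded as 1)."""
    return sum(sample) / len(sample)

def simulate(seed=0):
    """Compare a large platform-only sample with a small random sample."""
    rng = random.Random(seed)
    # Hypothetical population of 100,000 people, 30% of them platform users.
    # Assumed opinion rates: 60% among users, 40% among non-users.
    users = [int(rng.random() < 0.6) for _ in range(30_000)]
    non_users = [int(rng.random() < 0.4) for _ in range(70_000)]
    population = users + non_users

    # Large sample, but drawn only from platform users.
    big_platform_sample = rng.sample(users, 20_000)
    # Small sample, but drawn at random from the whole population.
    small_random_sample = rng.sample(population, 1_000)

    return (share(population),
            share(big_platform_sample),
            share(small_random_sample))

true_share, platform_estimate, random_estimate = simulate()
print(f"true share in population: {true_share:.3f}")
print(f"20,000 platform users:    {platform_estimate:.3f}")
print(f"1,000 random people:      {random_estimate:.3f}")
```

Under these assumptions the large platform-only sample misses the population value by far more than the small random sample does, because adding more users does not remove the selection bias.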

Page 12

4. TAKEN OUT OF CONTEXT, BIG DATA LOSES ITS MEANING

• Data-related tools and methods might not be transferable from context to context
  • E.g. the Facebook graph might mean something in Facebook, but it is hardly the full representation of a person’s real-life social network
  • Activity and intensity in a social media context might not have the same meaning in real life
• Big data is not generic data about social interactions in general, but specific to the source it is collected from

(Boyd & Crawford 2012)

Page 13

5. JUST BECAUSE IT IS ACCESSIBLE DOES NOT MAKE IT ETHICAL

• “[W]hat is the status of so-called ‘public’ data on social media sites? Can it simply be used, without requesting permission? What constitutes best ethical practice for researchers? Privacy campaigners already see this as a key battleground where better privacy protections are needed. The difficulty is that privacy breaches are hard to make specific – is there damage done at the time? What about 20 years hence? ‘Any data on human subjects inevitably raise privacy issues, and the real risks of abuse of such data are difficult to quantify’ (Nature, cited in Berry 2011).”
• Open access to data does not mean that the research is automatically ethical.
• Understanding of the processes of mining and anonymizing Big Data is typically limited: true accountability requires critical thinking even in cases where an ethics board has granted access for research.
• There are significant questions in relation to control and power: researchers have the tools and the access, while social media users as a whole do not.

(Boyd & Crawford 2012)

Page 14

6. LIMITED ACCESS TO BIG DATA CREATES NEW DIGITAL DIVIDES

• “But who gets access? For what purposes? In what contexts? And with what constraints? While the explosion of research using data sets from social media sources would suggest that access is straightforward, it is anything but.”
• Only social media companies have full access to data; an average scholar does not.
• Access to data typically costs money, which creates uneven opportunities for research
  • Top-tier universities are in a better position
• Skills required for accessing data are restricted to those with a computational background
  • This can also be seen as a gendered division
• Limited access creates a huge bias in relation to the questions asked
  • Who gets to decide the purposes big data is used for?

(Boyd & Crawford 2012)

Page 15

BIG DATA RESEARCH ETHICS CHALLENGES

• Current ethical protocols are not adequate for the types of digital social research increasingly being conducted.
• Information generated by users of social media platforms and services cannot be considered equivalent to conventional types of offline information collected by social researchers.
• Challenges according to Neuhaus & Webmoor (2012):
  1. Change in the enactment of the participant and researcher relationship (the relationship is now computer-mediated)
  2. The number of individuals in one research data set has skyrocketed, but so have the privacy and accountability risks
  3. Problems of identity in relation to research “participants” and “research data”: what roles do these actors actually play?
  4. Collected data may reveal users’ identities after remixing with other data points, even when the original research data set was anonymized
  5. Peer review and accountability might be at stake because nowadays a single researcher has access to millions and millions of data points previously accessible only by teams of researchers

(Neuhaus & Webmoor 2012)

Page 16

AGILE ETHICS IN BIG DATA RESEARCH

• Neuhaus and Webmoor (2012) propose agile ethics for big data research:
• Researchers and institutions should accept the fact that this kind of large-scale data mining still involves human subjects.
  • Logging of research activities and big data collection
• As a contract between researchers and participants is not possible, we need to place data generation on more of an equal footing with final outputs; to think of it in terms of authorship.
  • Taking responsibility for the data sets
• Agile ethics is more an attitude, or a mode of engagement and sensibility for good practice, as opposed to a formal list of procedures and protocols
  • Flexibility is integral to agile research: considering case by case
• An agile ethics makes the counterintuitive move to increased openness and transparency; to expose ourselves equally with those wrapped up in our projects.

(Neuhaus & Webmoor 2012)
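Logging of research activities could be as simple as an append-only file with one entry per data collection step. The sketch below is one possible form, not a format proposed by Neuhaus & Webmoor: the function name `log_collection`, the field names and the example values are all hypothetical.

```python
import io
import json
from datetime import datetime, timezone

def log_collection(stream, source, query, n_records, purpose):
    """Append one JSON line describing a single data collection step."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,        # where the data came from
        "query": query,          # what was asked for
        "n_records": n_records,  # how much was collected
        "purpose": purpose,      # why it was collected
    }
    stream.write(json.dumps(entry) + "\n")
    return entry

# In-memory stream for illustration; a real project would more likely
# open a file in append mode and keep it with the research data.
log = io.StringIO()
log_collection(log, "example-platform API", "#election", 1200, "pilot study")
print(log.getvalue(), end="")
```

Keeping such a log next to the data set makes it easier to answer later who collected what, when and for which purpose.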

Page 17

NEW POWER DISTRIBUTION & NETWORKED ETHICS

• The power is inherently relational between the following stakeholders:
• Big Data Collectors: decide which data is collected and stored, and for how long. They also decide who gets access.
• Big Data Utilizers: use and redefine the use of data. One actor can be both collector and utilizer. They determine new behaviour by imposing new social rules or manipulating social processes.
• Big Data Generators:
  • Natural actors that generate massive amounts of new data voluntarily, involuntarily, knowingly, unknowingly…
  • Artificial actors
  • Physical phenomena
• In this power network, ethical decision making is no longer an agency-based activity but a relational, network-based one

(Zwitter 2014)

Page 18

BIG CONCERNS ON PRIVACY

• “Big data poses big privacy risks. The harvesting of large sets of personal data and the use of state of the art analytics implicate growing privacy concerns. Protecting privacy will become harder as information is multiplied and shared ever more widely among multiple parties around the world.” (Tene & Polonetsky 2014)
• Big data threatens privacy and democracy
  • Incremental effect: the growing potential of user identification with more and more data
  • Automated decision making based on data, and questions of discrimination and the narrowing of choice
  • Predictive analysis based on sensitive individual information
  • Lack of access and exclusion: only a few benefit from big data and have access to it in vast amounts
  • Problems with research ethics
  • Chilling effects of the surveillance society as people change their behaviour based on the notion of 24/7 monitoring

(Tene & Polonetsky 2014)

Page 19

PROTECTING THE PRIVACY OF THE RESEARCH SUBJECT

• A key thing to consider in any computational social science study is how to protect the privacy of the individuals and groups that are research subjects
• Research data needs to be anonymized in some way
  • Unfortunately this can be quite hard: in data sets with many data points, the data can be connected to the individual even when it is anonymous
• Group privacy is also a critical issue: although individual-level data might be non-personal, group-level aggregated data might reveal something “private” about the group

(Zwitter 2014)
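One common way to make the re-identification risk concrete is a k-anonymity check, a technique the slides do not name but which fits the point about data being connectable to individuals. The sketch below counts how many records share each combination of indirectly identifying fields ("quasi-identifiers"). The records, field names and the threshold k=2 are hypothetical.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def smallest_group(records, quasi_identifiers):
    """Size of the smallest such group (the k in k-anonymity)."""
    return min(group_sizes(records, quasi_identifiers).values())

def at_risk(records, quasi_identifiers, k=2):
    """Records whose quasi-identifier combination occurs fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] < k]

# Hypothetical "anonymized" data set: names removed, other fields kept.
records = [
    {"postcode": "00100", "birth_year": 1980, "gender": "F", "topic": "health"},
    {"postcode": "00100", "birth_year": 1980, "gender": "F", "topic": "sports"},
    {"postcode": "00200", "birth_year": 1975, "gender": "M", "topic": "politics"},
]
quasi = ["postcode", "birth_year", "gender"]

print("smallest group size:", smallest_group(records, quasi))
print("records at risk:", at_risk(records, quasi))
```

A record that is alone in its group can be singled out by anyone who knows those few attributes from another source, even though no name appears in the data. Passing such a check does not address group privacy, since aggregates over a well-populated group can still be revealing.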

Page 20

PRISONERS ARCHITECTURE FOR HANDLING RESEARCH SUBJECT PRIVACY (HUTTON & HENDERSON 2012)

Page 21

LEGAL ISSUES IN COMPUTATIONAL SOCIAL SCIENCE

Page 22

THREE LEGAL AREAS TO UNDERSTAND

1. Legalities and rights concerning the normal use of software, services and data: What have the research subjects agreed to?
2. Legalities and rights concerning the research use of software, services and data: What is allowed for research and what have you agreed to as a researcher?
3. Legalities and rights concerning the distribution of your own work (code + data): How can I distribute this in a way that benefits society the most?

Page 23

RIGHTS & LICENSES

• Databases and software are typically protected by copyright (or similar rights), and their use is regulated via database and software licenses.
• Protection for databases varies from country to country. The European Union has a special database right that protects each database for 15 years.
• For normal copyright the term is the lifetime of the author plus 70 years. This applies to software as well.
• In order to use a database or software, a license for the use is needed:
  • Agree with the terms of service
  • Agree with the license
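The two protection terms mentioned above can be compared with simple year arithmetic. This is a rough sketch that assumes terms are counted in whole calendar years and ignores jurisdiction-specific details; the function names and example years are hypothetical, and the result is no substitute for legal advice.

```python
DATABASE_RIGHT_YEARS = 15          # EU database right term, as stated above
COPYRIGHT_YEARS_AFTER_DEATH = 70   # copyright term after the author's death

def database_right_last_year(completion_year):
    """Last calendar year in which a database is still protected."""
    return completion_year + DATABASE_RIGHT_YEARS

def copyright_last_year(author_death_year):
    """Last calendar year in which a work is still under copyright."""
    return author_death_year + COPYRIGHT_YEARS_AFTER_DEATH

def is_protected(last_year, current_year):
    """True while the current year has not passed the last protected year."""
    return current_year <= last_year

# Hypothetical examples: a database completed in 2015 and a work whose
# author died in 1950, both checked from the year 2025.
print(database_right_last_year(2015), is_protected(database_right_last_year(2015), 2025))
print(copyright_last_year(1950), is_protected(copyright_last_year(1950), 2025))
```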

Page 24

END USER AGREEMENTS

• EULA: End-user license agreement. Typically used for distributed and installed software and apps. Often also asks permission for end-user data collection and processing. (Wikipedia 2015, End-user license agreement)
• Terms of service: “The Terms-of-Service Agreement, is mainly used for legal purposes, by websites and internet service providers, that store a user's personal data, such as e-commerce and social networking services. A legitimate terms-of-service agreement, is legally binding, and may be subject to change.” (Wikipedia 2015, Terms of service)

Page 25

ITEMS IN A TYPICAL TERMS OF SERVICE

• User rights and responsibilities
• Proper or expected usage; potential misuse
• Accountability for online actions, behavior, and conduct
• Privacy policy outlining the use of personal data
• Payment details such as membership or subscription fees, etc.
• Opt-out policy describing procedure for account termination, if available
• Disclaimer/Limitation of Liability clarifying the site's legal liability for damages incurred by users
• User notification upon modification of terms, if offered

(Wikipedia 2015, Terms of service)

Page 26

CASE INSTAGRAM

(Image from: http://www.thevine.com.au/life/tech/instagram-your-photos-of-cats-are-worth-money-updated-20121219-243481/)

Page 27

TERMS OF SERVICE ALSO GOVERN USE OF DATA (& RESEARCH)

• Terms of service also govern what one is able to do with the service as a user.
  • In many cases a researcher is a user in this respect: thus the terms of service may define what and how one is able to research
  • E.g. Is web scraping allowed?
  • E.g. How much information is the user able to get via an API?
• As the researcher needs to agree to the terms of service to conduct the research, there might be legal consequences if the terms are breached
  • It is highly important to read and understand the legal agreements in relation to one’s research
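For the web scraping question, one machine-readable signal is a site's robots.txt file, which Python's standard `urllib.robotparser` module can interpret. The sketch below parses a hypothetical robots.txt given as text; the rules, the user agent name `research-bot` and the `example.org` URLs are invented for illustration. A robots.txt file is not the terms of service, so a permissive result here does not replace reading the legal agreements.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real study would read the live file
# of the service in question and also check its terms of service.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""

def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for url in ("https://example.org/public/page",
            "https://example.org/private/data"):
    print(url, may_fetch(EXAMPLE_ROBOTS_TXT, "research-bot", url))
```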

Page 28

OPEN SOURCE LICENSES

• When using open source software and/or sharing your code, it is important to understand under which software license this is done
• There are differences between open source licenses with big implications (in general, all allow modification, copying and distribution without license fees).
• Two major types of open source software licenses:
  1. Permissive free software licenses
  2. Copyleft licenses
• In addition there is the Creative Commons (CC) license family, which is more general and extends to many areas other than software. Open databases are typically licensed under CC, or CC0 public domain.
• The Open Knowledge Foundation is also promoting the Open Database License (ODbL)

Page 29

PERMISSIVE OPEN SOURCE LICENSES

• Give rights to use, modify and distribute the software and do not limit the potential further use of the software.
  • Permissive: the further distribution of the software may or may not be free of charge
  • Gives permission to do anything freely
• Typically require crediting the original authors
• Can be seen as “the academic” license. The best-known versions are the MIT and Berkeley (BSD) licenses
  • MIT License
  • BSD License

Page 30

COPYLEFT OPEN SOURCE LICENSES

• Copyleft is the practice of offering people the right to freely distribute copies and modified versions of a work with the stipulation that the same rights be preserved in derivative works down the line. (Wikipedia, Copyleft)
• Software built on copyleft software automatically falls under the copyleft license. (It can be seen as contagious in this sense.)
• The best-known copyleft licenses are the GNU GPL and its versions

Page 31

PUBLIC DOMAIN

• “Works in the public domain are those whose intellectual property rights have expired, have been forfeited, or are inapplicable. Examples include the works of Shakespeare and Beethoven, most of the early silent films, the formulae of Newtonian physics, Serpent encryption algorithm and powered flight.” (Wikipedia 2015, Public Domain)
• Getting things into the public domain can be quite hard: some countries may even prohibit any attempt by copyright owners to surrender rights automatically conferred by law.
• An alternative way: issue a license which irrevocably grants as many rights as possible to the general public. One example is the CC0 license from Creative Commons.

(Wikipedia 2015, Public Domain)

Page 32

EUROPEAN DATA PROTECTION DIRECTIVE

• “The Data Protection Directive (officially Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data) is a European Union directive adopted in 1995 which regulates the processing of personal data within the European Union. It is an important component of EU privacy and human rights law. On 25 January 2012, the European Commission unveiled a draft European General Data Protection Regulation that will supersede the Data Protection Directive.” (Wikipedia 2015, Data Protection Directive)
• Governs the processing and transfer of personal data
• Introduced the right to be forgotten
• The U.S. has no single data protection law, and legislation is on an ad hoc basis

Page 33

LECTURE ASSIGNMENT 1

• Read Instagram’s latest Terms of Use, Privacy Policy and API Terms of Use: https://instagram.com/about/legal/terms/
• What implications do the terms have in relation to potential research that uses Instagram pictures as research data?

Page 34

LECTURE ASSIGNMENT 2

• Watch the following videos on big data & privacy:
  • https://www.youtube.com/watch?v=H_pqhMO3ZSY
• Read the following articles on ethics, surveillance and big data:
  • Zwitter, A. (2014). Big Data ethics. Big Data & Society, 1(2), 2053951714559253.
  • Lyon, D. (2014). Surveillance, Snowden, and big data: capacities, consequences, critique. Big Data & Society, 1(2), 2053951714541861.

Page 35

LECTURE 7 READING

• Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.
• Zwitter, A. (2014). Big Data ethics. Big Data & Society, 1(2), 2053951714559253.
• Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest Law Review.
• Neuhaus, F., & Webmoor, T. (2012). Agile ethics for massified research and visualization. Information, Communication & Society, 15(1), 43-65.
• Lyon, D. (2014). Surveillance, Snowden, and big data: capacities, consequences, critique. Big Data & Society, 1(2), 2053951714541861.

Page 36

REFERENCES

• Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.
• Zwitter, A. (2014). Big Data ethics. Big Data & Society, 1(2), 2053951714559253.
• Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest Law Review.
• Bollier, D., & Firestone, C. M. (2010). The promise and peril of big data (p. 56). Washington, DC, USA: Aspen Institute, Communications and Society Program.
• Tene, O., & Polonetsky, J. (2012). Big data for all: Privacy and user control in the age of analytics. Nw. J. Tech. & Intell. Prop., 11, xxvii.
• Neuhaus, F., & Webmoor, T. (2012). Agile ethics for massified research and visualization. Information, Communication & Society, 15(1), 43-65.
• Lyon, D. (2014). Surveillance, Snowden, and big data: capacities, consequences, critique. Big Data & Society, 1(2), 2053951714541861.
• Hutton, L., & Henderson, T. (2013). An architecture for ethical and privacy-sensitive social network experiments. ACM SIGMETRICS Performance Evaluation Review, 40(4), 90-95.

Page 37

Thank You!

Questions and comments?

twitter: @laurieloranta