
  • Social Media & Big Data

    Robert Ackland

    Australian Demographic and Social Research Institute (ADSRI)

    Australian National University

    [email protected]

    http://voson.anu.edu.au

    Notes prepared for Big Data Analysis for Social Scientists course, ACSPRI Winter Program, Brisbane, 29 June - 3 July 2015

  • 2

    Plan of lecture

      - Examples of computer-mediated interaction

      - Online research methods

      - Mapping Cyberspace

      - Construct validity of virtual world data

      - Big Data

  • 3

    Examples of computer-mediated interaction

  • 4

    Newsgroups - Repositories of emails set up for different topics, often hosted on Usenet.

  • 5

    Wikis

    "A wiki is a collection of web pages designed to enable anyone who accesses it to contribute or modify content, using a simplified markup language." (http://en.wikipedia.org/wiki/Wiki)

  • 6

    Folksonomy - A website enabling collaborative creation and managing of tags to annotate and categorise content (also known as social classification / tagging).

  • 7

    Blog - a chronologically updated website, typically written by a single author and designed to provide regular commentary on particular topics or else to serve as an online diary.

  • 8

    Social network services - websites that allow people to create personal profiles and interact by requesting and accepting "friendships" and joining groups/forums.

  • 9

    Virtual world - computer-based simulated environments where individuals can assume digital representations (avatars) and interact.

  • 10

    MUD (British Legends) started in 1978 - said to be the oldest virtual world in existence (http://www.british-legends.com/history.htm)

  • 11

    An advanced character in EverQuest 2

  • 12

    Micro-blogging - a service allowing subscribers to broadcast short messages (140 char. max.) to other subscribers of the service.

    Video clips: Twitter in Plain English; Twouble with Twitter

  • 13

  • 14

    Online research methods

  • 15

    Dimensions of Online Research Methods

    Method:

      - Quantitative - standardised observational form of data collection (e.g. survey) on sample from larger population; after coding, typically work with numbers

      - Qualitative - exploring concepts; less focus on standardisation; more involvement by researcher; typically work with text

    Mode: experiments, surveys, field research, unobtrusive research

    Presence of researcher: Obtrusive (reactive) / Unobtrusive (non-reactive)

  • 16

    Analysis of digital trace data (Facebook profiles, website content, website logs, click behaviour, emails, e-commerce data) is an example of unobtrusive research - subjects may know they are being observed (or could be observed) when generating the data, but the research can be considered unobtrusive if this knowledge is not likely to lead to biases in the data for the purpose of the present study.

  • 17

    ORM: Method versus Researcher Presence

  • 18

    Unobtrusive social science research: how we used to do it... and how it's done today...

  • 19

    Mapping Cyberspace

  • 20

    Cyberspace. A consensual hallucination experienced daily by billions of legitimate operators... A graphic representation of data abstracted from the banks of every computer in the human system. Unthinkable complexity. Lines of light ranged in the nonspace of the mind, clusters and constellations of data. Like city lights, receding... -- William Gibson, Neuromancer, 1984

    It was suggestive of something, but had no real semantic meaning, even for me, as I saw it emerge on the page. -- Gibson on the origin of the term in the 2000 documentary No Maps for These Territories.

  • 21

    Router-level connectivity of the Internet, 1999 (Internet Mapping Project)

  • 22

    Outbound hyperlinks of the Australian Labor Party (Ackland and Gibson, 2004 using HypViewer)

  • 23

    Outbound hyperlinks of environmental activist organisation (2006 using Large Graph Layout)

    Hyperlink network of an environmental activist organization (each node is a website and the ties are hyperlinks between websites).

    Hyperlink data collected using Virtual Observatory for the Study of Online Networks (VOSON) System (http://voson.anu.edu.au)

    Network map rendered using VOSON & Large Graph Layout


  • 24

    Hyperlink network - Australian sites focused on abortion or pregnancy (Ackland and Evans, 2005)

    VOSON hyperlink network of Australian web sites focused on abortion (Ackland and Evans, 2005)

    Force-directed graphing algorithm clearly displays assortative mixing on basis of abortion stance

    Note boundary-spanner website with high betweenness
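
    The pattern described on this slide can be approximated with two simple counts: the share of hyperlinks that join sites with the same stance, and the sites that link to both camps. The sketch below is only an illustration under stated assumptions: the site names, stance labels and hyperlinks are invented, and "linked to both camps" is a crude stand-in for betweenness centrality, not the measure itself.

```python
# Hypothetical toy data: site names, stance labels and hyperlinks are invented.
stance = {
    "a1": "pro-choice", "a2": "pro-choice", "a3": "pro-choice",
    "b1": "pro-life", "b2": "pro-life", "b3": "pro-life",
    "hub": "neutral",
}
links = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a1", "hub"), ("hub", "b1"),
]


def same_stance_share(links, stance):
    """Share of ties joining two sites that carry the same stance label."""
    same = sum(1 for u, v in links if stance[u] == stance[v])
    return same / len(links)


def boundary_spanners(links, stance, groups):
    """Sites whose neighbours include at least one member of every group."""
    neighbours = {}
    for u, v in links:
        # Treat hyperlinks as undirected ties for this simple check.
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return sorted(
        site for site, nbrs in neighbours.items()
        if set(groups) <= {stance[n] for n in nbrs}
    )


share = same_stance_share(links, stance)
spanners = boundary_spanners(links, stance, ["pro-choice", "pro-life"])
print(f"same-stance share of ties: {share:.2f}")
print("sites linked to both camps:", spanners)
```

    A share well above what random linking would give suggests assortative mixing; a network library would normally be used to compute betweenness itself.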

  • 25

    Divided They Blog - Adamic and Glance (2005)

    Network formed by 1500 US political bloggers

    Each node is a blogger (red - conservative, blue - liberal) and each tie is a hyperlink

    Node size is proportional to indegree
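
    Indegree is the number of inbound hyperlinks a blog receives. As a minimal sketch (the blog names, the hyperlinks and the `scale` parameter are all hypothetical), it can be counted from a list of (source, target) pairs and turned into a node size:

```python
from collections import Counter

# Hypothetical hyperlinks between invented blogs: (source, target).
hyperlinks = [
    ("blog_a", "blog_c"), ("blog_b", "blog_c"), ("blog_d", "blog_c"),
    ("blog_c", "blog_a"), ("blog_d", "blog_a"), ("blog_a", "blog_b"),
]


def indegree(edges):
    """Count inbound hyperlinks per node; nodes with none are given 0."""
    counts = Counter(target for _, target in edges)
    for source, _ in edges:
        counts.setdefault(source, 0)
    return dict(counts)


def node_sizes(degrees, scale=10.0):
    """Node size proportional to indegree (scale is an arbitrary constant)."""
    return {node: scale * d for node, d in degrees.items()}


degrees = indegree(hyperlinks)
sizes = node_sizes(degrees)
for node in sorted(degrees, key=degrees.get, reverse=True):
    print(node, degrees[node], sizes[node])
```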

  • 26

    Twitter network (from NodeXL book)

  • 27

    Retweet/mention/reply network of Twitter users who tweeted (#auspol OR #ausvotes) AND (#asylum OR #asylumseeker OR #marriagequality), January 2013
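
    The selection rule for this network is a Boolean condition on hashtags. A minimal sketch of such a filter follows; the example tweets are invented, the hashtags are copied from the slide as written, and case-insensitive matching is an assumption rather than a documented detail of the original data collection.

```python
import re

# Hashtag groups as written on the slide.
POLITICS_TAGS = {"#auspol", "#ausvotes"}
ISSUE_TAGS = {"#asylum", "#asylumseeker", "#marriagequality"}


def hashtags(text):
    """Return the set of hashtags in a tweet, lower-cased."""
    return {tag.lower() for tag in re.findall(r"#\w+", text)}


def matches(text):
    """True if the tweet has a politics tag AND an issue tag."""
    tags = hashtags(text)
    return bool(tags & POLITICS_TAGS) and bool(tags & ISSUE_TAGS)


# Hypothetical tweets, invented for illustration.
tweets = [
    "Debate tonight #auspol #asylum",
    "Polling day is coming #ausvotes",
    "Policy announced #AusVotes #AsylumSeeker",
    "Long read on #asylum policy",
]
selected = [t for t in tweets if matches(t)]
print(selected)
```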

  • 28

    Big Data

  • 29

    What is the role of social scientists in the Big Data era?

    Following draws from González-Bailón, S. (2014): "Social Science in the Era of Big Data," forthcoming in Policy & Internet.

    Two views about how Big Data will transform social science:

      1) Theory and interpretation will become less necessary - data will speak for themselves - e.g. Anderson (2008)

      2) Data-driven approaches underestimate role of researchers.

    Disentangling signal from noise is a subjective process. Need (social science) context to identify meaningful correlations (and hopefully causality) in the data.

    Perhaps unsurprisingly, I support view #2...

  • 30

    In order to gain insights from Big Data we often need to reduce them, by:

      - applying filters (allowing identification of relevant streams of information), or

      - aggregating them in a way that helps identify the right temporal scale or spatial resolution.

    Social science can help in both of those stages.
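
    The two reduction steps can be sketched in a few lines. Everything in this example is hypothetical (the records, the keyword, and the helper names `filter_records` and `aggregate`); the point is only that the same filtered stream looks different depending on the temporal scale chosen for aggregation.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, text) records, invented for illustration.
records = [
    ("2013-01-14T09:15:00", "Question time again #auspol"),
    ("2013-01-14T21:40:00", "Weather is nice today"),
    ("2013-01-15T08:05:00", "New poll out #auspol"),
    ("2013-01-15T08:50:00", "More on the poll #auspol"),
]


def filter_records(records, keyword):
    """Step 1: keep only records whose text contains the keyword."""
    return [(ts, text) for ts, text in records if keyword.lower() in text.lower()]


def aggregate(records, fmt):
    """Step 2: count records per time bucket; fmt is a strftime pattern."""
    buckets = Counter(datetime.fromisoformat(ts).strftime(fmt) for ts, _ in records)
    return dict(buckets)


relevant = filter_records(records, "#auspol")
per_day = aggregate(relevant, "%Y-%m-%d")
per_hour = aggregate(relevant, "%Y-%m-%d %H:00")
print(per_day)
print(per_hour)
```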

  • 31

    Filters involve sampling, which social scientists know a lot about. For example with Twitter, research often involves choosing keywords or hashtags that identify the relevant streams of information, or identifying a set of seed users from whom to snowball in reconstructing networks of communication.

    We access Twitter data via application programming interfaces (APIs); these generally do not give access to the full stream of information, so we don't get a random sample of all activity.

    Both of the above can lead to bias, which may lead to incorrect conclusions, e.g. conclusions about the composition of a communication network on Twitter will be biased towards the most central/active actors if snowball sampling is used.
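
    The snowball bias can be seen on a toy network. In the sketch below (all user names and ties are invented, and the helper names are hypothetical), a two-wave snowball from a single low-activity seed reaches the most connected account, so the sample's mean degree ends up higher than the population's.

```python
# Hypothetical toy network: one highly connected account ("hub") plus a chain
# of less active users. All names and ties are invented for illustration.
edges = [
    ("hub", "u1"), ("hub", "u2"), ("hub", "u3"), ("hub", "u4"),
    ("u4", "u5"), ("u5", "u6"), ("u6", "u7"), ("u7", "u8"),
]

adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def snowball(adjacency, seeds, waves):
    """Start from the seeds and add all neighbours of the newest wave, `waves` times."""
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        frontier = {n for u in frontier for n in adjacency[u]} - sampled
        sampled |= frontier
    return sampled


def mean_degree(adjacency, nodes):
    """Average number of ties (in the full network) among the given nodes."""
    return sum(len(adjacency[n]) for n in nodes) / len(nodes)


sample = snowball(adjacency, seeds=["u5"], waves=2)
population_mean = mean_degree(adjacency, adjacency.keys())
sample_mean = mean_degree(adjacency, sample)
print(sorted(sample))
print(f"population mean degree: {population_mean:.2f}")
print(f"snowball sample mean degree: {sample_mean:.2f}")
```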

  • 32

    Once we have collected our Twitter data, we need to aggregate them to construct networks of communication. Network ties can be:

      - RTs (retweets) - used to broadcast messages previously sent by other users

      - @mentions - used to engage in direct communication with others.

    Conover et al. (2011) found that there is strong ideological polarisation on Twitter when RTs are used for network ties, but no polarisation when @mentions are used.
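
    Because the choice of tie type changes the resulting network, it helps to build the retweet and @mention networks separately. The sketch below assumes the older text convention in which a retweet starts with "RT @user"; the tweets and user names are invented, and data obtained through an API would usually flag retweets in metadata rather than in the text.

```python
import re

# Hypothetical (author, text) tweets, invented for illustration.
tweets = [
    ("alice", "RT @bob: Budget speech tonight"),
    ("carol", "@alice what did you think of the speech?"),
    ("bob", "Thanks @carol and @alice for the chat"),
    ("dave", "RT @bob: Budget speech tonight"),
]

RT_PATTERN = re.compile(r"^RT @(\w+)")   # assumed "RT @user" text convention
MENTION_PATTERN = re.compile(r"@(\w+)")


def build_networks(tweets):
    """Return (retweet_ties, mention_ties) as lists of (source, target) pairs."""
    retweet_ties, mention_ties = [], []
    for author, text in tweets:
        rt = RT_PATTERN.match(text)
        if rt:
            # A retweet creates one tie to the original author only.
            retweet_ties.append((author, rt.group(1)))
            continue
        for user in MENTION_PATTERN.findall(text):
            mention_ties.append((author, user))
    return retweet_ties, mention_ties


retweet_ties, mention_ties = build_networks(tweets)
print("RT ties:", retweet_ties)
print("@mention ties:", mention_ties)
```

    The two edge lists can then be analysed as separate networks, which is the comparison the Conover et al. (2011) finding rests on.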

  • 33

    González-Bailón, S. (2014): "Once again, the data cannot speak by themselves, because a lot of choices are made along the way to determine how best to analyze them - their interpretation very much depends on those choices; which are not data-driven but human.... In other words, Big Data will not bring about the end of theory; quite the contrary. And social science has a crucial role to play in the discovery of the biases that are intrinsic to digital data, as well as in the construction of convincing stories about what those data reveal."

  • 34

    References

    Adamic, L., and N. Glance (2005): "The Political Blogosphere and the 2004 U.S. Election: Divided They Blog," Mimeograph. Available at: http://www.blogpulse.com/papers/2005/AdamicGlanceBlogWWW.pdf

    Anderson, C. (2008): "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete," Wired magazine.

    Burt, R. (2011): "Structural holes in virtual worlds," Booth School of Business (Univ. of Chicago) working paper.

    Conover, M. D., J. Ratkiewicz, M. Francisco, B. Goncalves, A. Flammini, and F. Menczer (2011): "Political Polarization on Twitter," in International Conference on Weblogs and Social Media (ICWSM'11).

    González-Bailón, S. (2014): "Social Science in the Era of Big Data," forthcoming in Policy & Internet. Available at SSRN: http://ssrn.com/abstract=2238198

    Hansen, D. L., B. Shneiderman, and M. A. Smith (2010): Analyzing Social Media Networks with NodeXL: Insights from a connected world. Morgan-Kaufmann, Burlington, MA.

    Smith, M., L. Rainie, I. Himelboim, and B. Shneiderman (2014): "Mapping Twitter Topic Networks: From Polarized Crowds to Community Clusters," Pew Research Center report. http://www.pewinternet.org/2014/02/20/mapping-twitter-topic-networks-from-polarized-crowds-to-community-clusters

    Williams, D. (2010): "The mapping principle, and a research framework for virtual worlds," Communication Theory, 20(4):451-470.
