open data and the programmable city...2015/01/16  · market, tackling customer and employee churn,...

The implications of big data for government, business and the academy Rob Kitchin, National University of Ireland Maynooth


Post on 11-Jul-2020



The implications of big data for government, business and the academy

Rob Kitchin, National University of Ireland Maynooth


Small data / big data

Characteristic                 Small data                         Big data
Volume                         Limited to large                   Very large
Exhaustivity                   Samples                            Entire populations
Resolution and indexicality    Coarse & weak to tight & strong    Tight & strong
Relationality                  Weak to strong                     Strong
Velocity                       Slow, freeze-framed                Fast
Variety                        Limited to wide                    Wide
Flexible and scalable          Low to middling                    High


Urban big data

• Directed
  o Surveillance: CCTV, drones/satellite
  o Digitisation of millions of documents, films, audio recordings

• Automated
  o Automated surveillance
  o Digital devices
  o Sensed and scanned data
  o Interaction and transactional data
  o IoT (Internet of things) and M2M (machine to machine)

• Volunteered
  o Social media
  o Sousveillance (wearables)
  o Crowdsourcing
  o Citizen science


Big data analytics

• Challenge of making sense of big data is coping with its abundance and exhaustivity, timeliness and dynamism, messiness and uncertainty, semi-structured or unstructured nature

• Solution has been machine learning made possible by advances in computation and computational techniques

• Four broad classes of analytics:
  • data mining and pattern recognition
  • statistical analysis
  • prediction, simulation, and optimization
  • data visualization and visual analytics


Government and Business


Government and business

• Big data and associated analytics will enhance the governing of people, managing organisations, leveraging value and producing capital, creating better places, improving health and well-being, tackling social and ecological issues, etc.

• Driven by overlapping set of discourses/promises: improved insight and wisdom, productivity, competitiveness, efficiency, effectiveness, utility, sustainability, securitisation ...


Governing people

• State is a prime generator and user of data

• Has sought to create more systematic ways of managing and governing populations and delivering services through auditing and quantification of society

• Citizens and institutions are identified and monitored, records updated, profiles mapped, data analyzed to spot issues and trends, payments tracked, and services and disciplining administered

• Big data is the latest set of technologies that can expand and improve state work by extending the timeliness and expansiveness of calculative practices


Managing organisations

• Data provide the basis to manage an organisation more effectively, efficiently, competitively and productively

• Information systems have become essential support infrastructures to track and manage complex assemblages of people, components, commodities and infrastructures across time and space

• Big data - real-time intelligence on an organisation - offers further efficiencies whilst reducing risks, costs and operational losses, and improving customer experience

• Three common data-driven systems to facilitate greater coordination and control within and between organisations include Enterprise Resource Planning (ERP), Supply Chain Management (SCM), and Customer Relationship Management (CRM)

• Produce cost savings across operational base


Leveraging value and capital

• Big data solutions enable realisation of untapped capital, increase return on investment, and leverage competitive advantage

• There are several ways in which big data solutions can offer corporate intelligence that can grow turnover and profits, including segmenting the market, tackling customer and employee churn, optimizing various inputs (e.g., components, labour, utilities) and yield, and building various profiles and predictive models to answer a variety of questions:

• whether to contact the customer or not? (target marketing);

• provide the customer with a retention offer or not? (customer retention);

• which type of ad or choice of words/images or product to present to a customer? (content selection);

• which channel the customer should be contacted through? (channel selection);

• whether a customer is offered a higher or lower price? (dynamic pricing/discounting);

• whether a debtor is offered a deeper write-off? (collections);

• whether a customer is offered a higher or lower credit limit or interest rate? (credit risk).
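As a minimal sketch of how a predictive model might feed one of these decisions (customer retention), the following assumes that a churn probability has already been estimated by some model. The `Customer` record, the function name, the expected-value rule and all figures are illustrative assumptions, not something specified in the slides.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical customer record; all fields are illustrative."""
    name: str
    churn_probability: float  # assumed output of some predictive model, 0..1
    annual_value: float       # assumed revenue retained if the customer stays


def should_offer_retention(customer: Customer,
                           offer_cost: float,
                           offer_success_rate: float) -> bool:
    """Offer only when the expected retained value exceeds the offer's cost.

    Expected gain = P(churn) * P(offer works) * value retained.
    This simple rule is an assumption made for illustration.
    """
    expected_gain = (customer.churn_probability
                     * offer_success_rate
                     * customer.annual_value)
    return expected_gain > offer_cost


customers = [
    Customer("A", churn_probability=0.8, annual_value=500.0),
    Customer("B", churn_probability=0.1, annual_value=500.0),
]
decisions = {c.name: should_offer_retention(c, offer_cost=50.0,
                                            offer_success_rate=0.5)
             for c in customers}
print(decisions)  # {'A': True, 'B': False}
```

The same pattern (a score, a threshold, an automated decision) would apply to the other questions in the list, such as channel selection or credit limits, with different inputs and costs.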


Creating better places

• Produce ‘smart’ cities

• Places increasingly composed of and monitored by pervasive and ubiquitous computing and their economy and governance driven by ICT innovation, creativity and entrepreneurship

• Cities can be understood and regulated in real-time; they produce, share, integrate, consume and act on the big data they produce

• Create more liveable, secure, functional, competitive and sustainable places


12 Modules: 100s of interactive graphs/maps

• How’s Dublin Doing?

• Dublin Indicators & benchmarking

• Dublin Real-Time

• Real-time data from sensors across Dublin

• Dublin Mapped

• Detailed Census maps for 2006 & 2011 Census, crime, welfare

• Dublin Planning

• Land zoning & planning permissions

• Dublin Near To Me

• Maps of location and nearness to public services; area profiles

• Dublin Housing

• Maps of housing, house prices and commuting patterns

• Dublin Reporting

• Report issues to city authorities

• Dublin Data Stores

• Access to all data used in the dashboard

• Dublin Social (in progress)

• Maps of social media activity

• Dublin Modelled (in progress)

• Modelling and scenario tools

• Dublin Apps (in progress)

• Directory of apps relevant to Dublin

• Have Your Say (in progress)

• Feedback from users


The Academy


Big data and epistemology

• “Revolutions in science have often been preceded by revolutions in measurement” Sinan Aral

• Big data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities

• Academic analysis typically based on extracting insights from small datasets using a limited set of tools

• Now have a data deluge in many fields and a suite of new analytical techniques

• Transforming how we frame, ask and answer questions


‘The end of theory’

• Anderson (2008) argues that ‘the data deluge makes the scientific method obsolete’; that the patterns and relationships contained within big data inherently produce meaningful and insightful knowledge about complex phenomena.

• “There is now a better way. Petabytes allow us to say: ‘Correlation is enough.’ ... We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. ... Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. There’s no reason to cling to our old ways.”

• Ayasdi software claims to be able to:

• “automatically discover insights -- regardless of complexity -- without asking questions.”


‘The end of theory’

• Powerful and attractive set of ideas at work in the empiricist epistemology that run counter to the mainstream deductive approach:

• big data can capture the whole of a domain and provide full resolution

• there is no need for a priori theory, models or hypotheses

• through the application of agnostic data analytics the data can speak for themselves free of human bias or framing, and any patterns and relationships within big data are inherently meaningful and truthful

• meaning transcends context or domain-specific knowledge, thus can be interpreted by anyone who can decode a statistic or data visualization

• These work together to suggest that a new mode of science is being created


‘The end of theory’

• Empiricist thinking is problematic for four reasons:

• Big data are both a representation and a sample, shaped by the technology and platform used, the data ontology employed, the regulatory environment, and are subject to sampling bias

• Big data do not arise from nowhere, free from ‘the regulating force of philosophy’

• Big data cannot simply speak for themselves free of human bias or framing

• Big data cannot be interpreted outside of context and domain-specific knowledge
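One way to see why patterns found by agnostic analytics are not inherently meaningful is to search for correlations in data that contain no relationships at all. The sketch below is illustrative only: the dataset is synthetic random noise, and the sizes (20 observations, 500 candidate variables) are arbitrary assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)   # fixed seed so the run is repeatable
n_observations = 20      # arbitrary illustrative size
n_variables = 500        # arbitrary illustrative size

# The outcome and every candidate variable are independent random noise,
# so no real relationship exists anywhere in this dataset.
outcome = [rng.gauss(0, 1) for _ in range(n_observations)]
variables = [[rng.gauss(0, 1) for _ in range(n_observations)]
             for _ in range(n_variables)]

correlations = [pearson(v, outcome) for v in variables]
strongest = max(abs(r) for r in correlations)
print(f"strongest |r| among {n_variables} noise variables: {strongest:.2f}")
```

With many variables and few observations, the strongest correlation found is typically sizeable even though nothing connects the variables, which is why pattern-finding alone cannot stand in for theory, context and domain knowledge.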


Data-driven science

• Data-driven science seeks to hold to the tenets of the scientific method, but is more open to using a hybrid combination of abductive, inductive and deductive approaches to advance the understanding of a phenomenon

• It differs from the traditional, experimental deductive design in that it seeks to generate hypotheses and insights ‘born from the data’ rather than ‘born from the theory’

• Seeks to incorporate a mode of induction into the research design, though explanation through induction is not the intended end-point. Instead, it forms a new mode of hypothesis generation before a deductive approach is employed

• Nor does the process of induction arise from nowhere, but is situated and contextualised within a highly evolved theoretical domain

• As such, the epistemological strategy is to use guided knowledge discovery techniques to identify potential questions worthy of further examination and testing

• Approach is suited to extracting additional, valuable insights that traditional ‘knowledge-driven science’ would fail to generate


Computational social science

• For positivistic scholars in the social sciences, big data offers the opportunity to develop more sophisticated, wider-scale, finer-grained models of human life. To shift:

• from data-scarce to data-rich studies of societies

• from static snapshots to dynamic unfoldings

• from coarse aggregations to high resolutions

• from relatively simple models to more complex, sophisticated simulations

• The potential is for studies with much greater breadth, depth, scale, and timeliness, and that are inherently longitudinal

• Moreover, the variety, exhaustivity, resolution, and relationality of data, plus the growing power of computation and new data analytics, address some of the critiques of positivistic scholarship to date, especially those of reductionism and universalism, by providing more finely grained, sensitive, and nuanced analysis


Digital humanities

• For post-positivist scholars, big data offers both opportunities and challenges

• The opportunities are a proliferation, digitisation and interlinking of a diverse set of analogue and unstructured data, much of it new (e.g., social media) and many of which have heretofore been difficult to access (e.g., millions of books, documents, newspapers, photographs, art works, material objects, etc., from across history)

• Provision of new tools of data curation, management and analysis that can handle massive numbers of data objects

• Rather than concentrating on a handful of novels or photographs, or a couple of artists and their work, it becomes possible to search and connect across a very large number of related works

• Has implications for the social sciences, but is most widely being examined through the emerging field of digital humanities


Digital humanities

• Digital humanities advocates are broadly divided into two camps epistemologically:

• Those that believe that new techniques (counting, graphing, mapping, data mining) bring methodological rigour and objectivity to disciplines that have heretofore been unsystematic and random in their focus and approach

• Those that see the techniques as a supplement to, rather than a replacement for, existing humanities methods and theory building

• Both cases tend to use descriptive rather than inferential statistics

• The claims of the former have opened up an epistemological debate centred on close versus distant reading/interpretation and the ability of algorithms to parse meaning and context

• Digital humanities (DH) seen by some as mechanistic and reductionist, identifying patterns but not processes or meaning

• There are also worries that computational social science (CSS) and DH relegate questions concerning metaphysical aspects of human life (meanings, beliefs, experiences) and normative questions (ethical and moral dilemmas about how things should be as opposed to how they are)


What happens to small data studies?

• Big data doesn’t replace or negate small data

• Small data have a proven track record of answering specific questions, with established procedures, methods, etc.

• Studies can be much more finely tailored

• Small data studies seek to mine gold from carefully working a narrow seam, whereas big data studies seek to extract nuggets through open-pit mining, scooping up and sieving huge tracts of land

• Small data will, however, increasingly be made more big data-like through the development of new data infrastructures that pool, scale and link small data in order to create larger datasets, encourage sharing and re-use, and open them up to combination with big data and analysis using big data analytics


Ethical, political and social consequences of big data


Surveillance

• Surveillance/dataveillance

• Creation of extensive digital footprints (data people themselves leave behind) and data shadows (information about them generated by others)

• Related to individuals, objects, interactions, transactions, territories ...

• Creation of a vast data market and data brokerage and analytics industry

• Reshaping individuals’ relationships with companies and the state

• Significantly reshaping privacy


Data type         Data collected by Uber Android app
Accounts log      email log
App Activity      name, package name, process number of activity, processed id
App Data Usage    cache size, code size, data size, name, package name
App Install       installed at, name, package name, unknown sources enabled, version code, version name
Battery           health, level, plugged, present, scale, status, technology, temperature, voltage
Device Info       board, brand, build version, cell number, device, device type, display, fingerprint, IP, MAC address, manufacturer, model, OS platform, product, SDK code, total disk space, unknown sources enabled
GPS               accuracy, altitude, latitude, longitude, provider, speed
MMS               from number, MMS at, MMS type, service number, to number
NetData           bytes received, bytes sent, connection type, interface type
PhoneCall         call duration, called at, from number, phone call type, to number
SMS               from number, service number, SMS at, SMS type, to number
TelephonyInfo     cell tower ID, cell tower latitude, cell tower longitude, IMEI, ISO country code, local area code, MEID, mobile country code, mobile network code, network name, network type, phone type, SIM serial number, SIM state, subscriber ID
WifiConnection    BSSID, IP, linkspeed, MAC addr, network ID, RSSI, SSID
WifiNeighbors     BSSID, capabilities, frequency, level, SSID
Root Check        root status code, root status reason code, root version, sig file version
Malware Info      algorithm confidence, app list, found malware, malware SDK version, package list, reason code, service list, sigfile version


Privacy

• Surveillance: Watching, listening to, or recording of an individual’s activities

• Interrogation: Various forms of questioning or probing for information

• Aggregation: The combination of various pieces of data about a person

• Identification: Linking information to particular individuals

• Insecurity: Carelessness in protecting stored information from leaks and improper access

• Secondary Use: Use of information collected for one purpose for a different purpose without the data subject’s consent

• Exclusion: Failure to allow the data subject to know about the data that others have about her and participate in its handling and use, including being barred from being able to access and correct errors in that data

• Breach of confidentiality: Breaking a promise to keep a person’s information confidential

• Disclosure: Revelation of information about a person that impacts the way others judge her character

• Exposure: Revealing another’s nudity, grief, or bodily functions

• Increased Accessibility: Amplifying the accessibility of information

• Blackmail: Threat to disclose personal information

• Appropriation: The use of the data subject’s identity to serve the aims and interests of another

• Distortion: Dissemination of false or misleading information about individuals

• Intrusion: Invasive acts that disturb one’s tranquillity or solitude

• Decisional Interference: Incursion into the data subject’s decisions regarding her private affairs
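Aggregation and identification from the list above can be made concrete with a small sketch of record linkage. Everything here is hypothetical: the two datasets, the names, and the choice of postcode, birth year and sex as linking fields are assumptions made for illustration.

```python
# Two hypothetical datasets: neither holds names and diagnoses together.
health_records = [
    {"postcode": "D02", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "D08", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
public_register = [
    {"name": "Alice Example", "postcode": "D02", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "postcode": "D08", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "postcode": "D08", "birth_year": 1990, "sex": "F"},
]

# Fields assumed to appear in both sources.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build the linking key from the shared fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(records, register):
    """Combine the two sources, keeping matches that point to exactly one person."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])
    linked = []
    for record in records:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # unique match: the record is now identified
            linked.append((names[0], record["diagnosis"]))
    return linked


linked = link(health_records, public_register)
print(linked)  # [('Alice Example', 'asthma'), ('Bob Example', 'diabetes')]
```

Neither dataset is identifying on its own; it is the combination (aggregation) that links a diagnosis to a named individual (identification).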


Political, ethical and social issues

• Predictive profiling and social sorting

• Dynamic pricing

• Anticipatory governance

• Control creep

• Technocratic modes of governance and technological lock-ins

• Vulnerabilities: buggy, brittle and hackable systems

• Data protection and security


Conclusion

• Big data/analytics does constitute a data revolution: it fundamentally alters the nature of data and how we make sense of them

• It is starting to transform how business and government are conducted, organised and managed

• It is also starting to alter how research is conducted across the academy

• As the technology and analytics improve these transformations will extend and deepen

• They will thus pose significant epistemological questions, as well as social, political and ethical ones

• We are only just starting to examine and think through these questions


@robkitchin

Kitchin, R., Lauriault, T. and McArdle, G. (2015) Knowing and governing cities through urban indicators, city benchmarking and real-time dashboards. Regional Studies, Regional Science 2

Kitchin, R. and Lauriault, T. (2014) Towards critical data studies. SSRN

Kitchin, R. and Lauriault, T. (2014) Small data in the era of big data. GeoJournal (online first)

Kitchin, R. (2014) Big data, new epistemologies and paradigm shifts. Big Data and Society 1 (April-June): 1-12.

Kitchin, R. (2014) The real-time city? Big data and smart urbanism. GeoJournal 79(1): 1-14.

Kitchin, R. (2013) Big data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography 3(3): 262–267

http://www.nuim.ie/progcity

@progcity