the ecology of big data-little data convergence

@cjlortie

Upload: cjlortie

Posted on 25-Jul-2015


TRANSCRIPT

convergence

combination of disparate phenomena

evolution adaptation+

create, collect

combine

data establish convergence

promote increased rates/frame rates

novel connections; reciprocally accelerated quantification

adventure alignment

connecting ideas is great, connecting data is better

evidence is the root of effective decisions

opportunity, personally & professionally, to make the best possible decisions

Ecology is about connecting the dots.

one of the goals: interaction webs

Ecology can help us understand & manage big data.

Untangle, sort, link threads, & best of all knit together.

Big Data are not static.

Datasets so large or complex that they become difficult to process using traditional data-processing applications.

775,000,000 results (0.26 seconds) 
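One way to cope with data too large for traditional tools is to summarise them in a single streaming pass instead of loading them all into memory. A minimal Python sketch, assuming a stream of numeric readings; it uses Welford's one-pass algorithm, and the function name and synthetic stream are illustrative, not from the talk:

```python
import math
from typing import Iterable, Tuple


def streaming_mean_sd(values: Iterable[float]) -> Tuple[int, float, float]:
    """One-pass (Welford) mean and sample standard deviation.

    Only three running numbers are kept (count, mean, sum of squared
    deviations), so memory use does not grow with the size of the data.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    sd = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return n, mean, sd


# A generator stands in (hypothetically) for a stream too large to hold in memory.
readings = (float(i % 10) for i in range(1_000_000))
n, mean, sd = streaming_mean_sd(readings)
print(n, round(mean, 3), round(sd, 3))
```

Because the generator is consumed lazily, the same function works whether the readings come from a list, a file, or a sensor feed.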

V is for Vampire, and Big Data are all about V (and vampires).

Volume

Variety

Velocity

Veracity & Variability

doi:10.1038/sdata.2014.6

remote sensing & microclimate

abundance & distributions with citizen scientists

doi:10.1890/11-2177.1

Challenges: capture, curation, context, & complexity-analytics

data are evidence

Useful Big Data illuminate context, connections or interactions

personal solutions

Context: even a single point in a big dataset is informative
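A minimal sketch of that idea, assuming a large hypothetical reference dataset of simulated temperature readings: one local field measurement becomes interpretable once it is ranked against the big dataset. The data and the `percentile_rank` helper are invented for illustration.

```python
import bisect
import random


def percentile_rank(reference, value):
    """Percentage of reference observations strictly below `value`."""
    ordered = sorted(reference)
    below = bisect.bisect_left(ordered, value)  # count of items < value
    return 100.0 * below / len(ordered)


# Hypothetical regional dataset: 100,000 simulated temperature readings (deg C).
rng = random.Random(42)
regional = [rng.gauss(20.0, 3.0) for _ in range(100_000)]

# A single site reading gains meaning once placed in that context.
site_reading = 26.0
print(f"site reading sits at the {percentile_rank(regional, site_reading):.1f}th percentile")
```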

personal solutions

Interactions: focus on schema, archive & aggregate datasets
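A minimal sketch of schema-first aggregation: two hypothetical surveys that record the same kind of observation under different field names are mapped onto one shared schema and only then aggregated. The schema, field names, and data are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical shared schema that every contributing dataset is mapped onto."""
    site: str
    species: str
    count: int


# Two hypothetical source datasets recording the same thing under different field names.
survey_a = [{"plot": "S1", "taxon": "species_x", "n": 12},
            {"plot": "S2", "taxon": "species_x", "n": 4}]
survey_b = [{"site_id": "S1", "sp": "species_x", "abundance": 3}]

# Per-source mappings: schema field -> source field.
MAPPINGS = {
    "a": {"site": "plot", "species": "taxon", "count": "n"},
    "b": {"site": "site_id", "species": "sp", "count": "abundance"},
}


def to_schema(rows, mapping):
    """Rename each source row's fields into the shared Observation schema."""
    return [Observation(site=r[mapping["site"]],
                        species=r[mapping["species"]],
                        count=int(r[mapping["count"]])) for r in rows]


def aggregate(observations):
    """Total counts per (site, species) once all sources share one schema."""
    totals = defaultdict(int)
    for obs in observations:
        totals[(obs.site, obs.species)] += obs.count
    return dict(totals)


combined = to_schema(survey_a, MAPPINGS["a"]) + to_schema(survey_b, MAPPINGS["b"])
print(aggregate(combined))
```

Keeping the mappings as data rather than code means a new dataset can be archived and joined by adding one dictionary entry.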

personal solutions

Synthesis: find & use metrics that allow you to connect datasets.
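As one example of such a metric (an assumption; the talk does not name one), the log response ratio, widely used in ecological meta-analysis, puts treatment/control comparisons measured on different scales onto a common scale so they can be pooled. A minimal fixed-effect sketch with hypothetical study summaries:

```python
import math
from typing import NamedTuple


class Study(NamedTuple):
    # Hypothetical summary statistics for treatment (t) and control (c) groups.
    mean_t: float
    sd_t: float
    n_t: int
    mean_c: float
    sd_c: float
    n_c: int


def log_response_ratio(s: Study):
    """Return (lnRR, sampling variance) for one study."""
    lnrr = math.log(s.mean_t / s.mean_c)
    var = s.sd_t**2 / (s.n_t * s.mean_t**2) + s.sd_c**2 / (s.n_c * s.mean_c**2)
    return lnrr, var


def fixed_effect_mean(studies):
    """Inverse-variance weighted mean lnRR and its standard error."""
    effects = [log_response_ratio(s) for s in studies]
    weights = [1.0 / v for _, v in effects]
    pooled = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


# Three hypothetical datasets on different measurement scales.
studies = [Study(12.0, 3.0, 10, 8.0, 2.5, 10),
           Study(150.0, 40.0, 25, 120.0, 35.0, 25),
           Study(0.9, 0.2, 8, 0.6, 0.2, 8)]
pooled, se = fixed_effect_mean(studies)
print(f"pooled lnRR = {pooled:.3f} (SE {se:.3f}); ratio = {math.exp(pooled):.2f}")
```

A random-effects model is usually preferred when studies differ systematically; the fixed-effect version is shown only because it is shorter.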

150 interactions per day; 3 billion people

Correlation almost always implies causation.

Correlation does not imply causation.

data = evidence; we need to use synthesis tools

only 1 min of watching www.worldometers.info

context, interactions, & synthesis procedurally and literally provide the tools we need to solve global challenges

ecology Big Data

progress

metascience & scientometrics

structural equation models & response surface methodology

internet of things & micro-instrumentation

data citations

metadata, schema

novel evidence/data streams

open science

Big Data Little Data

little data challenges

too much running kills

Too much jogging may be as bad for you as not running at all, study suggests. The Independent, March 19, 2015

The (Supposed) Dangers of Running Too Much: What the data says, and what it doesn’t. Runner’s World, Feb 3, 2015

Sedentary: 413 / 128
Light: 576 / 7
Moderate: 262 / 8
Strenuous: 40 / 2
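The slide does not label the two numbers in each pair; this sketch assumes the first is the number of people in the group and the second the number of deaths. Under that assumption, a Wilson score interval shows how little a handful of events can tell us: the Strenuous group (2 of 40) gets a far wider interval than the Light group (7 of 576).

```python
import math

# Figures from the slide, read here (an assumption) as (group size, deaths).
GROUPS = {
    "Sedentary": (413, 128),
    "Light": (576, 7),
    "Moderate": (262, 8),
    "Strenuous": (40, 2),
}


def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


for name, (n, events) in GROUPS.items():
    lo, hi = wilson_interval(events, n)
    print(f"{name:<10} {events:>3}/{n:<4} = {events / n:6.1%}  (95% CI {lo:.1%} to {hi:.1%})")
```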

little data are not necessarily simple

deep but not wide

little data challenges

contrast

little data challenges

representativeness

little data challenges

power

contrast

pre-post; effect sizes

to small groups; to related data landscape

solutions

contrast

solutions

representativeness

solutions

power

solutions

online calculators to explore expectations & pilot design
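A rough version of what such a calculator computes, as a hedged sketch: the normal-approximation sample size per group for a two-sided, two-sample comparison of means. This is not the method of any particular online calculator, and exact t-based tools give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

With alpha = 0.05 and 80% power, the sketch gives about 63 per group for d = 0.5 but about 393 for d = 0.2, which is why small effects are hard to detect with little data.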

dot plots by Meaghan Nolan

easiest approach: increase sample size

however, large samples do not replace effective designs

little data design implications

appropriate contrasts & framing of problem

population-level contrasts

use power-thinking: rejection strengths & effect sizes
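One way to put power-thinking into practice, sketched under assumptions: estimate a standardized effect size (Cohen's d) from hypothetical pilot data, then ask what power different sample sizes would give. The pilot numbers are invented, and power uses a normal approximation rather than an exact t-based calculation.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test (far tail ignored)."""
    z = NormalDist()
    return z.cdf(abs(d) * math.sqrt(n_per_group / 2) - z.inv_cdf(1 - alpha / 2))


# Hypothetical pilot measurements from two treatments.
pilot_a = [5.1, 6.0, 5.6, 6.3, 5.8]
pilot_b = [4.9, 5.2, 5.0, 5.7, 5.1]
d = cohens_d(pilot_a, pilot_b)
for n in (5, 20, 50):
    print(f"d = {d:.2f}, n = {n} per group: power ~ {approx_power(d, n):.2f}")
```

A pilot this small gives a noisy estimate of d, so the power figures are expectations to explore, not guarantees.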

"can it be done?" must also be connected to frequency

however

Big Data Little Data

context

interactions

synthesis

contrast

representativeness

power

convergence implications

framing

convergence implications

synthesis simplifications

collaborations with web-centric ecology

open-science research objects; tagging

meta-data; micro-annotation

ecological interactions; novel data streams