Orchestrating Collective Intelligence

@josephreisinger @premisedata

Uploaded by turi-inc, 15-Apr-2017

TRANSCRIPT

Page 1: Orchestrating Collective Intelligence

@josephreisinger @premisedata

Page 2: Orchestrating Collective Intelligence
Page 3: Orchestrating Collective Intelligence

WHAT PREMISE MEASURES

Page 4: Orchestrating Collective Intelligence

Bringing visibility to the world’s hardest-to-see places. 130 cities, 30 countries.

Page 5: Orchestrating Collective Intelligence

Modernizing Economic Measurement

Page 6: Orchestrating Collective Intelligence

“I have been constantly surprised at how little quantitative information can be brought to bear on fundamental policy questions [...] This experience illustrates the need for flexibility in data collection, especially when policymakers consider extending new policies or need to evaluate them in real time for other reasons. Ideally, some sort of ‘rapid response’ data gathering capacity.”

— Alan Krueger, “Stress Testing Economic Data”

Page 7: Orchestrating Collective Intelligence

“The collection of statistics needs to be modernized; it is time to use the new technologies to start collecting data.

…particularly important in developing countries where the prevalence of mobile phones now offers an unprecedented opportunity to measure the economy.”

— Diane Coyle, “GDP”

Page 8: Orchestrating Collective Intelligence
Page 9: Orchestrating Collective Intelligence

OMGWTFGDP

Page 10: Orchestrating Collective Intelligence

“However, at this moment in survey research, uncertainty reigns. Participation rates in household surveys are declining throughout the developed world. Surveys seeking high response rates are experiencing crippling cost inflation. Traditional sampling frames that have been serviceable for decades are fraying at the edges.”

— Robert Groves, “Three Eras of Survey Research”

Page 11: Orchestrating Collective Intelligence
Page 12: Orchestrating Collective Intelligence

Orchestrating Collective Intelligence

Page 13: Orchestrating Collective Intelligence
Page 14: Orchestrating Collective Intelligence

PREMISE APP

Directed on-the-ground data acquisition

Page 15: Orchestrating Collective Intelligence

Crowdsourcing vs Orchestration

Page 16: Orchestrating Collective Intelligence

Crowdsourcing

survey

Page 17: Orchestrating Collective Intelligence

Crowdsourcing

survey → survey tasks

Page 18: Orchestrating Collective Intelligence

Crowdsourcing

survey → survey tasks → workers

Page 19: Orchestrating Collective Intelligence

Orchestration

survey

Page 20: Orchestrating Collective Intelligence

Orchestration

survey → survey tasks

Page 21: Orchestrating Collective Intelligence

Orchestration

survey → survey tasks → workers

Pages 22-25: Orchestrating Collective Intelligence

Orchestration

[diagram build continues: survey → survey tasks → workers]

Page 26: Orchestrating Collective Intelligence

PLATFORM

[diagram: survey campaign → allocation → quality control → analytics, linking the end user and the data contributors]

The end user poses a question that is best answered via actual, on-the-ground observation at scale.

The question is translated into an internal “specification” of the data points needed to answer it: type, location, frequency, coverage, etc. The inventory of data points is automatically allocated to the data contributor pool, taking into account budget, agent profiles, and geography. Data points are dynamically priced.

Contributors collect data in the field using Android phones, which send observations back to the Premise network.

QC is a mix of automated (outlier detection; machine learning; computer vision) and manual (directed sampling using oDesk) checks.

Automated analytics explore the data and expose trends or patterns; hypothesize new features to explain variation; suggest specification refinements; and improve automated verification.
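To make the pipeline concrete, here is a minimal Python sketch of what such a campaign specification could look like as a data structure; the field names and defaults are hypothetical, not Premise's actual schema:

```python
from dataclasses import dataclass

@dataclass
class CampaignSpec:
    """Hypothetical internal 'specification' behind a survey campaign."""
    measurable: str           # what to observe, e.g. "price of 1L cooking oil"
    locations: list[str]      # geographic targets
    frequency_days: int       # how often each point should be re-observed
    coverage: float           # fraction of target locations to hit per period
    budget: float             # campaign spend cap
    base_payout: float = 0.5  # starting price per data point (repriced dynamically)

spec = CampaignSpec(
    measurable="price of 1L cooking oil",
    locations=["Lagos", "Abuja"],
    frequency_days=7,
    coverage=0.8,
    budget=5000.0,
)
```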

Page 27: Orchestrating Collective Intelligence

Resource Scarcity and

Access Risk

Page 28: Orchestrating Collective Intelligence
Page 29: Orchestrating Collective Intelligence
Page 30: Orchestrating Collective Intelligence

Average wait times are ~10 minutes longer in Maracaibo than in Caracas.

Police are present ~80% of the time in Maracaibo, but only 30-40% of the time in Caracas.

Page 31: Orchestrating Collective Intelligence

Machine Learning

Page 32: Orchestrating Collective Intelligence

[platform diagram repeated from Page 26]

Page 33: Orchestrating Collective Intelligence

[platform diagram repeated from Page 26, highlighting allocation]

Page 34: Orchestrating Collective Intelligence

[platform diagram repeated from Page 26, highlighting analytics]

Page 35: Orchestrating Collective Intelligence

[platform diagram repeated from Page 26, highlighting quality control]

Page 36: Orchestrating Collective Intelligence

Optimizing Task Allocation

Page 37: Orchestrating Collective Intelligence

TASKS

Page 38: Orchestrating Collective Intelligence
Page 39: Orchestrating Collective Intelligence

CAMPAIGN DEFINITION

[grid: locations × measurables]

Pages 40-43: Orchestrating Collective Intelligence

[campaign definition grid build continues]

Page 44: Orchestrating Collective Intelligence

CAMPAIGN DEFINITION

[grid: locations × measurables, survey period 1]

Pages 45-46: Orchestrating Collective Intelligence

[grid build continues: survey period 2 added alongside survey period 1]

Page 47: Orchestrating Collective Intelligence

TASK ALLOCATION

[grid: locations × measurables, survey period 1; tasks assigned across user 1, user 2, user 3]

allocation period: 1

Pages 48-49: Orchestrating Collective Intelligence

[allocation build continues]

allocation period: 1 2 3
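A minimal sketch of how one allocation period might assign the task inventory, assuming a greedy policy over an estimated completion probability and a per-user capacity cap (both are illustrative assumptions; the actual scheduler is not described in the slides):

```python
from collections import defaultdict

def allocate(tasks, users, p_complete, capacity=5):
    """One allocation period: greedily assign each task to the eligible
    user with the highest estimated completion probability."""
    load = defaultdict(int)   # tasks already given to each user
    assignment = {}
    for task in tasks:
        candidates = [u for u in users if load[u] < capacity]
        if not candidates:
            break             # inventory exceeds pool capacity this period
        best = max(candidates, key=lambda u: p_complete(u, task))
        assignment[task] = best
        load[best] += 1
    return assignment

# e.g. allocate(["rice@market1", "fuel@station2"], ["u1", "u2"],
#               p_complete=lambda u, t: 0.5)
```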

Page 50: Orchestrating Collective Intelligence

TASK COMPLETION RATE MODEL

payout vs. pTCR (“uptake risk”)

Model features: user-history, task-history / location-history, task-user, location-user

Issues: data sparsity in marginal vs conditional, uptake counterfactuals (non-iid sampling), path-dependence / lock-in

Linear functional model
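As a rough illustration of the slide's "linear functional model", here is a hedged sketch that fits a logistic model of pTCR on synthetic stand-ins for the listed feature groups, then prices payouts inversely to predicted completion; the inverse-pricing rule and all data here are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the slide's feature groups: user-history,
# task-history / location-history, task-user, location-user.
X = rng.random((500, 4))
y = (X @ np.array([1.5, 1.0, 0.8, 0.5]) + rng.normal(0, 0.5, 500) > 1.9).astype(int)

model = LogisticRegression().fit(X, y)   # linear functional model of pTCR

def payout(features, base=0.50, k=1.0):
    """Price a task inversely to predicted completion: high 'uptake risk'
    (low pTCR) earns a larger payout (the pricing rule is an assumption)."""
    p_tcr = model.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]
    return base * (1.0 + k * (1.0 - p_tcr))
```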

Page 51: Orchestrating Collective Intelligence

Exploration vs Survey Consistency

Page 52: Orchestrating Collective Intelligence
Page 53: Orchestrating Collective Intelligence
Page 54: Orchestrating Collective Intelligence
Page 55: Orchestrating Collective Intelligence

[grid: locations × measurables, periods 1 and 2]

Page 56: Orchestrating Collective Intelligence

[grid build continues]

Page 57: Orchestrating Collective Intelligence

TASK REFINEMENT

Page 58: Orchestrating Collective Intelligence

ITERATIVE LOCATION DISCOVERY

Page 59: Orchestrating Collective Intelligence

Exploration vs Survey Consistency

- Campaign layers: separate discovery and survey (see the sketch after this list)

- Iteratively refine attribute and geospatial targeting

- Monitor correlation in item responses and appearance of new attributes

- Monitor residual endogeneity
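A minimal sketch of the layering idea above, assuming each period's task budget is split between a stable survey panel and a randomly sampled discovery layer (the split rule is illustrative, not Premise's actual policy):

```python
import random

def plan_period(panel_tasks, candidate_tasks, budget, explore_frac=0.2):
    """Split one period's task budget between the fixed survey panel
    (time-series consistency) and a discovery layer (exploration)."""
    n_explore = int(budget * explore_frac)
    survey = panel_tasks[: budget - n_explore]   # keep the panel stable
    discover = random.sample(candidate_tasks,
                             min(n_explore, len(candidate_tasks)))
    return survey + discover
```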

Page 60: Orchestrating Collective Intelligence

Fraud and Coalition Formation

Page 61: Orchestrating Collective Intelligence
Page 62: Orchestrating Collective Intelligence

Coalitions vs Referrals

- Referrals are necessary to reach most remote areas

- However, we need to be able to partition the Premise graph into independent subnetworks, e.g. for re-evaluation, experimentation, and sample stratification (see the sketch below).
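One simple way to get such a partition is to take connected components of the referral graph; a sketch using networkx, with hypothetical referral edges:

```python
import networkx as nx

# Hypothetical referral edges: (referrer, referee)
referrals = [("ana", "ben"), ("ben", "carla"), ("dev", "esi")]

G = nx.Graph(referrals)

# Connected components of the referral graph give independent
# subnetworks, usable as strata for experiments or re-evaluation.
subnetworks = [sorted(c) for c in nx.connected_components(G)]
print(subnetworks)   # [['ana', 'ben', 'carla'], ['dev', 'esi']]
```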

Page 63: Orchestrating Collective Intelligence

CONTRIBUTOR AFFINITY MODEL

Model features: direct referral, account features, upload location, visit histories, geographic area, response correlation

Issues: bootstrapping affinity scores for new users; the optimal scheduler is antagonistic to coalition discovery

Sampling from Large Graphs [Leskovec & Faloutsos, 2006]
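A hedged sketch of an affinity score over the feature list above, assuming a weighted linear combination with illustrative (not learned) weights:

```python
# Illustrative weights over the slide's feature list; a real model would
# learn these, and the per-pair feature values here are hypothetical.
WEIGHTS = {
    "direct_referral": 0.30,
    "account_features": 0.10,
    "upload_location": 0.20,
    "visit_histories": 0.15,
    "geographic_area": 0.10,
    "response_correlation": 0.15,
}

def affinity(pair_features):
    """Weighted affinity between two contributors, in [0, 1] if the
    per-feature similarities are; high scores flag likely coalitions."""
    return sum(w * pair_features.get(name, 0.0)
               for name, w in WEIGHTS.items())

print(affinity({"direct_referral": 1.0, "response_correlation": 0.9}))  # 0.435
```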

Page 64: Orchestrating Collective Intelligence

RECAP

- Orchestrating collective intelligence

- Optimizing task allocation via dynamic scheduling and incentives

- Exploration and discovery while maintaining survey consistency

- Fraud and coalition formation in networks

Page 65: Orchestrating Collective Intelligence

QUESTIONS?

instagram/premisedata

(all images in this talk)

[email protected] | @josephreisinger

Page 66: Orchestrating Collective Intelligence

[QC pipeline: PROOF → AUTO QC → MANUAL QC → REVALIDATION]
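Reading the stage labels above as a sequential pipeline (an assumption), a minimal sketch of how an observation could flow through it:

```python
def auto_qc(obs):
    """Automated screen, e.g. crude price-outlier detection."""
    return 0 < obs["price"] < 10 * obs["median_price"]

def manual_qc(obs):
    """Stub for directed human review (sampled, e.g., via oDesk)."""
    return obs.get("photo_ok", True)

def qc_pipeline(obs):
    """Proof -> auto QC -> manual QC; failures go to revalidation."""
    for stage in (auto_qc, manual_qc):
        if not stage(obs):
            return "revalidate"   # send back for re-collection
    return "accepted"

print(qc_pipeline({"price": 2.0, "median_price": 1.8}))  # accepted
```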

Page 67: Orchestrating Collective Intelligence

“The problem of changing statistics is that you lose the ability to compare across time. The longer the time-series, the harder it is to change it, but you want to be able to compare. How do you replace GDP? And if you do, you lose the past sixty years of relevance. This has been a problem for centuries—take the Spanish silver trade. Anything you measure will become increasingly irrelevant over time.”

— Hans Rosling

[Zachary Karabell, The Leading Indicators]

Page 68: Orchestrating Collective Intelligence
Page 69: Orchestrating Collective Intelligence
Page 70: Orchestrating Collective Intelligence

“You need to focus on quality. You’ll be better off with a small but carefully structured sample rather than a large sloppy sample.”

— Hal Varian, Google

Page 71: Orchestrating Collective Intelligence

“Big Data is bullshit”

— Harper Reed

Page 72: Orchestrating Collective Intelligence

Big Data, n.: the belief that any sufficiently large pile of shit contains a pony with probability approaching one

—@grimmelm

Page 73: Orchestrating Collective Intelligence
Page 74: Orchestrating Collective Intelligence

“dividing by bieber”

Page 75: Orchestrating Collective Intelligence
Page 76: Orchestrating Collective Intelligence