

Inferring Generative Model Structure with Static Analysis
Paroma Varma, Bryan He, Payal Bajaj, Nishith Khandwala, Imon Banerjee, Daniel Rubin, Christopher Ré

Summary

• Generative Models to Label Training Data: use generative models to combine sources of weak supervision to assign noisy labels (a minimal sketch follows this list)

• Complex Dependencies among Sources: sources of labels are rarely independent

• Inferring Model Structure: use static analysis to infer dependencies and encode them in the generative model structure
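
To make the first point concrete, here is a minimal sketch (an illustration, not the authors' implementation) of a generative label model in the simplest setting, where each source is assumed conditionally independent given the true label and votes +1/-1 with some accuracy. The poster's point is that this independence assumption rarely holds, which is what the structure inference below addresses.

import math

def posterior_positive(votes, accuracies):
    # P(Y = +1 | votes) when each source lambda_i votes +1/-1 and agrees
    # with the true label Y with probability accuracies[i], independently.
    log_odds = 0.0
    for v, a in zip(votes, accuracies):
        log_odds += v * math.log(a / (1.0 - a))  # each vote shifts the log-odds
    return 1.0 / (1.0 + math.exp(-log_odds))

# Three sources with accuracies 0.9, 0.7, 0.6 voting (+1, +1, -1):
print(posterior_positive([+1, +1, -1], [0.9, 0.7, 0.6]))  # ≈ 0.93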

Heuristic Structure

• Domain-Specific Primitives (DSPs): interpretable characteristics of the raw data (see the sketch below)

• Heuristic Functions (HFs): programmatic rules that output noisy labels
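
As an illustration (hypothetical code, following the person/bike example used elsewhere on the poster), DSPs can be extracted once from raw bounding boxes and then shared by every HF. Note how a derived primitive such as area is a composition of other primitives, which is what creates the implicit dependencies discussed next.

from dataclasses import dataclass

@dataclass
class Primitives:
    y: dict  # P.y: y-coordinate per object class
    w: dict  # P.w: width per object class
    h: dict  # P.h: height per object class

    @property
    def area(self):
        # P.area is a composition of P.w and P.h: any HF reading it
        # implicitly depends on the width and height primitives.
        return {k: self.w[k] * self.h[k] for k in self.w}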

Static Analysis

• Shared Input: sharing primitives as inputs leads to explicit dependencies (sketched in code below)

• Compositions: primitives composed of other primitives can lead to implicit dependencies
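
A minimal sketch of the shared-input analysis, assuming HFs are ordinary Python functions whose parameter names identify the primitives they read (the authors analyze the HF source code; the exact machinery is theirs, this only shows the idea):

import ast
import inspect
import textwrap
from itertools import combinations

def primitives_used(hf):
    # Parse the HF's source and record which primitives it takes as input.
    tree = ast.parse(textwrap.dedent(inspect.getsource(hf)))
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    return {a.arg for a in func.args.args}

def shared_input_edges(hfs):
    # Explicit dependency between any two HFs that read a common primitive.
    return [(f.__name__, g.__name__)
            for f, g in combinations(hfs, 2)
            if primitives_used(f) & primitives_used(g)]

Handling compositions would additionally expand derived primitives before intersecting (e.g., P.area → {P.w, P.h}), so that λ_2 (which reads P.h) and λ_3 (which reads P.area = P.w × P.h) in the example below are linked by an implicit dependency.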

Statistical Modeling

• HF Dependency: represents the dependencies found using static analysis

• DSP Similarity: represents the learned correlations among the DSPs (a schematic form of the full model follows)
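
Schematically (a paraphrase of the factor graph shown below, not the paper's exact parameterization), the model is an exponential-family factor graph over the HF votes λ_i, the primitives p_i, and the latent true label Y:

    P(Λ, Y) ∝ exp( Σ_i θ_i^Acc φ_i^Acc(λ_i, Y)
                 + Σ_(i,j) θ_(i,j)^HF φ_(i,j)^HF(λ_i, λ_j)
                 + Σ_(i,j) θ_(i,j)^DSP φ_(i,j)^DSP(p_i, p_j) )

The accuracy factors tie each HF's vote to the true label, the HF-dependency factors cover exactly the edges found by static analysis, and the DSP-similarity factors capture the learned correlations among primitives.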

Theoretical Results

• Learning Dependencies: learning k-degree dependencies among n heuristics requires O(n^(k+1) log n) samples (worked comparison below)

• Inferring Dependencies: analyzing the code can infer the dependencies among heuristics without any data; given the dependencies, learning the heuristic accuracies requires only O(n log n) samples
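
A back-of-the-envelope comparison of the two bounds (my own arithmetic, constants dropped):

import math

n, k = 100, 2  # 100 heuristics, pairwise (k = 2) dependencies
learn_structure  = n ** (k + 1) * math.log(n)  # O(n^(k+1) log n) ≈ 4.6e6
infer_then_learn = n * math.log(n)             # O(n log n)      ≈ 4.6e2
print(round(learn_structure / infer_then_learn))  # 10000x fewer samples

That is, inferring the structure statically removes a factor of n^k from the sample complexity.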

Experimental Results

[Figure: factor graph of the generative model. Primitives p1, p2, p3 (P.w, P.h, P.y) feed heuristic functions λ1, λ2, λ3, which vote on the latent true label Y. Factor types: accuracy (φ^Acc) linking each λi to Y; HF dependency (φ^HF) on the edges found by static analysis (shared input gives an explicit dependency, composition an implicit one); and DSP similarity (φ^DSP) on the learned correlations among primitives.]

def λ_1(P_y):
    # HF over the y-coordinate primitive P.y
    if P_y['person'] >= P_y['bike']:
        return True
    else:
        return False

def λ_2(P_h):
    # HF over the height primitive P.h
    if P_h['person'] >= P_h['bike']:
        return True
    else:
        return False

def λ_3(P_area):
    # HF over the area primitive P.area (a composition of P.w and P.h)
    if P_area['person'] <= 2 * P_area['bike']:
        return True
    else:
        return False
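
Hypothetical usage of the three HFs above (made-up primitive values; the poster combines the votes with the generative model rather than a raw majority vote):

P_y    = {'person': 120, 'bike': 100}
P_h    = {'person': 180, 'bike': 90}
P_area = {'person': 9000, 'bike': 5400}

votes = [λ_1(P_y), λ_2(P_h), λ_3(P_area)]
print(sum(votes) >= 2)  # True: a majority of the HFs vote positive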

• Inferring dependencies outperforms learning dependencies

• Outperforms a fully supervised model by training on additional noisily labeled data

* F1 score reported; all other results in accuracy (%)