Instance Construction via Likelihood-Based Data Squashing


Instance Construction via Likelihood-Based Data Squashing. Madigan D., et al. (Ch. 12, Instance Selection and Construction for Data Mining (2001), Kluwer Academic Publishers). Summarized by: Jinsan Yang, SNU Biointelligence Lab


Post on 03-Jan-2016


Page 1: Instance Construction via Likelihood-Based Data Squashing

Instance Construction via Likelihood-Based Data Squashing

Madigan D., et al. (Ch. 12, Instance Selection and Construction for Data Mining (2001), Kluwer Academic Publishers)

Summarized by: Jinsan Yang, SNU Biointelligence Lab

Page 2: Instance Construction via Likelihood-Based Data Squashing

Abstract

Data compression method: squashing

LDS: likelihood-based data squashing

Keywords: instance construction, data squashing

Page 3: Instance Construction via Likelihood-Based Data Squashing

Outline

Introduction
The LDS Algorithm
Evaluation: Logistic Regression
Evaluation: Neural Networks
Iterative LDS
Discussion

Page 4: Instance Construction via Likelihood-Based Data Squashing

Introduction

Massive data examples
- Large-scale retailing
- Telecommunications
- Astronomy
- Computational biology
- Internet logging

Some computational challenges
- Need for multiple passes over the data
- Data access 10^5 to 10^6 times slower than main memory
- Current solution: scaling up existing algorithms
- Here: scaling down the data

Data squashing: 750,000 records reduced to 8,443 (DuMouchel et al., 1999). The squashed data outperform a random sample of size 7,543 by a factor of 500 in MSE.

Page 5: Instance Construction via Likelihood-Based Data Squashing

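LDS Algorithm

Motivation: Bayes' rule

Given three data points d1, d2, d3, estimate the parameter $\theta$:

$$p(\theta \mid d_1, d_2, d_3) \propto p(d_1 \mid \theta)\, p(d_2 \mid \theta)\, p(d_3 \mid \theta)\, p(\theta)$$

If $p(d_1 \mid \theta) \approx p(d_2 \mid \theta)$, squash $d_1, d_2$ by $d^*$ with

$$p(d^* \mid \theta)^2 = p(d_1 \mid \theta)\, p(d_2 \mid \theta)$$

Clusters by likelihood profile:

$$\bigl(p(d_i \mid \theta_1), \ldots, p(d_i \mid \theta_k)\bigr)$$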
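As a worked illustration of the squashing condition, assume (this model is not from the slides) that each data point is drawn from a unit-variance Gaussian with unknown mean $\theta$. Taking $d^*$ to be the average of $d_1$ and $d_2$ then reproduces the product of the two likelihoods up to a factor that does not depend on $\theta$, so the posterior for $\theta$ is unchanged:

```latex
\begin{align*}
p(d \mid \theta) &= \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}(d-\theta)^2\right),
  \qquad d^* = \tfrac{1}{2}(d_1 + d_2), \\
p(d_1 \mid \theta)\, p(d_2 \mid \theta)
  &= \frac{1}{2\pi} \exp\!\left(-\tfrac{1}{2}\left[(d_1-\theta)^2 + (d_2-\theta)^2\right]\right) \\
  &= \frac{1}{2\pi} \exp\!\left(-(d^*-\theta)^2 - \tfrac{1}{4}(d_1-d_2)^2\right) \\
  &= p(d^* \mid \theta)^2 \, \exp\!\left(-\tfrac{1}{4}(d_1-d_2)^2\right).
\end{align*}
```

The closer $d_1$ and $d_2$ are, the closer the constant factor is to 1.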

Page 6: Instance Construction via Likelihood-Based Data Squashing

LDS Algorithm

Details of the LDS algorithm

[Select] Select values of $\theta$ by a central composite design

Figure: central composite design for 3 factors
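A minimal sketch of how the points of a central composite design could be generated, assuming the standard construction (one center point, the 2^k factorial corners, and 2k axial points). The function name, the `center` and `scale` arguments, and the rotatable choice of `alpha` are illustrative assumptions; the slides do not say how the design is centered or scaled.

```python
from itertools import product


def central_composite_design(center, scale, alpha=None):
    """Return the design points of a central composite design.

    center: hypothetical initial parameter estimate (one value per factor)
    scale:  hypothetical step size per factor
    alpha:  axial distance in coded units; defaults to the rotatable choice
    """
    k = len(center)
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable design for a full factorial core
    # Center point in coded units
    coded = [tuple([0.0] * k)]
    # Factorial (corner) points: every combination of -1 / +1
    coded += [tuple(float(s) for s in signs)
              for signs in product((-1, 1), repeat=k)]
    # Axial (star) points: +/- alpha on one factor, 0 on the others
    for j in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[j] = sign * alpha
            coded.append(tuple(point))
    # Map coded units to parameter values
    return [tuple(c + s * z for c, s, z in zip(center, scale, row))
            for row in coded]


design = central_composite_design(center=[0.0, 0.0, 0.0],
                                  scale=[1.0, 1.0, 1.0])
print(len(design))  # prints 15: 1 center + 8 corner + 6 axial points
```

For 3 factors this gives 15 candidate values of $\theta$ at which likelihoods can be evaluated.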

Page 7: Instance Construction via Likelihood-Based Data Squashing

LDS Algorithm

[Profile] Evaluate the likelihood profiles

[Cluster] Cluster the mother data in a single pass
- Select n' random samples as initial cluster centers
- Assign the remaining data to the clusters

[Construct] Construct the pseudo data: cluster centers
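The Profile, Cluster and Construct steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the logistic model with one covariate, the hand-picked `thetas`, nearest-center assignment by squared Euclidean distance between log-likelihood profiles, the member-mean pseudo record with a weight equal to the cluster size, and the separate squashing of each response class are all choices made here for the example.

```python
import math
import random


def log_lik(record, theta):
    """Log-likelihood of one (x, y) record under an assumed logistic model."""
    x, y = record
    eta = theta[0] + theta[1] * x
    return y * eta - math.log1p(math.exp(eta))


def profile(record, thetas):
    # [Profile] log-likelihood of the record at each selected theta
    return [log_lik(record, t) for t in thetas]


def squash(data, thetas, n_clusters, rng):
    """Return a list of (pseudo_record, weight) pairs."""
    n_clusters = min(n_clusters, len(data))
    # [Cluster] n' random records supply the initial cluster centers
    centers = [profile(r, thetas) for r in rng.sample(data, n_clusters)]
    members = [[] for _ in range(n_clusters)]
    for record in data:  # single pass over the mother data
        prof = profile(record, thetas)
        nearest = min(range(n_clusters),
                      key=lambda c: sum((u - v) ** 2
                                        for u, v in zip(prof, centers[c])))
        members[nearest].append(record)
    # [Construct] one pseudo record per non-empty cluster
    # (assumption: the mean of the cluster members, weighted by cluster size)
    pseudo = []
    for group in members:
        if group:
            mean = tuple(sum(col) / len(group) for col in zip(*group))
            pseudo.append((mean, len(group)))
    return pseudo


rng = random.Random(0)
true_theta = (0.5, 1.0)  # hypothetical parameters used to simulate data
data = []
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(true_theta[0] + true_theta[1] * x)))
    data.append((x, 1.0 if rng.random() < p else 0.0))

# Hypothetical design points around an initial estimate of theta
thetas = [(0.5, 1.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.5, 1.5)]

# Squash each response class separately so pseudo records keep y in {0, 1}
squashed = []
for label in (0.0, 1.0):
    subset = [r for r in data if r[1] == label]
    squashed += squash(subset, thetas, n_clusters=20, rng=rng)

print(len(data), "records squashed to", len(squashed))
```

The weights sum to the size of the mother data, so a weighted fit on the pseudo records stands in for a fit on all records.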

Page 8: Instance Construction via Likelihood-Based Data Squashing

Evaluation: Logistic Regression

• Small-scale simulations:

$$\log \frac{p(y=1)}{1 - p(y=1)} = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5$$

• Initial estimate of the parameters

• Plot: log(error ratio)

• Three methods of initial parameter estimation

• 100 data points / 48 squashed data points

Page 9: Instance Construction via Likelihood-Based Data Squashing

Evaluation: Logistic Regression

Medium scale: 100,000 data points; baseline: 1% simple random sample

Page 10: Instance Construction via Likelihood-Based Data Squashing

Evaluation: Logistic Regression

Large scale: 744,963 data points; baseline: 1% simple random sample

Page 11: Instance Construction via Likelihood-Based Data Squashing

Evaluation: Neural Networks

Feed-forward network: two input nodes, one hidden layer with 3 units, single binary output

Mother data: 10,000; squashed data: 1,000; repetitions: 30

Test data: 1,000 from the same network

Comparisons for P(whole) - P(reduced)

Page 12: Instance Construction via Likelihood-Based Data Squashing

Evaluation: Neural Networks

Page 13: Instance Construction via Likelihood-Based Data Squashing

Iterative LDS

For use when the estimate of $\theta$ is not accurate.

1. Set $\theta$ from simple random sampling

2. Squash by LDS

3. Estimate $\theta$

4. Go to 2.
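The loop above might be sketched as follows on a deliberately simple toy model. Everything model-specific here is an assumption for illustration: the unit-variance Gaussian with unknown mean, the three design values around the current estimate, the 1% initial sample, and the fixed number of iterations (the slide gives no stopping rule).

```python
import random


def profile(d, thetas):
    # Log-likelihood of d under N(theta, 1), up to an additive constant
    return [-0.5 * (d - t) ** 2 for t in thetas]


def squash(data, theta_hat, n_clusters, rng):
    """Squash data around the current estimate; returns (pseudo_point, weight)."""
    thetas = [theta_hat - 1.0, theta_hat, theta_hat + 1.0]  # hypothetical design
    centers = [profile(s, thetas) for s in rng.sample(data, n_clusters)]
    sums = [0.0] * n_clusters
    counts = [0] * n_clusters
    for d in data:  # single pass over the mother data
        p = profile(d, thetas)
        c = min(range(n_clusters),
                key=lambda j: sum((u - v) ** 2 for u, v in zip(p, centers[j])))
        sums[c] += d
        counts[c] += 1
    return [(sums[c] / counts[c], counts[c])
            for c in range(n_clusters) if counts[c]]


def fit(weighted_points):
    # Weighted maximum-likelihood estimate of the Gaussian mean
    total = sum(w for _, w in weighted_points)
    return sum(d * w for d, w in weighted_points) / total


def iterative_lds(data, n_clusters, n_iter, rng):
    # 1. Set theta from a simple random sample
    sample = rng.sample(data, max(1, len(data) // 100))
    theta = fit([(d, 1) for d in sample])
    for _ in range(n_iter):
        pseudo = squash(data, theta, n_clusters, rng)  # 2. Squash by LDS
        theta = fit(pseudo)                            # 3. Estimate theta
    return theta                                       # 4. (repeat from 2)


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(5000)]
theta = iterative_lds(data, n_clusters=50, n_iter=3, rng=rng)
full_mean = sum(data) / len(data)
print(theta, full_mean)
```

In this toy model the weighted mean of the pseudo points equals the full-data mean up to rounding, so one pass already suffices; the repeated re-squashing only matters for models where the squashed likelihood depends on where the design is centered.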
