final_f01: 6.867 Machine Learning (Fall 2006) final exam


3. (T/F, 2 points) In HMMs, if there are at least two distinct most likely hidden state sequences and the two state sequences cross in the middle (share a single state at an intermediate time point), then there are at least four most likely state sequences. (A small splicing sketch follows question 6 below.)

4. (T/F, 2 points) One advantage of boosting is that it does not overfit.

5. (T/F, 2 points) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution.

6. (T/F, 2 points) Active learning can substantially reduce the number of training examples that we need.
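An aside on question 3: the claim rests on a splicing argument. If two most likely paths share a state at time t, the Markov factorization through that state lets you swap their prefixes, and the two spliced paths' probabilities multiply out to the same product as the originals. A minimal numerical check of that identity, with made-up HMM parameters (everything here is hypothetical, not from the exam):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 2-state HMM over binary observations (hypothetical numbers).
n_states, T_len = 2, 5
pi = rng.dirichlet(np.ones(n_states))                 # initial distribution
A = rng.dirichlet(np.ones(n_states), size=n_states)   # A[i, j] = P(s_{t+1}=j | s_t=i)
B = rng.dirichlet(np.ones(2), size=n_states)          # B[i, o] = P(x_t=o | s_t=i)
obs = rng.integers(0, 2, size=T_len)

def joint(path):
    """P(states = path, observations = obs) under the HMM."""
    p = pi[path[0]] * B[path[0], obs[0]]
    for t in range(1, T_len):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
    return p

# Two paths that differ before and after t = 2 but share state 0 there.
p, q, t = (0, 1, 0, 1, 0), (1, 1, 0, 0, 1), 2
splice1 = p[:t] + q[t:]
splice2 = q[:t] + p[t:]

# Factorization through the shared state: P(p) P(q) = P(splice1) P(splice2).
print(np.isclose(joint(p) * joint(q), joint(splice1) * joint(splice2)))  # True
```

If p and q both attain the maximum probability, neither splice can exceed it, yet the splices' product equals the product of two maxima, so each splice attains the maximum too: at least four most likely sequences.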

Problem 2

Consider two classifiers: 1) an SVM with a quadratic (second-order polynomial) kernel function and 2) an unconstrained mixture of two Gaussians model, one Gaussian per class label. These classifiers try to map examples in R^2 to binary labels. We assume that the problem is separable, no slack penalties are added to the SVM classifier, and that we have sufficiently many training examples to estimate the covariance matrices of the two Gaussian components.
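For concreteness, both classifiers are easy to instantiate; a sketch assuming scikit-learn, with a very large C standing in for the no-slack (hard-margin) SVM and QDA playing the role of the class-conditional two-Gaussian model (the data are made up):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)

# Toy separable data in R^2: one Gaussian blob per class (hypothetical numbers).
X = np.vstack([rng.normal([-2, 0], 0.5, size=(50, 2)),
               rng.normal([+2, 0], 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# 1) SVM with a quadratic kernel; a huge C approximates the no-slack setting.
svm = SVC(kernel="poly", degree=2, C=1e6).fit(X, y)

# 2) One full-covariance Gaussian per class; the resulting Bayes decision
#    rule is exactly a quadratic boundary in R^2, i.e. QDA.
qda = QuadraticDiscriminantAnalysis().fit(X, y)

print(svm.score(X, y), qda.score(X, y))  # both separate this toy set
```

Both models realize quadratic decision boundaries in R^2, which is what question 1 asks you to compare.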

1. (T/F, 2 points) The two classifiers have the same VC-dimension.

2. (4 points) Suppose we evaluated the structural risk minimization score for the two classifiers. The score is the bound on the expected loss of the classifier, when the classifier is estimated on the basis of n training examples. Which of the two classifiers might yield the better (lower) score? Provide a brief justification.


3. (4 points) Suppose now that we regularize the estimation of the covariance matrices for the mixture of two Gaussians. In other words, we would estimate each class-conditional covariance matrix according to

    Σ̂_reg = (n / (n + n')) Σ̂ + (n' / (n + n')) S        (1)

where n is the number of training examples, Σ̂ is the unregularized estimate of the covariance matrix (the sample covariance matrix of the examples in one class), S is our prior covariance matrix (the same for both classes), and n' is the equivalent sample size that we can use to balance between the prior and the data.

In computing the VC-dimension of a classifier, we can choose the set of points that we try to shatter. In particular, we can scale any k points by a large factor and use the resulting set of points for shattering. In light of this, would you expect our regularization to change the VC-dimension? Why or why not?
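A minimal numpy sketch of the estimator in equation (1); the function name and the choice of the biased (maximum-likelihood) sample covariance are my assumptions:

```python
import numpy as np

def regularized_covariance(X, S, n_prior):
    """Equation (1): shrink the sample covariance of one class toward S.

    X        -- (n, d) array of examples from one class
    S        -- (d, d) prior covariance matrix (shared across classes)
    n_prior  -- the equivalent sample size n' trading off prior vs. data
    """
    n = X.shape[0]
    sigma_hat = np.cov(X, rowvar=False, bias=True)  # ML sample covariance
    w = n / (n + n_prior)                           # weights n/(n+n'), n'/(n+n')
    return w * sigma_hat + (1 - w) * S

# Hypothetical usage: shrink toward the identity with n' = 10.
rng = np.random.default_rng(0)
print(regularized_covariance(rng.normal(size=(20, 2)), np.eye(2), n_prior=10))
```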

4. (T/F, 2 points) Regularization in the above sense would improve the structural risk minimization score for the mixture of two Gaussians.


Problem 3

The problem here is to predict the identity of a person based on a single handwritten character. The observed characters (one of a, b, or c) are transformed into binary 8-by-8 pixel images. There are four different people we need to identify on the basis of such characters. To do this, we have a training set of about 200 examples, where each example consists of a binary 8x8 image and the identity of the person it belongs to. You can assume that the overall number of occurrences of each person and each character in the training set is roughly balanced. We would like to use a mixture-of-experts architecture to solve this problem.
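One plausible reading of the architecture (my assumption, not the exam's intended answer) is a gating network that guesses which character the image shows and one expert per character that predicts the writer's identity. A toy forward pass with hypothetical linear-softmax parameterizations:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: 3 experts (one per character a/b/c), 64 = 8x8 binary
# pixels, 4 people to identify.
n_experts, n_pixels, n_people = 3, 64, 4
V = rng.normal(size=(n_experts, n_pixels))             # gating parameters
W = rng.normal(size=(n_experts, n_people, n_pixels))   # one classifier per expert

def predict(x):
    """P(person | image) = sum_e P(expert e | image) P(person | image, e)."""
    gate = softmax(V @ x)              # which character does the image look like?
    experts = softmax(W @ x, axis=-1)  # each expert's distribution over people
    return gate @ experts              # mix the experts with the gate

x = rng.integers(0, 2, size=n_pixels).astype(float)  # a flattened binary image
print(predict(x))  # a distribution over the 4 identities
```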

1. (2 points) How might the experts be useful? Suggest what task each expert might solve.

2. (4 points) Draw a graphical model that describes the mixture-of-experts architecture in this context. Indicate what the variables are and the number of values that they can take. Shade any nodes corresponding to variables that are always observed.


3. (4 points) Before implementing the mixture-of-experts architecture, we need to know the parametric form of the conditional probabilities in your graphical model. Provide a reasonable specification of the relevant conditional probabilities, to the extent that you could ask your classmate to implement the classifier.

4. (4 points) So we implemented your method, ran the estimation algorithm once, and measured the test performance. The method was unfortunately performing at a chance level. Provide two possible explanations for this. (There may be multiple correct answers here.)

5. (3 points) Would we get anything reasonable out of the estimation if, initially, all the experts were identical while the parameters of the gating network were chosen randomly? By reasonable we mean training performance. Provide a brief justification.


6. (3 points) Would we get anything reasonable out of the estimation if now the gating network is initially set so that it assigns each training example uniformly to all experts but the experts themselves are initialized with random parameter values? Again, reasonable refers to the training error. Provide a brief justification.

Problem 4

Consider the following pair of observed sequences:

Sequence 1 (st): A A T T G G C C A A T T G G C C ...
Sequence 2 (xt): 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 ...
Position t:      0 1 2 3 4 ...

where we assume that the pattern (highlighted with the spaces) will continue forever. Let st ∈ {A, G, T, C}, t = 0, 1, 2, . . . denote the variables associated with the first sequence, and xt ∈ {1, 2}, t = 0, 1, 2, . . . the variables characterizing the second sequence. So, for example, given the sequences above, the observed values for these variables are s0 = A, s1 = A, s2 = T, . . ., and, similarly, x0 = 1, x1 = 1, x2 = 2, . . ..

1. (4 points) If we use a simple first-order homogeneous Markov model to predict the first sequence (values for st only), what is the maximum likelihood solution that we would find? In the transition diagram below, please draw the relevant transitions and the associated probabilities. (This should not require much calculation; a small counting sketch follows the diagram.)

[Transition diagram: the states A, T, C, G at time t-1 and at time t; draw the transitions and their probabilities.]
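As a sanity check on question 1, the maximum-likelihood transition probabilities are just normalized transition counts; a small sketch over a few repetitions of the pattern:

```python
from collections import Counter, defaultdict

seq = "AATTGGCC" * 4   # a few periods of the observed pattern

counts = defaultdict(Counter)
for prev, nxt in zip(seq, seq[1:]):
    counts[prev][nxt] += 1

# Maximum likelihood: normalize the counts out of each state.
for prev in "ATGC":
    total = sum(counts[prev].values())
    print(prev, "->", {nxt: c / total for nxt, c in counts[prev].items()})
# In the infinite sequence each symbol repeats itself half the time and
# advances to the next pattern symbol half the time, e.g. A -> {A: 1/2, T: 1/2};
# the small deviations above are end-of-string boundary effects.
```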


2. (T/F, 2 points) The resulting first-order Markov model is ergodic.

3. (4 points) To improve the Markov model a bit, we would like to define a graphical model that predicts the value of st on the basis of the previous observed values st-1, st-2, . . . (looking as far back as needed). The model parameters/structure are assumed to remain the same if we shift the model one step. In other words, it is the same graphical model that predicts st on the basis of st-1, st-2, . . . as the model that predicts st-1 on the basis of st-2, st-3, . . .. In the graph below, draw the minimum number of arrows that are needed to predict the first observed sequence perfectly (disregarding the first few symbols in the sequence). Since we slide the model along the sequence, you can draw the arrows only for st. (A brute-force order check follows the graph below.)

[Figure: nodes st, st-1, st-2, st-3, st-4, . . .; draw the arrows into st.]
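A brute-force sketch of how far back a sliding model has to look before the pattern becomes perfectly predictable (the exam only asks for the arrows; the history lengths probed here are the code's only assumption):

```python
seq = "AATTGGCC" * 8

def predictable(order):
    """True if the previous `order` symbols always determine the next one."""
    mapping = {}
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]
        if mapping.setdefault(ctx, seq[i]) != seq[i]:
            return False
    return True

for k in (1, 2, 3):
    print(k, predictable(k))
# 1 False, 2 True: each pair AA, AT, TT, TG, GG, GC, CC, CA has a unique
# successor, while a single symbol (e.g. A) does not.
```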

4. Now, to incorporate the second observation sequence, we will use a standard hidden Markov model:

[Figure: standard HMM with hidden chain st-4 → · · · → st and observations xt-4, . . . , xt.]

where again st ∈ {A, G, T, C} and xt ∈ {1, 2}. We will estimate the parameters of this HMM in two different ways.

(I) Treat the pair of observed sequences (st, xt) (given above) as complete observations of the variables in the model and estimate the parameters in the maximum likelihood sense. The initial state distribution P0(s0) is set according to the overall frequency of symbols in the first observed sequence (uniform).

(II) Use only the second observed sequence (xt) in estimating the parameters, again in the maximum likelihood sense. The initial state distribution is again uniform across the four symbols.

We assume that both estimation processes will be successful relative to their criteria.
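Under approach (I) everything is observed, so the maximum-likelihood estimates are plain ratios of counts over the paired sequences. A sketch of the emission counting relevant to part a) below:

```python
from collections import Counter, defaultdict

s_seq = "AATTGGCC" * 4   # hidden-state sequence, fully observed here
x_seq = "11221122" * 4   # paired observation sequence

emission = defaultdict(Counter)
for s, x in zip(s_seq, x_seq):
    emission[s][x] += 1

# P(x | s) = count(s, x) / count(s): deterministic for this data.
for s in "AGTC":
    total = sum(emission[s].values())
    print(s, {x: c / total for x, c in emission[s].items()})
# A and G always co-occur with 1; T and C always co-occur with 2.
```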


a) (3 points) What are the observation probabilities P(x|s) (x ∈ {1, 2}, s ∈ {A, G, T, C}) resulting from the first estimation approach? (This should not require much calculation.)

b) (3 points) Which estimation approach is likely to yield a more accurate model over the second observed sequence (xt)? Briefly explain why.

5. Consider now the two HMMs resulting from using each of the estimation approaches (approaches I and II above). These HMMs are estimated on the basis of the pair of observed sequences given above. We'd like to evaluate the probability that these two models assign to a new (different) observation sequence 1 2 1 2, i.e., x0 = 1, x1 = 2, x2 = 1, x3 = 2. For the first model, for which we have some idea about what the st variables will capture, we also want to know the associated most likely hidden state sequence. (These should not require much calculation; a forward/Viterbi sketch follows part c) below.)

a) (2 points) What is the probability that the first model (approach I) assigns to this new sequence of observations?

b) (2 points) What is the probability that the second model (approach II) gives to the new sequence of observations?


c) (2 points) What is the most likely hidden state sequence in the first model (from approach I) associated with the new observed sequence?
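A sketch of the forward and Viterbi recursions for the new sequence 1 2 1 2 under the approach-I model; reading the transition and emission tables directly off the counting above is my shortcut, not part of the exam:

```python
import numpy as np

states = "ATGC"
# Approach-I estimates: each state repeats or advances with probability 1/2;
# emissions are deterministic (A, G -> 1 and T, C -> 2); uniform start.
succ = {"A": "T", "T": "G", "G": "C", "C": "A"}
trans = np.zeros((4, 4))
for i, s in enumerate(states):
    trans[i, i] = 0.5
    trans[i, states.index(succ[s])] = 0.5
emit = {s: ({1: 1.0, 2: 0.0} if s in "AG" else {1: 0.0, 2: 1.0}) for s in states}
pi = np.full(4, 0.25)
obs = [1, 2, 1, 2]

def e(x):
    return np.array([emit[s][x] for s in states])

# Forward algorithm: alpha[i] = P(x_0..x_t, s_t = i).
alpha = pi * e(obs[0])
for x in obs[1:]:
    alpha = (alpha @ trans) * e(x)
print("P(1 2 1 2) =", alpha.sum())           # 0.0625 under this model

# Viterbi: the same recursion with max instead of sum, plus back-pointers.
delta, back = pi * e(obs[0]), []
for x in obs[1:]:
    scores = delta[:, None] * trans           # scores[i, j]: best path via i -> j
    back.append(scores.argmax(axis=0))
    delta = scores.max(axis=0) * e(x)
path = [int(delta.argmax())]
for bp in reversed(back):
    path.append(int(bp[path[-1]]))
print("".join(states[i] for i in reversed(path)))  # GCAT (ATGC ties it)
```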

6. (4 points) Finally, let's assume that we observe only the second sequence (xt) (the same sequence as given above). In building a graphical model over this sequence we are no longer limiting ourselves to HMMs. However, we only consider models whose structure/parameters remain the same as we slide along the sequence. The variables st are included as before as they might come in handy as hidden variables in predicting the observed sequence.

a) In the figure below, draw the arrows that any reasonable model selection criterion would find given an unlimited supply of the observed sequence xt, xt+1, . . .. You only need to draw the arrows for the last pair of variables in the graphs, i.e., (st, xt).

[Figure: nodes st-4, . . . , st and xt-4, . . . , xt; draw the arrows for (st, xt).]

b) Given only a small number of observations, the model selection criterion might select a different model. In the figure below, indicate a possible alternate model that any reasonable model selection criterion would find given only a few examples. You only need to draw the arrows for the last pair of variables in the graphs, i.e., (st, xt).

[Figure: nodes st-4, . . . , st and xt-4, . . . , xt; draw the arrows for (st, xt).]


[Figure 1: Decision makers A and B and their context; a Bayesian network over A, B, and the context variables x1, . . . , x6.]

Problem 5

The Bayesian network in Figure 1 claims to model how two people, call them A and B, make decisions in different contexts. The context is specified by the setting of the binary context variables x1, x2, . . . , x6. The values of these variables are not known unless we specifically ask for such values.

1. (6 points) We are interested in finding out what information we'd have to acquire to ensure that A and B will make their decisions independently from one another. Specify the smallest set of context variables whose instantiation would render A and B independent. Briefly explain your reasoning. (There may be more than one strategy for arriving at the same decision.)

2. (T/F, 2 points) We can in general achieve independence with less information, i.e., we don't have to fully instantiate the selected context variables but provide some evidence about their values.
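The transcript does not preserve the edges of Figure 1, so the edge list below is entirely made up; the sketch only shows how questions like 1 can be checked mechanically with the standard moralized-ancestral-graph test for d-separation:

```python
import itertools

# Hypothetical stand-in for Figure 1 (the real edges are in the exam figure).
edges = [("x1", "A"), ("x2", "A"), ("x3", "A"),
         ("x3", "B"), ("x4", "B"), ("x5", "B"),
         ("A", "x6"), ("B", "x6")]

def parents(v):
    return {u for u, w in edges if w == v}

def ancestors(vs):
    out, frontier = set(vs), set(vs)
    while frontier:
        frontier = {p for v in frontier for p in parents(v)} - out
        out |= frontier
    return out

def d_separated(a, b, z):
    """Check a independent of b given z via the moralized ancestral graph."""
    keep = ancestors({a, b} | z)
    undirected = {frozenset(e) for e in edges if set(e) <= keep}
    for v in keep:  # moralize: connect every pair of co-parents
        for p, q in itertools.combinations(sorted(parents(v) & keep), 2):
            undirected.add(frozenset((p, q)))
    seen, stack = {a} | set(z), [a]   # conditioned nodes block traversal
    while stack:
        v = stack.pop()
        for edge in undirected:
            if v in edge:
                w = next(iter(edge - {v}))
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return b not in seen

print(d_separated("A", "B", set()))    # False: x3 is a shared parent here
print(d_separated("A", "B", {"x3"}))   # True for this made-up graph
```

The collider at x6 is handled automatically: x6 drops out of the ancestral graph unless it is conditioned on.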


3. (4 points) Could your choice of the minimal set of context variables change if we also provided you with the actual probability values associated with the dependencies in the graph? Provide a brief justification.

Additional set of figures

[Duplicate copies of the transition diagram, the chain over st, . . . , st-4, and the HMM figure from Problem 4.]

Cite as: Tommi Jaakkola, course materials for 6.867 Machine Learning, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
