Rare Feature Selection in High Dimensions
Xiaohan Yan∗ Jacob Bien†
July 10, 2020
Abstract
It is common in modern prediction problems for many predictor variables to be counts of rarely occurring events. This leads to design matrices in which many columns are highly sparse. The challenge posed by such “rare features” has received little attention despite its prevalence in diverse areas, ranging from natural language processing (e.g., rare words) to biology (e.g., rare species). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity.
We apply our method to data from TripAdvisor, in which we predict the numerical rating of a hotel based on the text of the associated review. Our method achieves high accuracy by making effective use of rare words; by contrast, the lasso is unable to identify highly predictive words if they are too rare. A companion R package, called rare, implements our new estimator, using the alternating direction method of multipliers.
1 Introduction
The assumption of parameter sparsity plays an important simplifying role in high-dimensional statistics. However, this paper is focused on sparsity in the data itself, which actually makes estimation more challenging. In many modern prediction problems, the design matrix has many columns that are highly sparse. This arises when the features record the frequency of events (or the number of times certain properties hold). While a small number of these events may be common, there is typically a very large number of rare events, which correspond to features that are zero for nearly all observations. We call these predictors rare features. Rare features are in fact extremely common in many modern data sets. For example, consider the task of predicting user behavior based on past website visits: only a small number of sites are visited by a large proportion of the users; all other sites are visited by only a small proportion of users.
∗Data Scientist, Microsoft; email: [email protected]
†Associate Professor, Data Sciences and Operations, Marshall School of Business, University of Southern California; email: [email protected]
arXiv:1803.06675v2 [stat.ME] 8 Jul 2020
As another example, consider text mining, in which one makes predictions about documents based on the terms used. A typical approach is to create a document-term matrix in which each column encodes a term’s frequency across documents. In such domains, it is often the case that the majority of the terms appear very infrequently across the documents; hence the corresponding columns in the document-term matrix are very sparse (e.g., Forman 2003; Huang 2008; Liu et al. 2010; Wang et al. 2010). In Section 6, we study a text dataset with more than 200 thousand reviews crawled from https://www.tripadvisor.com. Our goal is to use the adjectives in a review to predict a user’s numerical rating of a hotel. As shown in the right panel of Figure 7, the distribution of adjective density, defined as the proportion of documents containing an adjective, is extremely right-skewed, with many adjectives occurring very infrequently in the corpus. In fact, we find that more than 95% of the 7,787 adjectives appear in less than 5% of the reviews. It is common practice to simply discard rare terms,¹ which may mean removing most of the terms (e.g., Forman 2003; Huang 2008; Liu et al. 2010; Wang et al. 2010).
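The notion of adjective density used here is easy to make concrete. The sketch below computes term densities from a toy document-term count matrix; the matrix entries and the 50% cutoff are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy document-term count matrix: rows are reviews, columns are adjectives.
# (The paper's TripAdvisor data has >200,000 reviews and 7,787 adjectives;
# these numbers are made up for illustration.)
X = np.array([
    [2, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [3, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
])

# Density of a term = proportion of documents in which it appears at least once.
density = (X > 0).mean(axis=0)

# A "rare" feature is one that is zero for nearly all observations,
# i.e., one whose density is very low (threshold chosen arbitrarily here).
rare = density < 0.5
```

In the real data, the analogous `density` vector is extremely right-skewed, which is exactly what makes discarding low-density columns so destructive.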
Rare features also arise in various scientific fields. For example, microbiome data measure the abundances of a large number of microbial species in a given environment. Researchers use next-generation sequencing technologies to obtain sequence reads, clustering these reads into “operational taxonomic units” (OTUs), which are roughly thought of as different species of microbe (e.g., Schloss et al. 2009; Caporaso et al. 2010). In practice, many OTUs are rare, and researchers often aggregate the OTUs to genus or higher levels (e.g., Zhang et al. 2012; Chen et al. 2013; Xia et al. 2013; Lin et al. 2014; Randolph et al. 2015; Shi et al. 2016; Cao et al. 2017) or with unsupervised clustering techniques (e.g., McMurdie and Holmes 2013; Wang and Zhao 2017b) to create denser features. However, even after this step, a large portion of these aggregated OTUs are still found to be too sparse and thus are discarded (e.g., Zhang et al. 2012; Chen et al. 2013; Shi et al. 2016; Wang and Zhao 2017b). The rationale for this elimination of rare OTUs is that there needs to be enough variation among samples for an OTU’s effect to be successfully estimated in a statistical model (Ridenhour et al., 2017).
The practice of discarding rare features is wasteful: a rare feature should not be interpreted as an unimportant one, since it can be highly predictive of the response. For instance, using the word “ghastly” in a hotel review delivers an obvious negative sentiment, but this adjective appears very infrequently in TripAdvisor reviews. Discarding an informative word like “ghastly” simply because it is rare clearly seems inadvisable. To throw out over half of one’s features is to ignore what may be a huge amount of useful information.
Even if rare features are not explicitly discarded, many existing variable selection methods are unable to select them. The challenge is that with limited examples there is very little information to identify a rare feature as important. Theorem 1 shows that even a single rare feature can render ordinary least squares (OLS) inconsistent in the classical limit of infinite sample size and fixed dimension.
To address the challenge posed by rare features, we propose in this work a method for forming new aggregated features which are less sparse than the original ones and may be more relevant to the prediction task. Consider the following features, which represent the frequency of certain adjectives used in hotel reviews:
¹For example, in the R text mining library tm (Feinerer and Hornik, 2017), removeSparseTerms is a commonly used function for removing any terms with sparsity level above a certain threshold.
• X_worrying, X_depressing, . . . , X_troubling,
• X_horrid, X_hideous, . . . , X_awful.
While both sets of adjectives express negative sentiments, the first set (which might be summarized as “worry”) seems more mild than the second set (which might be summarized as “horrification”). In predicting the rating of a hotel review, we might find the following two aggregated features more relevant:
X_worry = X_worrying + X_depressing + · · · + X_troubling
X_horrification = X_horrid + X_hideous + · · · + X_awful.
The distinction between “horrid” and “hideous” might not matter for predicting the hotel rating, whereas the distinction between a “worry”-related word versus a “horrification”-related word may be quite relevant. Thus, not only are these aggregated features less rare than the original features, but they may also be more relevant to the prediction task. A method that selects the aggregated feature X_horrification can thereby incorporate the information conveyed in the use of “hideous” into the prediction task; this same method may otherwise be unable to determine the effect of “hideous” by itself since it is too rare.
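The aggregation step described above amounts to summing groups of columns of the design matrix. A minimal sketch, with made-up counts (the column ordering and values are assumptions, not data from the paper):

```python
import numpy as np

# Hypothetical counts of six adjectives across five reviews.
# Columns: worrying, depressing, troubling, horrid, hideous, awful
X = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
])

# Sum each group of rare columns to form a denser aggregated feature.
X_worry = X[:, :3].sum(axis=1)          # worrying + depressing + troubling
X_horrification = X[:, 3:].sum(axis=1)  # horrid + hideous + awful
```

Each original column here is nonzero in only one or two reviews, while each aggregated feature is nonzero in three, illustrating how aggregation densifies rare features.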
Indeed, appropriate aggregation of rare features in certain situations can be key to attaining consistent estimation and support recovery. In Theorem 2, we consider a setting where all features are rare and a natural aggregation rule exists among the features. In that setting, we show that the lasso (Tibshirani, 1996) fails to attain high-probability support recovery (for all values of its tuning parameter), whereas an oracle aggregator does attain this property. Theorem 2 demonstrates the value of proper aggregation for accurate feature selection when features are rare. This motivates the remainder of the paper, in which we devise a strategy for determining an effective feature aggregation based on data. Our aggregation procedure makes use of side information about the features, which we find is available in many domains. In particular, we assume that a tree is available that represents the closeness of features. For example, Figure 1 shows a tree for the previous word example that is generated via hierarchical clustering over GloVe (Pennington et al., 2014) embeddings learned from a different data source. The two contours enclose two subtrees resulting from a cut at their joint node. Aggregating the counts in these subtrees leads to the new features X_worry and X_horrification described above. We give more details of constructing such a tree in Section 3.1.
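The tree-building recipe just described can be sketched in a few lines. The two-dimensional “embeddings” below are invented stand-ins for GloVe vectors, and average linkage is an assumption, not necessarily the linkage used for Figure 1:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

words = ["worrying", "depressing", "troubling", "horrid", "hideous", "awful"]
# Stand-in 2-d embeddings (real GloVe vectors are 50-300 dimensional and
# learned from a separate corpus; these coordinates are made up).
emb = np.array([
    [0.0, 1.0], [0.1, 1.1], [0.2, 0.9],   # "worry"-like words
    [3.0, 0.0], [3.1, 0.2], [2.9, 0.1],   # "horrification"-like words
])

# Hierarchical clustering of the embeddings yields a tree over the words.
Z = linkage(emb, method="average")

# Cutting the tree into two subtrees recovers the two groups, whose counts
# would then be summed to form the aggregated features.
labels = fcluster(Z, t=2, criterion="maxclust")
groups = {c: [w for w, l in zip(words, labels) if l == c] for c in set(labels)}
```

With these toy coordinates the cut separates the “worry”-like words from the “horrification”-like words, mirroring the two contours in Figure 1.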
In Section 2, we motivate our work by providing theoretical results demonstrating the difficulty that OLS and the lasso have with rare features. We further show that correct aggregation of rare features leads to signed support recovery in a setting where the lasso is unable to attain this property. In Section 3, we introduce a tree-based parametrization strategy that translates the feature aggregation problem to a sparse modeling problem. Our main proposal is an estimator formulated as a solution to a convex optimization problem for which we derive an efficient algorithm. We draw connections between this method and related approaches and then, in Section 4, provide a bound on the prediction error for our method. Finally, we demonstrate the empirical merits of the proposed framework through simulation (Section 5) and through the TripAdvisor prediction task (Section 6) described above. In simulation, we examine our method’s robustness to misspecified side information. Quantitative and qualitative comparisons in the TripAdvisor example highlight the advantages of aggregating rare features.

Figure 1: A tree that relates adjectives on its leaves.
Notation: Given a design matrix X ∈ ℝ^(n×p), let x_i ∈ ℝ^p denote the feature vector of observation i and X_j ∈ ℝ^n denote the jth feature, where i = 1, . . . , n and j = 1, . . . , p. For a vector β ∈ ℝ^p, let supp(β) ⊆ {1, . . . , p} denote its support (i.e., the set of indices of nonzero elements). Let S±(β) := (sign(β_ℓ))_{ℓ=1,...,p} encode the signed support of the vector β. Let T be a p-leafed tree with root r, set of leaves L(T) = {1, . . . , p}, and set of nodes V(T) of size |T|. Let T_u be the subtree of T rooted at u for u ∈ V(T). We follow the commonly used notions of child, parent, sibling, descendant, and ancestor to describe the relationships between nodes of a tree. For a matrix A ∈ ℝ^(m×n) and subset B of {1, . . . , n}, let A_B ∈ ℝ^(m×|B|) be the submatrix formed by removing the columns of A not in B. Let S(β_ℓ, λ) = sign(β_ℓ) · max{|β_ℓ| − λ, 0} be the soft-thresholding operator applied to β_ℓ ∈ ℝ. We let e_j denote the vector having a one in the jth entry and zeros elsewhere. For sequences a_n and b_n, we write a_n ≲ b_n to mean that a_n/b_n is eventually bounded, and we write a_n = o(b_n) to mean that a_n/b_n converges to zero.
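As an aside, the soft-thresholding operator defined above is the workhorse of lasso-type algorithms (including ADMM updates of the kind the companion package uses); a one-line sketch of my own, not code from the rare package:

```python
import numpy as np

def soft_threshold(b, lam):
    """S(b, lam) = sign(b) * max{|b| - lam, 0}, applied elementwise."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)
```

Values with magnitude below λ are set exactly to zero, which is what produces sparse solutions.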
2 Rare Features and the Promise of Aggregation
2.1 The Difficulty Posed by Rare Features
Consider the linear model,
y = Xβ∗ + ε,  ε ∼ N(0, σ²I_n),   (1)
where y = (y₁, . . . , y_n)ᵀ ∈ ℝ^n is a response vector, X ∈ ℝ^(n×p) is a design matrix, β∗ is a p-vector of parameters, and ε ∈ ℝ^n is a vector of independent Gaussian errors having variance σ². In this paper, we focus on count data, i.e., X_ij records the frequency of event j in observation i. In particular, we will assume throughout that X has non-negative elements.
The lasso (Tibshirani, 1996) is an estimator that performs variable selection, making it well-suited to the p ≫ n setting:
β̂_λ^lasso ∈ argmin_{β ∈ ℝ^p} { (1/(2n)) ‖y − Xβ‖₂² + λ‖β‖₁ }.   (2)
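Objective (2) matches the parametrization used by scikit-learn’s Lasso, whose alpha plays the role of λ, so a quick numerical check is possible; the data-generating settings below are arbitrary assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10

# Count-valued features, matching the paper's setting of non-negative X.
X = rng.poisson(1.0, size=(n, p)).astype(float)
beta_star = np.zeros(p)
beta_star[0], beta_star[1] = 2.0, -1.5
y = X @ beta_star + rng.normal(0.0, 0.5, size=n)

# scikit-learn's Lasso minimizes (1/(2n))||y - X beta||_2^2 + alpha*||beta||_1,
# which is exactly objective (2) with lambda = alpha.
fit = Lasso(alpha=0.1).fit(X, y)
```

With dense features and strong signal the lasso recovers the two active coefficients easily; the point of Section 2 is that this breaks down once the signal sits on rare columns.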
When λ = 0, this coincides with the OLS estimator, which is uniquely defined when n > p and X is full rank:

β̂^OLS(n) = (XᵀX)⁻¹Xᵀy.
To better understand the challenge posed by rare features, we begin by considering the effect of a single rare feature on OLS in the classical p-fixed, n → ∞ regime. We take the jth feature to be a binary vector having k nonzeros, where k is a fixed value not depending on n. As n increases, the proportion of nonzero elements, k/n, goes to 0. We show in Theorem 1 that β̂^OLS_j(n) does not converge in probability to β∗_j with increasing sample size. This establishes that OLS is not a consistent estimator of β∗ even in a p-fixed asymptotic regime.
Theorem 1. Consider the linear model (1) with X ∈ R^{n×p} having full column rank. Further suppose that X_j is a binary vector having (a constant) k nonzeros. It follows that there exists η > 0 for which

lim inf_{n→∞} P( |β̂^OLS_j(n) − β∗_j| > η ) > 0.
Proof. The result follows from taking lim inf_{n→∞} of both sides of (7) in Appendix A and observing that 2Φ(−ηk^{1/2}/σ) does not depend on n.
The above result highlights the difficulty of estimating the coefficient of a rare feature. This suggests that even when rare features are not explicitly discarded, variable selection methods may fail to ever select them regardless of their strength of association with the response. Other researchers have also acknowledged the difficulty posed by rare features in different scenarios. For example, in the context of hypothesis testing for high-dimensional sparse binary regression, Mukherjee et al. (2015) show that when the design matrix is too sparse, any test has no power asymptotically, and signals cannot be detected regardless of their strength. Since the failure is caused by the sparsity of the features, it is therefore natural to ask if "densifying the features" in an appropriate way would fix the problem. As discussed above, aggregating the counts of related events may be a reasonable way to allow a method to make use of the information in rare features.
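The inconsistency in Theorem 1 is easy to see numerically: the OLS coefficient of a binary feature with k fixed nonzeros has standard deviation near σ/√k no matter how large n grows. The following Python sketch (a toy illustration with hypothetical parameter values, not taken from the paper) checks this for a single rare feature.

```python
import numpy as np

rng = np.random.default_rng(0)
beta_star, sigma, k = 2.0, 1.0, 5  # k nonzeros stays fixed as n grows

def ols_sd(n, reps=2000):
    """Empirical sd of the OLS coefficient of a rare binary feature."""
    x = np.zeros(n)
    x[:k] = 1.0                      # binary feature with k ones
    ests = []
    for _ in range(reps):
        y = x * beta_star + sigma * rng.standard_normal(n)
        ests.append(x @ y / (x @ x))  # OLS for a single-feature regression
    return np.std(ests)

for n in (100, 10_000):
    print(n, round(ols_sd(n), 2))    # sd stays near sigma/sqrt(k) ≈ 0.45
```

Because the estimate averages only the k observations where the feature is nonzero, its variance is σ²/k regardless of n, matching the non-vanishing probability in the theorem.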
2.2 Aggregating Rare Features Can Help
Given m subsets of {1, . . . , p}, we can form m aggregated features by summing within each subset. We can encode these subsets in a binary matrix A ∈ {0, 1}^{p×m} and form a new design matrix of aggregated features as X̃ = XA. The columns of X̃ are also counts, but represent the frequency of m different unions of the p original events. For example, if the first subset is {1, 6, 8}, the first column of A would be e_1 + e_6 + e_8 and the first aggregated feature would be X̃_1 = X_1 + X_6 + X_8, recording the number of times any of the first, sixth, or eighth events occur. A linear model, X̃β̃, based on the aggregated features can be equivalently expressed as a linear model, Xβ, in terms of the original features as long as β satisfies a set of linear constraints (ensuring that it is in the column space of A):

X̃β̃ = (XA)β̃ = X(Aβ̃) = Xβ.
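As a concrete sketch of this construction (in Python/numpy, though the paper's companion software is in R; the data and group choices below are hypothetical), one can build A column by column from the subsets and verify the identity X̃β̃ = X(Aβ̃):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 8
X = rng.poisson(1.0, size=(n, p)).astype(float)  # toy count matrix

# Two aggregating subsets: {1, 6, 8} and the remaining features (0-indexed)
groups = [[0, 5, 7], [1, 2, 3, 4, 6]]
A = np.zeros((p, len(groups)))
for m, g in enumerate(groups):
    A[g, m] = 1.0

X_agg = X @ A                       # aggregated count features, X-tilde = XA
beta_agg = np.array([1.5, -0.3])    # one coefficient per subset

# A linear model in the aggregated features equals one in the originals
# whose coefficients are constant within each subset.
assert np.allclose(X_agg @ beta_agg, X @ (A @ beta_agg))
print(X_agg[:, 0])                  # counts of the union of events 1, 6, 8
```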
The vector β lies in the column space of A precisely when it is constant within each of the m subsets. For example,

enforcing β_1 = β_6 = β_8  ⇔  aggregating features: X_1β_1 + X_6β_6 + X_8β_8 = (X_1 + X_6 + X_8)β_1 = X̃_1β̃_1.  (3)
In practice, determining how to aggregate features is a challenging problem, and our proposed strategy in Section 3 will use side information to guide this aggregation.
For now, to understand the potential gains achievable by aggregation, we consider an idealized case in which the correct aggregation of features is given to us by an oracle. In the next theorem, we construct a situation in which (a) the lasso on the original rare features is unable to correctly recover the support of β∗ for any value of the tuning parameter λ, and (b) an oracle-aggregation of features makes it possible for the lasso to recover the support of β∗. For simplicity, we take X as a binary matrix, which corresponds to the case in which every feature has n/p nonzero observations. We take β∗ to have k blocks of size p/k, with entries that are constant within each block. The last block is all zeros and the minimal nonzero |β∗_j| is restricted to lie within a range that expands with n, p, and k. The oracle approach delivers to the lasso the k aggregated features that match the structure in β∗. These aggregated features have n/k nonzeros, and thus are not rare features. Having performed the lasso on these aggregated features, we then duplicate the k elements, p/k times per group, to get β̂^oracle_λ ∈ R^p. The lasso with the oracle-aggregator is shown to achieve high-probability signed support recovery, whereas the lasso on the original features fails to achieve this property for all values of the tuning parameter λ.
Theorem 2. Consider the linear model (1) with binary matrix X ∈ {0, 1}^{n×p}, p ≤ n, and X^T X = (n/p)I_p: every column of X has n/p ones and X1_p = 1_n. Suppose β∗ = β̃∗ ⊗ 1_{p/k} for β̃∗ = (β̃∗_1, . . . , β̃∗_{k−1}, 0). Suppose k < p/(36 log n). Then, the interval

I = ( σ√((4k/n) log(k²n)),  σ√((p/(3n)) log(2cp/k)) ]  with  c = (1/3) e^{(π/2+2)^{−1}} √(1/4 + 1/π)

is nonempty, and for min_{i=1,...,k−1} |β̃∗_i| ∈ I, the following two statements hold:
(a) The lasso fails to get high-probability signed support recovery:

lim sup_{p→∞} sup_{λ≥0} P( S±(β̂^lasso_λ) = S±(β∗) ) ≤ 1/e.
(b) The lasso with an oracle-aggregation of features succeeds in recovering the correct signed support for some λ > 0:

lim_{n→∞} P( S±(β̂^oracle_λ) = S±(β∗) ) = 1.
Proof. See Appendix B.
Even when the true model does not have a small number of aggregated features (i.e., β∗ does not have k ≪ p distinct values), it may still be beneficial to aggregate. The next result exhibits a bias-variance tradeoff for feature aggregation.
Theorem 3 (Bias-Variance Tradeoff for Feature Aggregated Least Squares). Consider the linear model (1) with X having full column rank (n > p) and a general vector β∗ ∈ R^p. Let C = {C_1, . . . , C_{|C|}} be an arbitrary partition of {1, . . . , p}. Let β̂_C ∈ R^p be the estimator formed by performing least squares subject to the constraint that parameters are constant within the groups defined by C. Then, the following mean squared estimation result holds:
(1/n) E‖Xβ̂_C − Xβ∗‖² ≤ (1/n) ‖X‖²_op Σ_{ℓ=1}^{|C|} Σ_{j∈C_ℓ} ( β∗_j − |C_ℓ|^{−1} Σ_{j′∈C_ℓ} β∗_{j′} )² + σ²|C|/n.
Proof. See Appendix C.
The bias term is small when there is a small amount of variability in coefficient values within groups of the partition C, i.e., when β∗ is approximately constant within each group. Even when β∗ in truth has a large k (even k = p), there may still exist a partition C with |C| ≪ k for which the bias term is small and thus E‖Xβ̂_C − Xβ∗‖² ≪ E‖Xβ̂^OLS − Xβ∗‖² = σ²p.
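The tradeoff in Theorem 3 can be illustrated numerically. In the Python sketch below (hypothetical sizes, seed, and coefficients, not from the paper; the constrained fit is computed equivalently by regressing on XA_C, where A_C is the group-membership matrix), β∗ is nearly constant within each of k = 4 groups, so the small bias incurred by aggregation is more than offset by reducing the variance from roughly σ²p/n to σ²|C|/n:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k, sigma = 200, 40, 4, 1.0
X = rng.poisson(0.2, size=(n, p)).astype(float)     # sparse count design
groups = np.repeat(np.arange(k), p // k)            # partition C: 4 blocks of 10
A = np.eye(k)[groups]                               # p x k membership matrix
beta_star = np.repeat([1.0, -1.0, 0.5, 0.0], p // k)
beta_star += 0.02 * rng.standard_normal(p)          # slight within-group wiggle

def pred_err(design, reps=200):
    """Average (1/n)||fit - X beta*||^2 over noise draws."""
    errs = []
    for _ in range(reps):
        y = X @ beta_star + sigma * rng.standard_normal(n)
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        errs.append(np.sum((design @ coef - X @ beta_star) ** 2) / n)
    return np.mean(errs)

err_agg, err_ols = pred_err(X @ A), pred_err(X)
print(round(err_agg, 3), round(err_ols, 3))  # aggregation wins: small bias, much less variance
```

Here the OLS error is close to σ²p/n = 0.2, while the aggregated fit pays only a tiny bias plus σ²|C|/n = 0.02.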
3 Main Proposal: Tree-Guided Aggregation
In the previous section, we have seen the potential gains achievable through aggregating rare features. In this section, we propose a tree-guided method for aggregating and selecting rare features. We discuss this tree in Section 3.1, introduce a tree-based parametrization strategy in Section 3.2, and propose a new estimator in Section 3.3.
3.1 A Tree to Guide Aggregation
To form aggregated variables, it is infeasible to consider all possible partitions of the features {1, . . . , p}. Rather, we will consider a tree T with leaves {1, . . . , p} and restrict ourselves to partitions that can be expressed as a collection of branches of T (see, e.g., Figure 1). We sum features within a branch to form our new aggregated features.
We would like to aggregate features that are related, and thus we would like to have T encode feature similarity information. Such information about the features comes from prior knowledge and/or data sources external to the current regression problem (i.e., not from y and X). For example, for microbiome data, T could be the phylogenetic tree encoding evolutionary relationships among the OTUs (e.g., Matsen et al. 2010; Tang et al. 2016; Wang and Zhao 2017a) or the co-occurrence of OTUs from past data sets. When features correspond to words, closeness in meaning can be used to form T (e.g., in Section 6, we perform hierarchical clustering on word embeddings that were learned from an enormous corpus).
In (3), we demonstrated how aggregating a set of features is equivalent to setting these features' coefficients to be equal. To perform tree-guided aggregation, we therefore associate a coefficient β_j with each leaf of T and "fuse" (i.e., set equal to each other) any coefficients within a branch that we wish to aggregate.
Figure 2: (Left) An example of β ∈ R^5 and T that relates the corresponding five features. By (4), we have β_i = γ_i + γ_6 + γ_8 for i = 1, 2, 3 and β_j = γ_j + γ_7 + γ_8 for j = 4, 5. (Right) By zeroing out the γ_i's in the gray nodes, we aggregate β into two groups indicated by the dashed contours: β_1 = β_2 = β_3 = γ_6 + γ_8 and β_4 = β_5 = γ_8. Counts data are aggregated for features sharing the same coefficient: Xβ = (X_1 + X_2 + X_3)β_1 + (X_4 + X_5)β_4.
3.2 A Tree-Based Parametrization
In order to fuse β_j's within a branch, we adopt a tree-based parametrization by assigning a parameter γ_u to each node u in T (this includes both leaves and interior nodes). The left panel of Figure 2 gives an example. Let ancestor(j) ∪ {j} be the set of nodes in the path from the root of T to the jth feature, which is associated with the jth leaf. We express β_j as the sum of all the γ_u's on the path:

β_j = Σ_{u ∈ ancestor(j) ∪ {j}} γ_u.  (4)

This can be written more compactly as β = Aγ, where A ∈ {0, 1}^{p×|T|} is a binary matrix with A_jk := 1{u_k ∈ ancestor(j) ∪ {j}} = 1{j ∈ descendant(u_k) ∪ {u_k}}. The descendants of each node u define a branch T_u, and zeroing out γ_v for all v ∈ descendant(u) fuses the coefficients in this branch, i.e., {β_j : j ∈ L(T_u)}. Thus, γ_{descendant(u)} = 0 is equivalent to aggregating the features X_j with j ∈ L(T_u) (see the right panel of Figure 2).
Another way of viewing this parametrization's merging of branches is by expressing Xβ = XAγ, where (XA)_{ik} = Σ_{j=1}^p X_ij A_jk = Σ_{j ∈ descendant(u_k) ∪ {u_k}} X_ij aggregates counts over all the descendant features of node u_k. By aggregating nearby features, we allow rare features to borrow strength from their neighbors, allowing us to estimate shared coefficient values that would otherwise be too difficult to estimate. In the next section, we describe an optimization problem that uses the γ parametrization to simultaneously perform feature aggregation and selection.
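A small sketch of this parametrization (Python here; the tree is the five-leaf example of Figure 2, with nodes 0-indexed for convenience) builds A from each leaf's root-ward path and shows how zeroing the γ's inside two branches fuses β into two groups:

```python
import numpy as np

# Tree from Figure 2: leaves 0-4, interior node 5 (parent of leaves 0,1,2),
# interior node 6 (parent of leaves 3,4), and root 7.
parent = {0: 5, 1: 5, 2: 5, 3: 6, 4: 6, 5: 7, 6: 7, 7: None}
p, n_nodes = 5, 8

A = np.zeros((p, n_nodes))
for j in range(p):                 # walk root-ward from each leaf
    u = j
    while u is not None:
        A[j, u] = 1.0              # A_jk = 1 iff node k lies on leaf j's path
        u = parent[u]

# Zero out the gammas inside the two branches (as in the right panel):
gamma = np.zeros(n_nodes)
gamma[5], gamma[7] = 1.2, 0.4      # only the gammas at node 6 and the root survive
beta = A @ gamma
print(beta)                        # -> [1.6 1.6 1.6 0.4 0.4]: two fused groups
```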
3.3 The Optimization Problem
Our proposed estimator β̂ is the solution to the following convex optimization problem:

min_{β∈R^p, γ∈R^{|T|}}  { (1/(2n)) ‖y − Xβ‖²₂ + λ [ α Σ_{ℓ=1}^{|T|} w_ℓ|γ_ℓ| + (1 − α) Σ_{j=1}^p w_j|β_j| ]  s.t. β = Aγ }.  (5)
We apply a weighted ℓ_1 penalty to induce sparsity in γ, which in turn induces fusion of the coefficients in β. In the high-dimensional setting, sparsity in feature coefficients is also desirable, so we also apply a weighted ℓ_1 penalty on β. The tuning parameter λ controls the overall penalization level while α determines the trade-off between the two types of regularization: fusion and sparsity. In practice, both λ and α are determined via cross-validation. The choice of weights is left to the user. Our theoretical results (Section 4) suggest a particular choice of weights, although in our empirical studies (Sections 5 and 6) we simply take all weights to be 1 except for the root, which we leave unpenalized (w_root = 0). Choosing w_root = 0 allows one to apply strong shrinkage of all coefficients towards a common value other than zero.
When α = 0, (5) reduces to a lasso problem in β; when α = 1, (5) reduces to a lasso problem in γ. Both extreme cases can be efficiently solved with a lasso solver such as glmnet (Friedman et al., 2010). For α ∈ (0, 1), (5) is a generalized lasso problem (Tibshirani and Taylor, 2011) in γ, and can be solved in principle using preexisting solvers (e.g., Arnold and Tibshirani 2014). However, better computational performance, in particular in high-dimensional settings, can be attained using an algorithm specially tailored to our problem. With weights w_r = 0 and w_ℓ = w_j = 1 for ℓ ≠ r and j ∈ [1, p], we write (5) as a global consensus problem and solve it using the alternating direction method of multipliers (ADMM; Boyd et al. 2011). The consensus problem introduces additional copies of β and γ, which decouples the various parts of the problem, leading to efficient ADMM updates:
min_{β^(1),β^(2),β^(3),β ∈ R^p and γ^(1),γ^(2),γ ∈ R^{|T|}}  (1/(2n)) ‖y − Xβ^(1)‖²₂ + λ [ α Σ_{ℓ=1}^{|T|} w_ℓ|γ^(1)_ℓ| + (1 − α) Σ_{j=1}^p w_j|β^(2)_j| ]  (6)

s.t. β^(3) = Aγ^(2),  β = β^(1) = β^(2) = β^(3),  and  γ = γ^(1) = γ^(2).
In particular, our ADMM approach requires performing a singular value decomposition (SVD) on X, an SVD on (I_p : −A) (these are reused for all λ and α), and then applying matrix multiplies and soft-thresholdings until convergence. See Algorithm 1 in Appendix D for details. Appendix D.1 provides a derivation of Algorithm 1 and Appendix D.2 discusses a slight modification for including an intercept, which is desirable in practice.
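The companion package implements the full ADMM; for intuition, the α = 1 extreme, which is a lasso in γ with design XA, can be solved with a few lines of proximal gradient descent. The Python sketch below (toy data with a hypothetical two-level tree consisting of the root plus p leaves; this is not the paper's algorithm) applies the soft-thresholding operator S from the Notation section with the root left unpenalized:

```python
import numpy as np

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0), applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(D, y, lam, w, n_iter=2000):
    """Minimize (1/2n)||y - D g||_2^2 + lam * sum_l w_l |g_l| by ISTA."""
    n = len(y)
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 / n)   # 1 / Lipschitz constant
    g = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ g - y) / n
        g = soft_threshold(g - step * grad, step * lam * w)
    return g

# Toy use: with alpha = 1, (5) is a lasso in gamma with design XA.
rng = np.random.default_rng(3)
n, p = 100, 6
X = rng.poisson(0.5, size=(n, p)).astype(float)
A = np.column_stack([np.ones(p), np.eye(p)])       # root node + p leaf nodes
w = np.array([0.0] + [1.0] * p)                    # root unpenalized (w_root = 0)
y = X @ np.full(p, 0.8) + 0.1 * rng.standard_normal(n)
gamma = lasso_ista(X @ A, y, lam=0.1, w=w)
print(np.round(A @ gamma, 2))  # beta = A @ gamma; the constant signal lands on the root
```

Because the true coefficients are constant, the penalty drives the leaf γ's to zero and the unpenalized root absorbs the common value, illustrating the fusion mechanism.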
3.4 Connections to Other Work
Before proceeding, we draw some connections to other work. Wang and Zhao (2017b) introduce a penalized regression method with high-dimensional and compositional covariates that uses a phylogenetic tree. They apply an ℓ_1 penalty on the sum of coefficients within a subtree, for every possible subtree in the phylogenetic tree. Their penalty is designed to encourage the sum of coefficients within a subtree to be zero, which naturally detects a sub-composition of microbiome features. By contrast, our method applies an ℓ_1 penalty on our latent variables γ_u, which induces the regression coefficients within subtrees to have equal values. For a simple example, suppose β_1 and β_2 form a subtree of the phylogenetic tree. Their penalty would promote β_1 = −β_2 whereas ours would promote β_1 = β_2. The basic assumption of their method is that contrasts of species abundances within
a subtree may be predictive, whereas the basic assumption of our method is that average species abundance within a subtree may be predictive. These different structural assumptions will be appropriate in different situations. In this paper's context of rare features, our assumption is the relevant one: Consider a subtree containing a large number of species. By asking for equality of coefficients, our method promotes the aggregation of species counts across the subtree, leading to denser features; by contrast, asking for coefficients to sum to zero does not address the problem of feature rarity.
Existing work has also considered the setting in which regression coefficients are thought to be clustered into a small number of groups of equal coefficient value. For example, She (2010) and Ke et al. (2015) do this by penalizing coefficient differences. Neither method, however, is focused specifically on rare features, and thus they do not rely on side information to perform this clustering of coefficients. In our setting, the side information provided by the tree plays an important role in compensating for the extremely small amount of information available about rare features.
Several other methods assume a relevant undirected graph over the predictors and use a graph-Laplacian or graph-total-variation penalty to promote equality of coefficients that are nearby on the graph (Li and Li, 2010; Li et al., 2018). Depending on the setting, this graph may either be pure side information (Li and Li, 2010) or be a covariance graph estimated based on X itself (Li et al., 2018). While the above methods use graph information "edge-by-edge", Yu and Liu (2016) incorporate graphical information "node-by-node" to promote joint selection of predictors that are connected on the graph.
Guinot et al. (2017) consider a similar idea of aggregating genomic features with the help of a hierarchical clustering tree; however, in their approach the tree is learned from the design matrix and the prediction task is only used to determine the level of the tree cut, whereas our method in effect uses the response to flexibly choose differing aggregation levels across the tree. We consider a strategy similar to theirs, which we call L1-ag-h in the empirical comparisons. Kim et al. (2012) propose a tree-guided group lasso approach in the context of multi-response regression. In their context, the tree relates the different responses and is used to borrow strength across related prediction tasks. Zhai et al. (2018) propose a variance component selection scheme that aggregates OTUs to clusters at a higher phylogenetic level, and treats the aggregated taxonomic clusters as multiple random effects in a variance component model. Finally, Khabbazian et al. (2016) propose a phylogenetic lasso method to study trait evolution from comparative data and detect past changes in the expected mean trait values.
4 Statistical Theory
In this section, we study the prediction consistency of our method. Since T encodes feature similarity information, throughout the section we require T to be a "full" tree such that each node is either a leaf or possesses at least two child nodes. We begin with some definitions.
Definition 1. We say that B ⊆ V(T) is an aggregating set with respect to T if {L(T_u) : u ∈ B} forms a partition of L(T).
The black circles in Figure 3 form an aggregating set since their branches' leaves are a partition of {1, . . . , 8}. We would like to refer to "the true aggregating set B∗ with respect
Figure 3: In the above tree, B∗ = {u_1, u_2, u_3, u_4, u_5} has its nodes labeled with black circles. The leaf coefficients are (β∗_1, . . . , β∗_8) = (1.3, 1.3, 1.3, 3.5, 4.7, 0, 0, 3.5).
to T " and, to do so, we must first establish that there exists a unique coarsest aggregating set corresponding to a vector β∗.
Lemma 1. For any β∗ ∈ R^p, there exists a unique coarsest aggregating set B∗ := B(β∗, T) ⊆ V(T) (hereafter "the aggregating set") with respect to the tree T such that (a) β∗_j = β∗_k for j, k ∈ L(T_u) for all u ∈ B∗, and (b) |β∗_j − β∗_k| > 0 for j ∈ L(T_u) and k ∈ L(T_v) for siblings u, v ∈ B∗.
The lemma (proved in Appendix E) defines B∗ as the aggregating set such that further merging of siblings would mean that β∗ is not constant within each subset of the partition.
Definition 2. Given the triplet (T, β∗, X), we define (a) X̃ = XA_{B∗} ∈ R^{n×|B∗|} to be the design matrix of aggregated features, which uses B∗ = B(β∗, T) as the aggregating set, and (b) β̃∗ ∈ R^{|B∗|} to be the coefficient vector using these aggregated features: β∗ = A_{B∗}β̃∗.
We are now ready to provide a bound on the prediction error of our estimator, which is proved in Appendix F.
Theorem 4 (Prediction Error Bound). If we take λ ≥ 4σ√(log(2p)/n), w_j = ‖X_j‖₂/√n for 1 ≤ j ≤ p, and w_ℓ = ‖XA_ℓ‖₂/√n for 1 ≤ ℓ ≤ |T|, then

(1/n) ‖Xβ̂ − Xβ∗‖²₂ ≤ 3λ ( (1 − α) Σ_{j∈A∗} w_j|β∗_j| + α Σ_{ℓ∈B∗} w_ℓ|β̃∗_ℓ| )

holds with probability at least 1 − 2/p for any α ∈ [0, 1].
This is a slow rate bound for our method. The standard slow rate bound for the lasso is σ√(log p/n) ‖β∗‖₁. The next corollary establishes that our method, for any choice of α, achieves this rate.
Corollary 1. Suppose ‖X_j‖₂ ≤ √n for 1 ≤ j ≤ p. Then, taking λ = 4σ√(log(2p)/n) and using the weights in Theorem 4,

(1/n) ‖X(β̂ − β∗)‖²₂ ≲ σ√(log p/n) ‖β∗‖₁

holds with probability at least 1 − 2/p for any α ∈ [0, 1].
Proof. See Appendix G.
The previous corollary establishes that one does not worsen the rate by using our method over the lasso in a generic setting. But is there an advantage to using our method? Our method is designed to do well in circumstances in which |B∗| is small, that is, β∗ is mostly constant with grouping given by the provided tree. The next corollary considers the extreme case in which β∗ is constant (and nonzero).
Corollary 2. Under the conditions of Corollary 1, suppose β∗_1 = · · · = β∗_p ≠ 0. Taking

α ≥ [1 + ‖X1_p‖₂/(p√n)]^{−1},

(1/n) ‖Xβ̂ − Xβ∗‖²₂ ≲ σ√(log p/n) ‖β∗‖₁ · ‖X1_p‖₂/(p√n)

holds with probability at least 1 − 2/p. Thus, this improves the lasso rate when ‖X1_p‖₂ = o(p√n).
Proof. See Appendix H.
For certain sparse designs, the above condition holds. For example, when n = p and X = √n I_n, we have ‖X1_p‖₂ = √n ‖1_n‖₂ = n, which is o(p√n) = o(n^{3/2}). The next proposition considers a more general sparse X scenario.
Proposition 1. Suppose each column of X has exactly r nonzero entries chosen at random and independently of all other columns. Suppose all nonzero entries equal √(n/r) so that ‖X_j‖₂ = √n for every column 1 ≤ j ≤ p. If 8 log(n + 1)/(3p) < r/n → 0, then ‖X1_p‖₂/(p√n) → 0 in probability.
Proof. See Appendix I.
Proposition 1 combined with Corollary 2 demonstrates that our method can improve the prediction error rate over the lasso. While this corollary focuses on the rather extreme case in which all coefficients are equal, we expect the result to be generalizable to a wider class of settings. Indeed, the empirical results of the next section suggest that our method can outperform the lasso in many settings.
5 Simulation Study
We start by forming a tree T with p leaves. To do so, we generate p latent points in R and then apply hierarchical clustering (using hclust in R Core Team (2016) with complete linkage). We would like the tree to partition the leaves into k clusters of varying sizes and at differing heights in the tree (which will correspond to the true aggregation of the features). To do so, we first generate k cluster means µ_1, . . . , µ_k ∈ R with µ_i = 1/i. The first k/2 means have 3p/(2k) associated latent points each, and the remaining k/2 means have p/(2k) associated latent points each. The latent points associated with µ_i are drawn independently from N(µ_i, τ² min_{j≠i}(µ_i − µ_j)²), where τ = 0.05.
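The tree construction just described can be sketched as follows (Python, with scipy's hierarchical clustering standing in for R's hclust; the specific p, k, and seed are arbitrary illustrations):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
p, k, tau = 40, 4, 0.05

mu = 1.0 / np.arange(1, k + 1)                   # cluster means 1, 1/2, 1/3, 1/4
sizes = [3 * p // (2 * k)] * (k // 2) + [p // (2 * k)] * (k // 2)
sd = np.array([tau * np.min(np.abs(m - np.delete(mu, i)))
               for i, m in enumerate(mu)])       # sd = tau * min_{j != i} |mu_i - mu_j|
pts = np.concatenate([rng.normal(m, s, size=sz)
                      for m, s, sz in zip(mu, sd, sizes)])

Z = linkage(pts.reshape(-1, 1), method="complete")  # tree T over the p leaves
labels = fcluster(Z, t=k, criterion="maxclust")     # cut into k clusters
print(len(np.unique(labels)))                       # -> 4 recovered groups
```

With τ this small, the latent points are well separated by group, so cutting the tree into k clusters recovers the true aggregation.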
Figure 4: A comparison between our method and four other methods, organized by whether features are aggregated and whether the aggregation is supervised: L1 (lasso) and L1-dense (lasso on dense features) do not aggregate; L1-ag-h (lasso, aggregated by height) and L1-ag-d (lasso, aggregated by cluster density) aggregate in an unsupervised manner; our method aggregates in a supervised manner.
By design, there are k interior nodes in T corresponding to these k groups, which we index by B∗. We form A corresponding to this tree and generate β∗ = A_{B∗}β̃∗. We zero out k · s elements of β̃∗ ∈ R^k and draw the magnitudes of the remaining elements independently from a Uniform(1.5, 2.5) distribution. We alternate signs of the nonzero coefficients of β̃∗. The design matrix X ∈ R^{n×p} is simulated from a Poisson(0.02) distribution, and the response y ∈ R^n is simulated from (1) with σ = ‖Xβ∗‖₂/√(5n). For every method under consideration, we average its performance over 100 repetitions in all the following simulations.

We consider both low-dimensional (n = 500, p = 100, s = 0) and high-dimensional (n = 100, p = 200, s ∈ {0.2, 0.6}) scenarios, in each case taking a sequence of k values up to p/2. We apply our method with T and vary the tuning parameters (α, λ) along an 8-by-50 grid of values. We take all weights equal to 1 except that of the root node, which we take as zero (leaving it unpenalized). In the low-dimensional case, we compare our method to oracle least squares, in which we perform least squares on XA_{B∗}. Oracle least squares represents the best possible performance of any method that attempts to aggregate features. We also include least squares on the original design matrix X. In the high-dimensional case, we compare our method to the oracle lasso, in which the true aggregation XA_{B∗} (but not the sparsity in β̃∗) is known, and to the lasso and ridge regression, which are each computed across a grid of 50 values of the tuning parameter.
In addition to the above methods, we compare our method to three other approaches, meant to represent variations of how the lasso is typically applied when rare features are present (see Figure 4 for a schematic). The first approach, which we refer to as L1-dense, applies the lasso after first discarding any features that are present in fewer than 1% of observations. The second and third approaches apply the lasso with features aggregated according to T in an unsupervised manner. The second approach, L1-ag-h, aggregates features that are in the same cluster after cutting the tree at a certain height. In addition to the lasso tuning parameter, the height at which we cut the tree is a second tuning parameter (chosen along an equally spaced grid of eight values). The third approach, L1-ag-d, performs merges in a bottom-up fashion along the tree until all aggregated features have density above some threshold. This threshold is an additional tuning parameter (chosen along an equally spaced grid of eight values between 0.01 and 1). The lasso tuning parameter in these methods is always chosen along a grid of 50 values.
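The unsupervised preprocessing in these baselines is simple to sketch; for example, the screening step of L1-dense is a single density filter (Python; the sizes, rate, and 1% threshold below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.poisson(0.005, size=(500, 100)).astype(float)  # many rare count features

density = (X > 0).mean(axis=0)     # fraction of observations using feature j
X_dense = X[:, density >= 0.01]    # L1-dense keeps features seen in >= 1% of rows
print(X.shape[1], X_dense.shape[1])
```

The lasso is then run on `X_dense` alone, which is exactly why this baseline cannot use the information in the discarded rare columns.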
We measure the best mean-squared estimation error, i.e., min_Λ ‖β̂(Λ) − β∗‖²₂/p, where "best" is with respect to each method's tuning parameter(s) Λ. The top two panels of Figure 5 show the performance of the methods in the low-dimensional and high-dimensional scenarios, respectively. Given that our method includes least squares and the lasso as special cases, it is no surprise that our method has better attainable performance than those
Figure 5: Estimation error of all methods under (Top Left) (n, p, s) = (500, 100, 0) versus varying k ∈ {10, 20, 30, 40, 50}; (Top Right) (n, p, s) = (100, 200, 0.2) versus varying k ∈ {20, 40, 60, 80, 100}; (Bottom Left) (n, p, s, k) = (100, 200, 0.2, 40) versus varying τ ∈ {0.05, 0.15, 0.25, 0.35, 0.45}; (Bottom Right) (n, p, s) = (100, 200, 0.6) versus varying k ∈ {20, 40, 60, 80, 100}.
Figure 6: Mean Hamming distance (support recovery) versus mean Rand index (group recovery) for various methods' best estimates under (Left) (n, p, s, k) = (100, 200, 0.2, 40) and (Right) (n, p, s, k) = (100, 200, 0.6, 40). For our method, the eight circles correspond to eight different α values, varying from 0 to 1. Decreasing circle size corresponds to increasing α value.
methods. These results indicate that our method performs nearly as well as the oracle when the true number of aggregated features, k, is small and degrades (along with the oracle) as this quantity increases. The two other methods that use the tree, L1-ag-h and L1-ag-d, do less well than our method, but still do better than L1-dense, which simply discards rare features. In the (n, p, s) = (100, 200, 0.2) case, L1-dense performs almost identically to the lasso, while L1-ag-h and L1-ag-d degrade to the lasso as k increases. By comparing the right two panels of Figure 5, we notice our method outperforms ridge at large k when s increases from 0.2 to 0.6, which can be explained by the increased sparsity in β∗.
We also evaluate model performance with respect to group recovery and support recovery. Recall that our method is computed over eight α values, between 0 and 1, and fifty λ values. At each α, we find the minimizer β̂(λ) of ‖β̂(λ) − β∗‖²₂ over all λ values. We measure group recovery by computing the Rand index (comparing the grouping of β̂ to that of β∗), and measure support recovery by computing the Hamming distance (between the support of β̂ and that of β∗). The closer the Rand index is to one, the better our method recovers the correct groups. The smaller the Hamming distance is, the better our method recovers the correct support. For the high-dimensional scenario with k = 40, we plot eight pairs (one for each α value) of Rand index and Hamming distance values for s = 0.2 and s = 0.6 in Figure 6. We also compute the two metrics for the lasso, ridge, L1-dense, L1-ag-h, L1-ag-d, and oracle lasso. In the left panel of Figure 6, which corresponds to the low-sparsity case in β∗, our method achieves its best performance at the largest α value. As the sparsity level increases, we see from the right panel of Figure 6 that the best α shifts towards zero. In both cases, our method outperforms the lasso, ridge, L1-dense, L1-ag-h, and L1-ag-d.
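Both recovery metrics can be computed directly; a minimal Python sketch (with a hypothetical β∗ and estimate, grouping coefficients by exact value) is:

```python
import numpy as np
from itertools import combinations

def rand_index(a, b):
    """Fraction of pairs on which two groupings agree (same/different group)."""
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i, j in combinations(range(len(a)), 2))
    return agree / (len(a) * (len(a) - 1) / 2)

def hamming_support(beta_hat, beta_star):
    """Number of positions where the two supports differ."""
    return int(np.sum((beta_hat != 0) != (beta_star != 0)))

beta_star = np.array([1.0, 1.0, 0.0, 0.0, -2.0])
beta_hat = np.array([0.9, 0.9, 0.0, 0.1, -2.1])
groups = lambda b: np.unique(b, return_inverse=True)[1]  # group by value
print(rand_index(groups(beta_hat), groups(beta_star)),
      hamming_support(beta_hat, beta_star))              # -> 0.9 1
```

Here 9 of the 10 coefficient pairs are grouped consistently, and the supports disagree only at the fourth coordinate.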
Clearly, the performance of our method, L1-ag-h, and L1-ag-d will depend on the quality of the tree being used. In the previous simulations we provided our method with a tree that is perfectly compatible with the true aggregating set. In practice, the tree used may be only an approximate representation of how features should be aggregated. We therefore study the
sensitivity of our method to misspecification of the tree. We return to the high-dimensional setting above with k = 40, and we generate a sequence of trees that are increasingly distorted representations of how the data should in fact be aggregated.

We begin with a true aggregation of the features into k groups as described before. In each repetition of the simulation, we generate a (random) tree T by performing hierarchical clustering on p random variables generated as before but with an increasing value of τ, which controls the degradation level of the tree. When τ is small, the latent variables are well separated by group, so that the tree has an aggregating set that matches the true aggregation structure (with high probability). As τ ∈ {0.05, 0.15, 0.25, 0.35, 0.45} increases, the between-group variability becomes smaller relative to the within-group variability, and thus the information provided by the tree becomes increasingly weak. The bottom left panel of Figure 5 shows the degradation of our method as τ increases when (n, p, s, k) = (100, 200, 0.2, 40). Our method, L1-ag-h, and L1-ag-d all suffer from a poor-quality tree; the latter two degrade more quickly than ours.
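One plausible sketch of this tree-degradation mechanism (the paper's exact generating scheme differs in detail; the group sizes and latent dimension here are hypothetical):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

rng = np.random.default_rng(0)
p, k, tau = 40, 4, 0.15                      # tau scales the within-group noise
groups = np.repeat(np.arange(k), p // k)     # true aggregation into k groups
centers = rng.normal(size=(k, 10))           # well-separated group centers
# latent variables: group center plus noise whose scale tau controls distortion
Z = centers[groups] + tau * rng.normal(size=(p, 10))
T = linkage(Z, method="average")             # hierarchical clustering gives the tree T
labels = cut_tree(T, n_clusters=k).ravel()   # cut back to k clusters to inspect recovery
```

As tau grows, the within-group variability swamps the between-group separation, and `labels` increasingly disagrees with `groups`.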
6 Application to Hotel Reviews
Wang et al. (2010) crawled TripAdvisor.com to form a dataset² of 235,793 reviews and ratings of 1,850 hotels by users between February 14, 2009 and March 15, 2009. While there are several kinds of ratings, we focus on a user's overall rating of the hotel (on a 1 to 5 scale), which we take as our response. We form a document-term matrix X in which Xij is the number of times the ith review uses the jth adjective.
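Building such a document-term count matrix is straightforward; a minimal sketch with toy tokenized reviews (the actual pipeline runs on the full TripAdvisor text):

```python
from collections import Counter
import numpy as np

reviews = [["great", "clean", "great"], ["dirty", "small"]]  # toy tokenized reviews
vocab = sorted({w for r in reviews for w in r})              # adjective vocabulary
col = {w: j for j, w in enumerate(vocab)}

# X[i, j] = number of times review i uses adjective j
X = np.zeros((len(reviews), len(vocab)), dtype=int)
for i, r in enumerate(reviews):
    for w, c in Counter(r).items():
        X[i, col[w]] = c

print(vocab)       # ['clean', 'dirty', 'great', 'small']
print(X.tolist())  # [[1, 0, 2, 0], [0, 1, 0, 1]]
```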
We begin by converting words to lower case and keeping only adjectives (as determined by WordNet; Fellbaum 1998; Wallace 2007; Feinerer and Hornik 2016). After removing reviews with missing ratings, we are left with 209,987 reviews and 7,787 distinct adjectives. The left panel of Figure 7 shows the distribution of ratings in the data: nearly three quarters of all ratings are above 3 stars. The extremely right-skewed distribution in the right panel of Figure 7 shows that all but a small number of adjectives are highly rare (e.g., over 90% of adjectives are used in fewer than 0.5% of reviews).
Rather than discard this large number of rare adjectives, our method aims to make productive use of them by leveraging side information about the relationships between adjectives. We construct a tree capturing adjective similarity as follows. We start with word embeddings³ in a 100-dimensional space that were pre-trained by GloVe (Pennington et al., 2014) on the Gigaword5 and Wikipedia2014 corpora. We also obtain a list of adjectives that the NRC Emotion Lexicon labels as having either positive or negative sentiment (Mohammad and Turney, 2013). We use five-nearest-neighbors classification within the 100-dimensional space of word embeddings to assign labels to the 5,795 adjectives that are not labeled in the NRC Emotion Lexicon. This sentiment separation determines the two main branches of the tree T. Within each branch, we perform hierarchical clustering of the word embedding vectors. Figure 8 depicts such a tree with 2,397 adjectives (as leaves).
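A sketch of this two-step tree construction, using random vectors in place of the GloVe embeddings and a numpy-only majority-vote kNN (the sizes are hypothetical; the paper uses the real embeddings and NRC labels):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
emb_labeled = rng.normal(size=(30, 100))     # embeddings with NRC sentiment labels
y_labeled = np.array([0, 1] * 15)            # 0 = negative, 1 = positive
emb_unlabeled = rng.normal(size=(20, 100))   # adjectives missing from the lexicon

def knn_label(emb_l, y_l, emb_u, k=5):
    """Majority vote among the k nearest labeled embeddings (ties -> positive)."""
    d = np.linalg.norm(emb_u[:, None, :] - emb_l[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return (y_l[nn].mean(axis=1) >= 0.5).astype(int)

y_pred = knn_label(emb_labeled, y_labeled, emb_unlabeled)

# the sentiment split forms the two main branches; cluster within each branch
emb_all = np.vstack([emb_labeled, emb_unlabeled])
y_all = np.concatenate([y_labeled, y_pred])
subtrees = {s: linkage(emb_all[y_all == s], method="average") for s in (0, 1)}
```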
We compare our method (with weights all equal to 1 except for the root node, which is left unpenalized) to four other approaches described in Figure 4. For L1-dense, we first
²Data source: http://times.cs.uiuc.edu/~wang296/Data/
³Data source: http://nlp.stanford.edu/data/glove.6B.zip
| Rating | Proportion |
|--------|------------|
| 1 | 0.066 |
| 2 | 0.085 |
| 3 | 0.105 |
| 4 | 0.308 |
| 5 | 0.436 |

[Figure 7, right panel: histogram of the number of adjectives (y-axis, 0 to 4000) against the percentage of reviews using the adjective (x-axis, 0.0 to 1.0).]
Figure 7: (Left) distribution of TripAdvisor ratings. (Right) only 414 adjectives appear in more than 1% of reviews; the histogram gives the distribution of usage-percentages for those adjectives appearing in fewer than 1% of reviews.
Figure 8: Tree T over 2,397 adjectives: the left subtree is for adjectives with negative sentiment and the right subtree is for adjectives with positive sentiment.
discard any adjectives that are in fewer than 0.5% of reviews before applying the lasso. Both L1-ag-h and L1-ag-d have an additional tuning parameter to take care of: for L1-ag-h we vary the height at which we cut the tree along an equally spaced grid of ten values; for L1-ag-d we choose the threshold for the aggregations' density along an equally spaced grid of ten values between 0.001 and 0.1.
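Cutting the tree at a height and summing the feature counts within each resulting cluster can be sketched as follows (random stand-ins for X and the tree; only the aggregation logic is the point):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
X = rng.poisson(0.05, size=(100, 30)).astype(float)      # sparse count features
Z = linkage(rng.normal(size=(30, 5)), method="average")  # stand-in tree over the 30 features

def aggregate_at_height(X, Z, h):
    """Sum the columns of X within each cluster obtained by cutting tree Z at height h."""
    cl = fcluster(Z, t=h, criterion="distance")
    return np.column_stack([X[:, cl == c].sum(axis=1) for c in np.unique(cl)])

heights = np.linspace(Z[:, 2].min(), Z[:, 2].max(), 10)  # equally spaced grid of cut heights
X_agg = aggregate_at_height(X, Z, heights[4])            # one candidate aggregation for the lasso
```

Each height on the grid yields one aggregated design matrix, and the lasso is then fit to each candidate.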
Mean Squared Prediction Error

| prop. | n | p | n/p | our method | L1 | L1-dense | L1-ag-h | L1-ag-d |
|-------|---|---|-----|------------|-----|----------|---------|---------|
| 1% | 1,700 | 2,397 | 0.71 | 0.870 | 0.894 | 0.895 | 0.882 | 0.971 |
| 5% | 8,499 | 3,962 | 2.15 | 0.783 | 0.790 | 0.805 | 0.785 | 0.899 |
| 10% | 16,999 | 4,786 | 3.55 | 0.758 | 0.764 | 0.788 | 0.764 | 0.902 |
| 20% | 33,997 | 5,621 | 6.05 | 0.742 | 0.749 | 0.773 | 0.747 | 1.173 |
| 40% | 67,995 | 6,472 | 10.51 | 0.739 | 0.740 | 0.768 | 0.742 | 1.108 |
| 60% | 101,992 | 6,962 | 14.65 | 0.733 | 0.736 | 0.769 | 0.734 | 1.155 |
| 80% | 135,990 | 7,294 | 18.64 | 0.733 | 0.733 | 0.765 | 0.734 | 0.886 |
| 100% | 169,987 | 7,573 | 22.45 | 0.729 | 0.731 | 0.765 | 0.731 | 0.956 |
Table 1: Performance of five methods on the held-out test set: L1 is the lasso; L1-dense is the lasso on only dense features; L1-ag-h is the lasso with features aggregated based on height; and L1-ag-d is the lasso with features aggregated based on density level.
We hold out 40,000 ratings and reviews as a test set. To observe the performance of these methods over a range of training set sizes, we consider a nested sequence of training sets, ranging from 1% to 100% of the reviews not included in the test set. For all methods, we use five-fold cross-validation to select tuning parameters and threshold all predicted ratings to be within the interval [1, 5]. Table 1 displays the mean squared prediction error (MSPE) on the test set for each method and training set size.
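The error metric, with predictions truncated to the rating scale, can be sketched as (a minimal illustration):

```python
import numpy as np

def mspe(pred, y, lo=1.0, hi=5.0):
    """Mean squared prediction error after truncating predictions to [lo, hi]."""
    return float(np.mean((np.clip(pred, lo, hi) - y) ** 2))

# predictions 0.2 and 6.1 clip to 1 and 5, matching the true ratings exactly
print(mspe(np.array([0.2, 3.4, 6.1]), np.array([1.0, 3.0, 5.0])))  # 0.16/3 ≈ 0.0533
```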
As the size of the training set increases, all methods except for the lasso with aggregation based on density (L1-ag-d) achieve lower MSPE. Among the four lasso-related methods, L1 and L1-ag-h outperform the other two. As the training set size n increases, the number of features p also increases, but at a relatively slower rate. We notice that when n/p is less than 10.51, our method outperforms the other four lasso-related methods. As n/p increases beyond 10.51, i.e., in the statistically easier regimes, L1 and L1-ag-h attain performance comparable to our method. We conduct paired t-tests between squared prediction errors from our method and L1-ag-h at every (n, p) pair (i.e., every row of Table 1). Five of the eight tests are significant at the 0.005 significance level (see Table 4 in Appendix J).
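The paired comparison can be sketched with scipy (synthetic per-review errors below; the actual test uses the held-out squared prediction errors of the two methods, which are paired because both are evaluated on the same test reviews):

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(3)
y = rng.normal(size=200)                                      # stand-in test responses
err_ours = (y - (y + rng.normal(scale=0.30, size=200))) ** 2  # squared errors, our method
err_agh = (y - (y + rng.normal(scale=0.35, size=200))) ** 2   # squared errors, L1-ag-h

# paired t-test: same test reviews for both methods, so the errors are paired
t_stat, p_value = ttest_rel(err_ours, err_agh)
```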
To better understand the difference between our method, the lasso, and L1-ag-h, we color the branches of the tree generated in the n = 1,700 and p = 2,397 case (i.e., proportion is 1%) according to the sign and magnitude of β for the three methods. The bottom tree in Figure 9 corresponds to our method and has many nearby branches sharing the same red/blue color, indicating that the corresponding adjective counts have been merged. By contrast, the top tree in Figure 9, which corresponds to the lasso, shows that the solution is sparser and does not have branches of similar color. The middle tree in Figure 9 shows that L1-ag-h produces something between the two, merging some adjectives with strong signals while keeping the rest as singletons. The strength and pattern of aggregations vary
02
46
810
1214
expectable
pickled
corned
stocked
stuffed
packaged rum
alaskan
salmon
lengthwise
coarse
uncooked
avocado
shredded
chopped
peeled
rotten fat
bottom
tops
topped
topping
frozen
dried
steaming
whipping
leftover
stale
boiled
sour
hungry
starving
exhausted
dehydrated
caring
tending
distressed
stricken
blind
deaf
disabled
homeless
elderly
aged
eighteen fifty
ninety
contracted
treated ill
pregnant
sick
dying
conjoined
dominican
returning
departed
stranded
rescued
mod
ultra
eponym
ous
hip
punk
secret
hidden
dirty fake
animal pet
wild
mad
colored
orange red
highland
crescent
midwe
stern
rural
unpopulated
undeveloped
untouched
inaccessible
wooded hilly
sprawling
populated
scrub
shallow
rocky
sandy
nautical
loco
sportive
tropical
torrential
cloudy
damp
wet
humid
rainy
windy
chilly
weather
cold
warm cool
hourly
annual
yearly
gross
total
minus
projected
surpassing
whopping
average
adjusted il xx
prepaid
mailed
postal
misrepresented
inflated
skewed
unfounded
misleading
false
dubious
questionable
incomplete
inconsistent
incorrect
sketchy
obscure
aforem
entioned
peculiar
seem
ing
confusing
ambiguous
problematic
unsolicited
unauthorised
forfeit
outright
heard
loud
yelled
shouted
scream
ing
crying
blaring
banging
attic
upstairs
downstairs
awake
asleep
waking
awakened
sleeping
breathing
otc ci
wee
wan
tan
bum
capitalist
orientated
lumbering
imperialistic
ethnic
foreign
domestic thai
indonesian
korean
vietnamese
chinese
masculine
gay
abusive
abused
harassed
communist
political
liberal
penal
mandatory
strict
forbidden
prohibited
violated
restricted
limiting
slight
downward
dim
faded
dashed
disappointing
dism
allackluster
subdued
upbeat
tepid
stock
dipped
rose fell
dropping
falling
risen
fallen
lowe
red
slashed
sterling
recovering
recovered
lost
beat
beaten
ranked
fourth
consecutive
missed
shot
tied
lone
minute
foul
frank
sometime
mid
late
ago
fired
dism
issed
sent
ordered
sufficient
lacking
minimal
excessive
excess
waste
raw
scarce
worst
blam
eblam
edstruggling
experienced
faced
hurt
bad
worse
poor
weak
affected
impacted
negative
favorable
dangerous
isolated
unstable
elevated
relative
low
reduced
bush
federal
internal
due
previous
following
subsequent
hearing
incident
superior
potent
formidable
probable
potential
apparent
serious
remote
neighboring
camp
near
nearby
scrambled
rushed
rush
waiting
stopped
collect
tried
trying
decided
wanted
attempted
forced
unable
remaining left
removed
abandoned
spread
found
covered
exposed
offensive
defensive
uniformed
loyal
armed
fighting
pro
anti
opposed
backed
explosive
unidentified
searching
spotted
automatic
warning
flash
killing
stabbing
executed
infamous
notorious
charged
guilty
suspect
criminal
alleged
petty
drunken
drunk
operative
cutaneous
facial
bone
minor
sustained
suffering
severe
bodily
mental
fossil
warming
magnetic
hydraulic
ambient
fluid
liquid
synthetic
additive
content
analog
runny
itchy
scratchy
edible
inedible
fragrant
potted
rust
rusty
powdery
mushy
greasy
flaky
sticky
watery
hairy
hollow
fuzzy
eyed buff
reddish
bluish
darkening
opaque
murky
okay
alright
dear
damn
remiss
napping
seasick
picky
brag
bragging
inquiring
discerning
tripping
lingual
biolum
inescent
equidistant
interlaced
schematic
derisive
audible
buzzing
whispering
upwind
buggy
huffy
toed
hooked
wired
plugged
fab
gabby
littler
bedded
looseleaf
tropic
pied
trojan
iberian
smoking
secondhand
addictive
alcoholic
gastrointestinal
allergic
asthmatic
sensitive
touchy
negotiable
obligatory
overdue
undecided
shy
inclined
worried
concerned
confused
unhappy
frustrated
hesitant
reluctant
unwilling
eager
anxious
skeptical
wary
advertised
billed
booked
unavailable
rested
disposed
compensated
graded
categorized
checked
sorted
farthest
distant
approaching
wandering
sneaking
touching
bouncing
crossed
halfway
stuck
sideways
dragging
sliding
driven
rolling
rolled
floating
drifting
tipped
stray
discrim
inatory
discrim
inating
intrusive
explicit
unsafe
impractical
unusable
poorly
ineffective
inadequate
consuming
costly
repetitive
needless
unnecessary
discouraged
apprehensive
leery
mum
adamant
disturbed
horrified
disgusted
appalled
dismayed
hysterical
terrified
distraught
stung
embarrassed
annoyed
irritated
bothered
complaining
thunderstruck
gobsmacked
flabbergasted
speechless
apologetic
incredulous
bewildered
baffled
perplexed
puzzled
unimpressed
disheartened
perturbed
offended
disliked
dazzled
enamored
pestered
patronized
lazy
kindly
wise
jolly
sly
insane
incompetent
unresponsive
selfish
arrogant
naive
ignorant
uncaring
aloof
forgiving
squeamish
fussy
indulgent
obnoxious
overbearing
cocky
pushy
uptight
surly
sullen
clueless
grumpy
lame
rude
downright
nasty
ugly
stupid
silly
weird
crazy
inexperienced
disorganized
disorganised
unimaginative
unmotivated
indifferent
uninterested
noisy
rowdy
unruly
disgruntled
irate
meagre
sizeable
exorbitant
extortionate
overpriced
tempting
enticing
limitless
lofty
manageable
unmanageable
bearable
intolerable
unbearable
aggravating
compounded
stressful
pleasurable
draining
taxing
spoilt
spoiled
dreary
miserable
lousy
mediocre
feeble
pathetic
pitiful
hurried
fumbling
clumsy
haphazard
perfunctory
halfhearted
wimpy
crappy
touristy
grouchy
crotchety
snooty
snotty
smallish
grungy
scruffy
constipated
thieving
grubby
dodgy
tatty
tipsy
skanky
slothful
bothersome
nauseating
yucky
lumpy
untidy
insubstantial
unappealing
unattractive
raunchy
trashy
tasteless
shameless
ostentatious
obtrusive
impersonal
unsupportive
unromantic
uninviting
repulsive
revolting
stereotyped
soulless
glorified
decadent
unsavory
harmless
insignificant
imaginable
serviceable
characterless
flavorless
existent
unmentionable
borderline
atrocious
deplorable
appalling
disgusting
shameful
disgraceful
inexcusable
repugnant
undesirable
disadvantageous
unhelpful
unfriendly
unprofessional
dishonest
inconsiderate
impolite
disrespectful
mistaken
ignored
contrary
justified
beforehand
realised
understandable
unavoidable
unfortunate
solved
corrected
rectified
flush
crack
cracked
interfering
obliged
obliging
reverting
bowing
systematic
cleansing
irregular
transient
endless
countless
occasional
sporadic
evil
invisible
mysterious
sheer
utter
unbelievable
nonsense
outrageous
ludicrous
absurd
ridiculous
frightening
scary
bizarre
odd
strange
eerie
chilling
ominous
futuristic
unreal
worrying
depressing
alarming
disturbing
troubling
horrid
hideous
ghastly
dreadful
horrendous
horrific
horrifying
terrible
horrible
awful
inconvenient
useless
pointless
boring
tedious
dull
bland
unpleasant
uncomfortable
awkward
unnerving
disconcerting
annoying
irritating
abrupt
unexpected
sudden
immediate
swift
stark
blunt
sharp
extreme
harsh
draconian
drastic
humiliating
embarrassing
shocking
hostile
unacceptable
inappropriate
disappointed
surprised
shocked
stunned
angered
angry
furious
answering
repeated
echoing
nervous
tired
bored
weary
impatient
sad
sorry
ashamed
upsetting
upset
tricky
challenging
difficult
tough
figured
wondering
noticed
seeing
afraid
scared
wanting
pretend
desperate
vain
unlucky
tiring
frustrating
packed
crowded
jammed
filled
empty
surrounded
lined
folding
wooden
draped
bearing
pictured
framed
stacked
strung
barefoot
legged
bare
naked
staring
seated
sitting
dressed
wearing
worn
dress
beige
skimpy
wrinkled
soiled
rear
overhead
upper
outer
blank
flip
diagonal
squared
narrowing
gaping
umbrella
mat
tiered
roofed
musty
moldy
smelling
putrid
fishy
hokey
pretentious
tacky
cheesy
spooky
sneaky
groovy
smoky
bouncy
mellow
melodious
boggy
waterlogged
soggy
faultless
passable
moonlit
drizzly
comfy
cushy
stuffy
cramped
cluttered
swanky
seedy
dank
claustrophobic
filthy
smelly
threadbare
shabby
dingy
grimy
spartan
drab
ramshackle
decrepit
uphill
gruelling
tortuous
sleepless
hectic
frantic
overpowering
suffocating
unrelenting
blazing
leisurely
brisk
uneventful
eventful
unimpressive
iffy
unsatisfying
unannounced
attendant
nonstop
understaffed
lax
outdated
cumbersome
inefficient
inoperable
operable
unattended
unlocked
ventilated
moveable
movable
motorized
stationary
unclean
sterile
sterilized
watered
scrubbed
patched
hydrated
tonic
soundproof
hydroponic
crusted
proofed
treed
deadened
slopped
thudding
vertiginous
extraordinaire
manque
slippy
piscine
barefooted
offish
laxative
swish
redux
spanking
maniac
repellent
repellant
especial
copious
frayed
soured
marred
chaotic
tense
heated
split
dominated
destroyed
damaged
ruined
wrecked
shut
shuttered
interrupted
ended
broke
swept
hammered
struck
suspended
cancelled
delayed
threatened
continued
renewed
close
closed
blocked
sealed
controlled
regulated
reserved
vested
temporary
vacant
communal
unoccupied
occupied
raining
splashed
soaked
washed
littered
scattered
burning
burnt
burned
apart
broken
blown
thrown
parked
smashed
smashing
preserved
neglected
overlooked
forgotten
sunken
mangled
grave
dead
buried
torn
wrapped
wound
shady
crooked
paved
winding
bumpy
dirt
dusty
muddy
rough
slippery
icy
faulty
overloaded
leaky
spotty
shoddy
substandard
uneven
tight
slack
steep
flat
narrow
dense
thick
large
heavy
underground
pedestrian
raging
roaring
rumbling
congested
clogged
flooded
overflowing
prize
earned
received
million
worth
valued
streaming
electronic
digital
instant
online
interactive
broadband
wireless
mobile
portable
automated
managerial
technical
financial
corporate
profitable
competitive
overseas
commercial
direct
standby
beneficiary
token
supplementary
supplemental
generic
prescription
unlimited
minimum
maximum
premium
rental
affordable
expensive
inexpensive
cheap
convenient
accessible
suitable
ideal
attractive
desirable
regent
meridian
intercontinental
mediterranean
continental
pacific
atlantic
jet
plane
landed
flying
fly
intrepid
chartered
guided
tracked
sighted
express
outbound
natural
mineral
oriental
occidental
electrical
mechanical
motor
electric
discovered
detected
artificial
surface
equipped
fitted
mounted
powered
built
designed
model
integrated
developed
operational
operating
dual
fixed
existing
conventional
advance
advanced
increased
expanded
improved
enhanced
maintained
limited
extended
insured
prudential
neutral
polar
net
posted
level
high
higher
raised
expected
anticipated
lowest
overall
combined
revised
processed
imported
available
distributed
contained
material
documented
collected
radio
nightly
daily
weekly
medium
tiny
small
larger
smaller
size
sized
visible
characteristic
distinctive
unique
distinct
obvious
surprising
noticeable
subtle
striking
dramatic
lesser
myriad
assorted
ranging
varied
varying
tremendous
considerable
greater
substantial
significant
massive
vast
huge
enormous
wide
deep
inner
rich
diverse
missing
alive
cleared
headed
loaded
driving
walking
catching
running
short
long
right
free
middle
home
half
past
turned
away
back
cut
cutting
easy
quick
slow
fast
pass
passing
forward
ahead
behind
pressed
pressing
meet
prepared
ready
must
willing
noted
described
acknowledged
stated
pointed
stressed
learned
understood
realized
telling
advised
aware
informed
proved
determined
neither
convinced
clear
impossible
otherwise
deciding
likely
unlikely
possible
future
chance
enough
able
still
kept
alone
done
sure
going
go
coming
gone
real
big
bigger
pretty
little
kind
like
hard
excellent
quality
best
good
better
fair
fine
protected
protecting
clean
safe
secure
critical
key
important
vital
crucial
main
major
leading
active
involved
responsible
becoming
transformed
assumed
assuming
present
current
mere
zero
comparable
equivalent
changed
changing
similar
unusual
subject
latter
unlike
normal
typical
usual
every
entire
whole
supposed
intended
required
needed
necessary
used
applied
connected
separate
multiple
individual
related
common
different
specific
particular
certain
medical
patient
elementary
textbook
bilingual
experimental
modern
contemporary
alternative
choice
preferred
basic
standard
traditional
custom
resident
hired
retired
trained
professional
amateur
eligible
qualified
prospective
selected
select
assigned
rank
top
ranking
junior
elite
recognised
respected
foremost
outstanding
exemplary
creative
artistic
worthy
ultimate
extraordinary
exceptional
enduring
genuine
famous
legendary
great
greatest
honored
dedicated
absolute
universal
ordinary
everyday
pure
satisfying
simple
straightforward
self
conscious
personal
physical
hispanic
american
native
latin
asian
african
side
cross
opposite
outside
inside
front
standing
prior
initial
last
earlier
early
later
completed
set
full
complete
historic
marked
celebrated
converted
settled
relocated
incorporated
established
live
living
new
next
beginning
starting
moved
moving
working
spent
less
far
even
much
well
made
just
one
another
travelled
traveled
attended
gathered
fewer
least
none
eight
nine
seven
five
six
two
four
three
twenty
thirty
ten
twelve
fifteen
eleven
dozen
hundred
divided
together
arranged
jewish
affiliated
formed
joined
nationwide
coordinated
organised
organized
said
according
official
confirmed
reported
national
european
international
union
united
appointed
elected
representative
represented
senior
associate
assistant
executive
general
scheduled
planned
august
opened
held
sign
signed
approved
agreed
announced
joint
initiative
presidential
premier
prime
indian
canadian
british
australian
english
irish
royal
colonial
victorian
newborn
teenage
teenaged
adult
female
male
young
older
younger
old
born
married
mexican
cuban
italian
brazilian
spanish
portuguese
japanese
greek
turkish
giant
owned
sold
based
firm
boss
ex
former
swiss
french
belgian
dutch
danish
german
polish
ii
roman
ottoman
prussian
metropolitan
industrial
residential
urban
capital
northeast
southeast
north
south
west
east
central
southern
northern
western
eastern
terminal
port
bay
base
busy
bustling
downtown
uptown
plain
bordered
overlooking
surrounding
adjacent
square
located
situated
peaceful
mass
expressed
voiced
forthcoming
upcoming
recent
latest
ongoing
continuing
civic
oriented
minded
friendly
cooperative
understanding
shared
sharing
diplomatic
strategic
historical
cultural
many
various
several
numerous
popular
known
considered
local
private
public
independent
legal
center
complex
human
social
environmental
loose
thin
soft
smooth
transparent
relaxing
relaxed
accommodating
conditioned
festive
inviting
welcoming
welcome
customary
routine
casual
intimate
correct
wrong
acceptable
reasonable
sensible
proper
appropriate
practical
objective
whatsoever
true
meaning
regardless
mean
whatever
knowing
proof
definite
geared
motivated
engaged
engaging
accustomed
alike
looking
thinking
interested
focused
reliable
accurate
careful
thorough
timely
prompt
speedy
intuitive
agile
economical
suited
utilized
apt
preferable
knowledgeable
reputable
responsive
honest
competent
growing
grown
fresh
mixed
mild
moderate
positive
strong
consistent
solid
cautious
modest
steady
overnight
quiet
calm
signal
alert
elaborate
detailed
extensive
brief
lengthy
formal
informal
frequent
constant
plus
extra
additional
regular
special
accompanied
accompanying
placed
attached
summary
supreme
unanimous
handled
addressed
resolved
fulfilled
relieved
assured
confident
pleased
satisfied
empowered
supervised
assisted
promising
appealing
demanding
accepted
accepting
reverse
reversed
drawn
handed
taking
taken
given
giving
exclusive
sole
granted
requested
authorized
permanent
partial
dated
listed
unknown
exact
actual
verified
confirming
disclosed
topnotch
moderne
metal
iron
concrete
stone
votive
illuminated
lit
lighted
carved
sculpted
painted
decorated
adorned
handmade
antique
stained
pastel
arch
arched
tiled
glazed
paneled
ornate
domed
immaculate
cathedral
outward
inland
farther
rounded
sheltered
twin
tall
laid
lay
nestled
tucked
shaped
etched
sacred
mythical
heavenly
divine
marian
elder
noble
hollywood
star
swank
ruby
shining
stellar
glowing
faint
bygone
bust
gargantuan
gigantic
precious
priceless
rare
exotic
rustic
quaint
picturesque
secluded
idyllic
posh
upscale
upmarket
parisian
chic
fashionable
trendy
plush
luxe
palatial
luxurious
opulent
sumptuous
scenic
panoramic
sunset
sunrise
pleasant
breezy
sunny
gothic
baroque
architectural
decorative
interior
exterior
downhill
quadruple
double
triple
swimming
indoor
outdoor
world
olympic
golden
gold
silver
won
winning
round
final
finished
straight
open
grand
instrumental
solo
piano
bass
sound
familiar
sounding
singing
pop
master
magic
magical
twentieth
inspired
reminiscent
mirrored
reflected
evident
romantic
romance
adapted
novel
directed
directing
entitled
written
original
included
featured
straw
oval
identical
matching
matched
uniform
fit
fitting
color
light
dark
bright
black
white
brown
gray
grey
yellow
pink
blue
green
curt
pitched
played
game
single
first
second
third
eleventh
ninth
fifth
sixth
seventh
capped
timed
opening
closing
stretch
midway
amicable
cordial
utmost
sincere
loving
righteous
virtuous
conscientious
compassionate
hearty
lukewarm
gracious
hospitable
respectful
courteous
polite
diligent
attentive
solicitous
considerate
sanitary
blanket
protective
botanical
landscaped
leafy
shaded
aerobic
drinkable
usable
plentiful
abundant
adequate
ample
satisfactory
workable
painless
tolerable
worthwhile
doable
inclusive
conducive
complementary
beneficial
flexible
effective
efficient
cc
stereo
deluxe
compact
cd
adjustable
trim
optional
groomed
wed
educated
wealthy
healthy
healthier
vibrant
cosmopolitan
tolerant
civilized
windward
virgin
coral
canary
tuscan
venetian
neapolitan
florentine
jamaican
hawaiian
creole
nigh
galore
presto
spellbound
pricy
homely
scrumptious
appetizing
alfresco
pampering
uncrowded
uncluttered
undisturbed
sparse
restful
sedate
unobtrusive
discreet
serene
tranquil
joyful
cheerful
cheery
encyclopedic
boundless
unparalleled
unsurpassed
unmatched
unbeatable
flawless
understated
effortless
impeccable
superlative
fluent
speaking
spoken
capable
powerful
aggressive
counter
smart
clever
sophisticated
innovative
slim
tame
marginal
meager
hefty
sizable
valuable
invaluable
handy
helpful
useful
spare
wasted
saved
saving
generous
paid
paying
brave
bold
fabulous
fantastic
wonderful
terrific
phenomenal
astonishing
amazing
incredible
awesome
marvelous
marvellous
glorious
majestic
magnificent
splendid
spectacular
stunning
breathtaking
dazzling
superb
brilliant
impressive
remarkable
deserved
decent
respectable
happy
ok
comfortable
nice
perfect
seasoned
accomplished
talented
appreciative
enthusiastic
impressed
appreciated
fancied
entertained
adored
keen
avid
curious
excited
amazed
delighted
thrilled
loved
liked
fortunate
lucky
grateful
glad
thankful
hot
dry
salt
baking
cereal
crackers
peanut
fluffy
vanilla
caramel
blended
powdered
nuts
toasted
steamed
roast
chicken
fried
baked
grilled
cooked
roasted
olive
mint
cherry
honey
ripe
juicy
tender
sweet
salty
savory
tasty
delicious
darling
pat
ace
favourite
favorite
outback
super
mini
classic
homemade
takeout
complimentary
personalized
inflatable
plastic
rubber
disposable
cushioned
waterproof
removable
adagio
yummy
velvet
indigo
frosted
quilted
topless
nude
laughing
smiling
petite
polished
exquisite
elegant
graceful
crisp
sparkling
bubbly
lush
pristine
colorful
beautiful
gorgeous
lovely
tanned
refreshed
spotless
squeaky
roomy
snug
cozy
cosy
glossy
stylish
sleek
flashy
glamorous
pricey
fancy
skinny
oversized
shiny
sparkly
finer
flowery
legible
airy
expansive
enclosed
furnished
spacious
interconnected
augmented
configured
elliptical
circular
rectangular
discrete
functional
corresponding
defined
preset
desired
optimal
approximate
inferior
feminine
funny
humorous
ironic
amusing
hilarious
thoughtful
passionate
energetic
lively
spirited
memorable
unforgettable
exciting
fascinating
interesting
intriguing
informative
entertaining
enjoyable
authentic
timeless
fashioned
quintessential
funky
retro
eclectic
minimalist
pleasing
inspiring
alluring
enchanting
mesmerizing
charming
delightful
quirky
endearing
earthy
soothing
invigorating
refreshing
wholesome
homey
tasteful
classy
creepy
cute
adorable
chatty
personable
approachable
mannered
talkative
gritty
neat
tidy
humble
unassuming
unpretentious
expectable
pickled
corned
stocked
stuffed
packaged
rum
alaskan
salmon
lengthwise
coarse
uncooked
avocado
shredded
chopped
peeled
rotten
fat
bottom
tops
topped
topping
frozen
dried
steaming
whipping
leftover
stale
boiled
sour
hungry
starving
exhausted
dehydrated
caring
tending
distressed
stricken
blind
deaf
disabled
homeless
elderly
aged
eighteen
fifty
ninety
contracted
treated
ill
pregnant
sick
dying
conjoined
dominican
returning
departed
stranded
rescued
mod
ultra
eponymous
hip
punk
secret
hidden
dirty
fake
animal
pet
wild
mad
colored
orange
red
highland
crescent
midwestern
rural
unpopulated
undeveloped
untouched
inaccessible
wooded
hilly
sprawling
populated
scrub
shallow
rocky
sandy
nautical
loco
sportive
tropical
torrential
cloudy
damp
wet
humid
rainy
windy
chilly
weather
cold
warm
cool
hourly
annual
yearly
gross
total
minus
projected
surpassing
whopping
average
adjusted
il
xx
prepaid
mailed
postal
misrepresented
inflated
skewed
unfounded
misleading
false
dubious
questionable
incomplete
inconsistent
incorrect
sketchy
obscure
aforementioned
peculiar
seeming
confusing
ambiguous
problematic
unsolicited
unauthorised
forfeit
outright
heard
loud
yelled
shouted
screaming
crying
blaring
banging
attic
upstairs
downstairs
awake
asleep
waking
awakened
sleeping
breathing
otc
ci
wee
wan
tan
bum
capitalist
orientated
lumbering
imperialistic
ethnic
foreign
domestic
thai
indonesian
korean
vietnamese
chinese
masculine
gay
abusive
abused
harassed
communist
political
liberal
penal
mandatory
strict
forbidden
prohibited
violated
restricted
limiting
slight
downward
dim
faded
dashed
disappointing
dismal
lackluster
subdued
upbeat
tepid
stock
dipped
rose
fell
dropping
falling
risen
fallen
lowered
slashed
sterling
recovering
recovered
lost
beat
beaten
ranked
fourth
consecutive
missed
shot
tied
lone
minute
foul
frank
sometime
mid
late
ago
fired
dismissed
sent
ordered
sufficient
lacking
minimal
ctin
gen
title
dw
ritte
nor
igin
alin
clud
edfe
atur
edst
raw
oval
iden
tical
mat
chin
gm
atch
edun
iform fit
fittin
gco
lor
light
dark
brig
htbl
ack
whi
tebr
own
gray
grey
yello
wpi
nkbl
uegr
een
curt
pitc
hed
play
edga
me
sing
lefir
stse
cond
third
elev
enth
nint
hfif
thsi
xth
seve
nth
capp
edtim
edop
enin
gcl
osin
gst
retc
hm
idw
ayam
icab
leco
rdia
lut
mos
tsi
ncer
elo
ving
right
eous
virt
uous
cons
cien
tious
com
pass
iona
tehe
arty
luke
war
mgr
acio
usho
spita
ble
resp
ectfu
lco
urte
ous
polit
edi
ligen
tat
tent
ive
solic
itous
cons
ider
ate
sani
tary
blan
ket
prot
ectiv
ebo
tani
cal
land
scap
edle
afy
shad
edae
robi
cdr
inka
ble
usab
lepl
entif
ulab
unda
ntad
equa
team
ple
satis
fact
ory
wor
kabl
epa
inle
ssto
lera
ble
wor
thw
hile
doab
lein
clus
ive
cond
uciv
eco
mpl
emen
tary
bene
ficia
lfle
xibl
eef
fect
ive
effic
ient cc
ster
eode
luxe
com
pact cd
adju
stab
letr
imop
tiona
lgr
oom
edw
eded
ucat
edw
ealth
yhe
alth
yhe
alth
ier
vibr
ant
cosm
opol
itan
tole
rant
civi
lized
win
dwar
dvi
rgin
cora
lca
nary
tusc
anve
netia
nne
apol
itan
flore
ntin
eja
mai
can
haw
aiia
ncr
eole
nigh
galo
repr
esto
spel
lbou
ndpr
icy
hom
ely
scru
mpt
ious
appe
tizin
gal
fres
copa
mpe
ring
uncr
owde
dun
clut
tere
dun
dist
urbe
dsp
arse
rest
ful
seda
teun
obtr
usiv
edi
scre
etse
rene
tran
quil
joyf
ulch
eerf
ulch
eery
ency
clop
edic
boun
dles
sun
para
llele
dun
surp
asse
dun
mat
ched
unbe
atab
lefla
wle
ssun
ders
tate
def
fort
less
impe
ccab
lesu
perla
tive
fluen
tsp
eaki
ngsp
oken
capa
ble
pow
erfu
lag
gres
sive
coun
ter
smar
tcl
ever
soph
istic
ated
inno
vativ
esl
imta
me
mar
gina
lm
eage
rhe
ftysi
zabl
eva
luab
lein
valu
able
hand
yhe
lpfu
lus
eful
spar
ew
aste
dsa
ved
savi
ngge
nero
uspa
idpa
ying
brav
ebo
ldfa
bulo
usfa
ntas
ticw
onde
rful
terr
ific
phen
omen
alas
toni
shin
gam
azin
gin
cred
ible
awes
ome
mar
velo
usm
arve
llous
glor
ious
maj
estic
mag
nific
ent
sple
ndid
spec
tacu
lar
stun
ning
brea
thta
king
dazz
ling
supe
rbbr
illia
ntim
pres
sive
rem
arka
ble
dese
rved
dece
ntre
spec
tabl
eha
ppy ok
com
fort
able
nice
perf
ect
seas
oned
acco
mpl
ishe
dta
lent
edap
prec
iativ
een
thus
iast
icim
pres
sed
appr
ecia
ted
fanc
ied
ente
rtai
ned
ador
edke
enav
idcu
rious
exci
ted
amaz
edde
light
edth
rille
dlo
ved
liked
fort
unat
elu
cky
grat
eful
glad
than
kful
hot
dry
salt
baki
ngce
real
crac
kers
pean
utflu
ffyva
nilla
cara
mel
blen
ded
pow
dere
dnu
tsto
aste
dst
eam
edro
ast
chic
ken
frie
dba
ked
grill
edco
oked
roas
ted
oliv
em
int
cher
ryho
ney
ripe
juic
yte
nder
swee
tsa
ltysa
vory
tast
yde
licio
usda
rling pa
tac
efa
vour
itefa
vorit
eou
tbac
ksu
per
min
icl
assi
cho
mem
ade
take
out
com
plim
enta
rype
rson
aliz
edin
flata
ble
plas
ticru
bber
disp
osab
lecu
shio
ned
wat
erpr
oof
rem
ovab
lead
agio
yum
my
velv
etin
digo
fros
ted
quilt
edto
ples
snu
dela
ughi
ngsm
iling
petit
epo
lishe
dex
quis
iteel
egan
tgr
acef
ulcr
isp
spar
klin
gbu
bbly
lush
pris
tine
colo
rful
beau
tiful
gorg
eous
love
lyta
nned
refr
eshe
dsp
otle
sssq
ueak
yro
omy
snug
cozy
cosy
glos
syst
ylis
hsl
eek
flash
ygl
amor
ous
pric
eyfa
ncy
skin
nyov
ersi
zed
shin
ysp
arkl
yfin
erflo
wer
yle
gibl
eai
ryex
pans
ive
encl
osed
furn
ishe
dsp
acio
usin
terc
onne
cted
augm
ente
dco
nfig
ured
ellip
tical
circ
ular
rect
angu
lar
disc
rete
func
tiona
lco
rres
pond
ing
defin
edpr
eset
desi
red
optim
alap
prox
imat
ein
ferio
rfe
min
ine
funn
yhu
mor
ous
ironi
cam
usin
ghi
lario
usth
ough
tful
pass
iona
teen
erge
ticliv
ely
spiri
ted
mem
orab
leun
forg
etta
ble
exci
ting
fasc
inat
ing
inte
rest
ing
intr
igui
ngin
form
ativ
een
tert
aini
ngen
joya
ble
auth
entic
timel
ess
fash
ione
dqu
inte
ssen
tial
funk
yre
tro
ecle
ctic
min
imal
ist
plea
sing
insp
iring
allu
ring
ench
antin
gm
esm
eriz
ing
char
min
gde
light
ful
quirk
yen
dear
ing
eart
hyso
othi
ngin
vigo
ratin
gre
fres
hing
who
leso
me
hom
eyta
stef
ulcl
assy
cree
pycu
tead
orab
lech
atty
pers
onab
leap
proa
chab
lem
anne
red
talk
ativ
egr
itty
neat
tidy
hum
ble
unas
sum
ing
unpr
eten
tious
02
46
810
1214
expectable
pickled
corned
stocked
stuffed
packaged rum
alaskan
salmon
lengthwise
coarse
uncooked
avocado
shredded
chopped
peeled
rotten fat
bottom
tops
topped
topping
frozen
dried
steaming
whipping
leftover
stale
boiled
sour
hungry
starving
exhausted
dehydrated
caring
tending
distressed
stricken
blind
deaf
disabled
homeless
elderly
aged
eighteen fifty
ninety
contracted
treated ill
pregnant
sick
dying
conjoined
dominican
returning
departed
stranded
rescued
mod
ultra
eponym
ous
hip
punk
secret
hidden
dirty fake
animal pet
wild
mad
colored
orange red
highland
crescent
midwe
stern
rural
unpopulated
undeveloped
untouched
inaccessible
wooded hilly
sprawling
populated
scrub
shallow
rocky
sandy
nautical
loco
sportive
tropical
torrential
cloudy
damp
wet
humid
rainy
windy
chilly
weather
cold
warm cool
hourly
annual
yearly
gross
total
minus
projected
surpassing
whopping
average
adjusted il xx
prepaid
mailed
postal
misrepresented
inflated
skewed
unfounded
misleading
false
dubious
questionable
incomplete
inconsistent
incorrect
sketchy
obscure
aforem
entioned
peculiar
seem
ing
confusing
ambiguous
problematic
unsolicited
unauthorised
forfeit
outright
heard
loud
yelled
shouted
scream
ing
crying
blaring
banging
attic
upstairs
downstairs
awake
asleep
waking
awakened
sleeping
breathing
otc ci
wee
wan
tan
bum
capitalist
orientated
lumbering
imperialistic
ethnic
foreign
domestic thai
indonesian
korean
vietnamese
chinese
masculine
gay
abusive
abused
harassed
communist
political
liberal
penal
mandatory
strict
forbidden
prohibited
violated
restricted
limiting
slight
downward
dim
faded
dashed
disappointing
dism
allackluster
subdued
upbeat
tepid
stock
dipped
rose fell
dropping
falling
risen
fallen
lowe
red
slashed
sterling
recovering
recovered
lost
beat
beaten
ranked
fourth
consecutive
missed
shot
tied
lone
minute
foul
frank
sometime
mid
late
ago
fired
dism
issed
sent
ordered
sufficient
lacking
minimal
excessive
excess
waste
raw
scarce
worst
blam
eblam
edstruggling
experienced
faced
hurt
bad
worse
poor
weak
affected
impacted
negative
favorable
dangerous
isolated
unstable
elevated
relative
low
reduced
bush
federal
internal
due
previous
following
subsequent
hearing
incident
superior
potent
formidable
probable
potential
apparent
serious
remote
neighboring
camp
near
nearby
scrambled
rushed
rush
waiting
stopped
collect
tried
trying
decided
wanted
attempted
forced
unable
remaining left
removed
abandoned
spread
found
covered
exposed
offensive
defensive
uniformed
loyal
armed
fighting
pro
anti
opposed
backed
explosive
unidentified
searching
spotted
automatic
warning
flash
killing
stabbing
executed
infamous
notorious
charged
guilty
suspect
criminal
alleged
petty
drunken
drunk
operative
cutaneous
facial
bone
minor
sustained
suffering
severe
bodily
mental
fossil
warming
magnetic
hydraulic
ambient
fluid
liquid
synthetic
additive
content
analog
runny
itchy
scratchy
edible
inedible
fragrant
potted
rust
rusty
powdery
mushy
greasy
flaky
sticky
watery
hairy
hollow
fuzzy
eyed buff
reddish
bluish
darkening
opaque
murky
okay
alright
dear
damn
remiss
napping
seasick
picky
brag
bragging
inquiring
discerning
tripping
lingual
biolum
inescent
equidistant
interlaced
schematic
derisive
audible
buzzing
whispering
upwind
buggy
huffy
toed
hooked
wired
plugged
fab
gabby
littler
bedded
looseleaf
tropic
pied
trojan
iberian
smoking
secondhand
addictive
alcoholic
gastrointestinal
allergic
asthmatic
sensitive
touchy
negotiable
obligatory
overdue
undecided
shy
inclined
worried
concerned
confused
unhappy
frustrated
hesitant
reluctant
unwilling
eager
anxious
skeptical
wary
advertised
billed
booked
unavailable
rested
disposed
compensated
graded
categorized
checked
sorted
farthest
distant
approaching
wandering
sneaking
touching
bouncing
crossed
halfway
stuck
sideways
dragging
sliding
driven
rolling
rolled
floating
drifting
tipped
stray
discrim
inatory
discrim
inating
intrusive
explicit
unsafe
impractical
unusable
poorly
ineffective
inadequate
consum
ing
costly
repetitive
needless
unnecessary
discouraged
apprehensive
leery
mum
adam
ant
disturbed
horrified
disgusted
appalled
dism
ayed
hysterical
terrified
distraught
stung
embarrassed
annoyed
irritated
bothered
complaining
thunderstruck
gobsmacked
flabbergasted
speechless
apologetic
incredulous
bewildered
baffled
perplexed
puzzled
unimpressed
disheartened
perturbed
offended
disliked
dazzled
enam
ored
pestered
patronized
lazy
kindly
wise
jolly sly
insane
incompetent
unresponsive
selfish
arrogant
naive
ignorant
uncaring
aloof
forgiving
squeam
ish
fussy
indulgent
obnoxious
overbearing
cocky
pushy
uptight
surly
sullen
clueless
grum
pylame
rude
downright
nasty
ugly
stupid
silly
weird
crazy
inexperienced
disorganized
disorganised
unimaginative
unmotivated
indifferent
uninterested
noisy
rowdy
unruly
disgruntled
irate
meagre
sizeable
exorbitant
extortionate
overpriced
tempting
enticing
limitless
lofty
manageable
unmanageable
bearable
intolerable
unbearable
aggravating
compounded
stressful
pleasurable
draining
taxing
spoilt
spoiled
dreary
miserable
lousy
mediocre
feeble
pathetic
pitiful
hurried
fumbling
clum
syhaphazard
perfunctory
halfhearted
wimpy
crappy
touristy
grouchy
crotchety
snooty
snotty
smallish
grungy
scruffy
constipated
thieving
grubby
dodgy
tatty
tipsy
skanky
slothful
bothersome
nauseating
yucky
lumpy
untidy
insubstantial
unappealing
unattractive
raunchy
trashy
tasteless
sham
eless
ostentatious
obtrusive
impersonal
unsupportive
unromantic
uninviting
repulsive
revolting
stereotyped
soulless
glorified
decadent
unsavory
harmless
insignificant
imaginable
serviceable
characterless
flavorless
existent
unmentionable
borderline
atrocious
deplorable
appalling
disgusting
sham
eful
disgraceful
inexcusable
repugnant
undesirable
disadvantageous
unhelpful
unfriendly
unprofessional
dishonest
inconsiderate
impolite
disrespectful
mistaken
ignored
contrary
justified
beforehand
realised
understandable
unavoidable
unfortunate
solved
corrected
rectified
flush
crack
cracked
interfering
obliged
obliging
reverting
bowing
system
atic
cleansing
irregular
transient
endless
countless
occasional
sporadic
evil
invisible
mysterious
sheer
utter
unbelievable
nonsense
outrageous
ludicrous
absurd
ridiculous
frightening
scary
bizarre odd
strange
eerie
chilling
ominous
futuristic
unreal
worrying
depressing
alarming
disturbing
troubling
horrid
hideous
ghastly
dreadful
horrendous
horrific
horrifying
terrible
horrible
awful
inconvenient
useless
pointless
boring
tedious
dull
bland
unpleasant
uncomfortable
awkward
unnerving
disconcerting
annoying
irritating
abrupt
unexpected
sudden
immediate
swift
stark
blunt
sharp
extreme
harsh
draconian
drastic
humiliating
embarrassing
shocking
hostile
unacceptable
inappropriate
disappointed
surprised
shocked
stunned
angered
angry
furious
answering
repeated
echoing
nervous
tired
bored
weary
impatient
sad
sorry
ashamed
upsetting
upset
tricky
challenging
difficult
tough
figured
wondering
noticed
seeing
afraid
scared
wanting
pretend
desperate
vain
unlucky
tiring
frustrating
packed
crowded
jammed
filled
empty
surrounded
lined
folding
wooden
draped
bearing
pictured
framed
stacked
strung
barefoot
legged
bare
naked
staring
seated
sitting
dressed
wearing
worn
dress
beige
skimpy
wrinkled
soiled
rear
overhead
upper
outer
blank
flip
diagonal
squared
narrowing
gaping
umbrella
mat
tiered
roofed
musty
moldy
smelling
putrid
fishy
hokey
pretentious
tacky
cheesy
spooky
sneaky
groovy
smoky
bouncy
mellow
melodious
boggy
waterlogged
soggy
faultless
passable
moonlit
drizzly
comfy
cushy
stuffy
cram
ped
cluttered
swanky
seedy
dank
claustrophobic
filthy
smelly
threadbare
shabby
dingy
grimy
spartan
drab
ramshackle
decrepit
uphill
gruelling
tortuous
sleepless
hectic
frantic
overpowe
ring
suffocating
unrelenting
blazing
leisurely
brisk
uneventful
eventful
unimpressive iffy
unsatisfying
unannounced
attendant
nonstop
understaffed
lax
outdated
cumbersom
einefficient
inoperable
operable
unattended
unlocked
ventilated
moveable
movable
motorized
stationary
unclean
sterile
sterilized
watered
scrubbed
patched
hydrated
tonic
soundproof
hydroponic
crusted
proofed
treed
deadened
slopped
thudding
vertiginous
extraordinaire
manque
slippy
piscine
barefooted
offish
laxative
swish
redux
spanking
maniac
repellent
repellant
especial
copious
frayed
soured
marred
chaotic
tense
heated split
dominated
destroyed
damaged
ruined
wrecked
shut
shuttered
interrupted
ended
broke
swept
hammered
struck
suspended
cancelled
delayed
threatened
continued
renewe
dclose
closed
blocked
sealed
controlled
regulated
reserved
vested
temporary
vacant
communal
unoccupied
occupied
raining
splashed
soaked
washed
littered
scattered
burning
burnt
burned
apart
broken
blown
thrown
parked
smashed
smashing
preserved
neglected
overlooked
forgotten
sunken
mangled
grave
dead
buried
torn
wrapped
wound
shady
crooked
paved
winding
bumpy dirt
dusty
muddy
rough
slippery icy
faulty
overloaded
leaky
spotty
shoddy
substandard
uneven
tight
slack
steep
flat
narrow
dense
thick
large
heavy
underground
pedestrian
raging
roaring
rumbling
congested
clogged
flooded
overflowing
prize
earned
received
million
worth
valued
stream
ing
electronic
digital
instant
online
interactive
broadband
wireless
mobile
portable
automated
managerial
technical
financial
corporate
profitable
competitive
overseas
commercial
direct
standby
beneficiary
token
supplementary
supplemental
generic
prescription
unlim
ited
minimum
maximum
prem
ium
rental
affordable
expensive
inexpensive
cheap
convenient
accessible
suitable
ideal
attractive
desirable
regent
meridian
intercontinental
mediterranean
continental
pacific
atlantic jet
plane
landed
flying fly
intrepid
chartered
guided
tracked
sighted
express
outbound
natural
mineral
oriental
occidental
electrical
mechanical
motor
electric
discovered
detected
artificial
surface
equipped
fitted
mounted
powe
red
built
designed
model
integrated
developed
operational
operating
dual
fixed
existing
conventional
advance
advanced
increased
expanded
improved
enhanced
maintained
limited
extended
insured
prudential
neutral
polar
net
posted
level
high
higher
raised
expected
anticipated
lowe
stoverall
combined
revised
processed
imported
available
distributed
contained
material
documented
collected
radio
nightly
daily
weekly
medium tiny
small
larger
smaller
size
sized
visible
characteristic
distinctive
unique
distinct
obvious
surprising
noticeable
subtle
striking
dram
atic
lesser
myriad
assorted
ranging
varied
varying
tremendous
considerable
greater
substantial
significant
massive
vast
huge
enormous
wide
deep
inner
rich
diverse
missing
alive
cleared
headed
loaded
driving
walking
catching
running
short
long
right
free
middle
home
half
past
turned
away
back cut
cutting
easy
quick
slow fast
pass
passing
forward
ahead
behind
pressed
pressing
meet
prepared
ready
must
willing
noted
described
acknowledged
stated
pointed
stressed
learned
understood
realized
telling
advised
aware
informed
proved
determined
neither
convinced
clear
impossible
otherwise
deciding
likely
unlikely
possible
future
chance
enough
able still
kept
alone
done
sure
going go
coming
gone real big
bigger
pretty
little
kind like
hard
excellent
quality
best
good
better
fair
fine
protected
protecting
clean
safe
secure
critical
key
important
vital
crucial
main
major
leading
active
involved
responsible
becoming
transformed
assumed
assuming
present
current
mere
zero
comparable
equivalent
changed
changing
similar
unusual
subject
latter
unlike
normal
typical
usual
every
entire
whole
supposed
intended
required
needed
necessary
used
applied
connected
separate
multiple
individual
related
common
different
specific
particular
certain
medical
patient
elem
entary
textbook
bilingual
experim
ental
modern
contem
porary
alternative
choice
preferred
basic
standard
traditional
custom
resident
hired
retired
trained
professional
amateur
eligible
qualified
prospective
selected
select
assigned
rank top
ranking
junior
elite
recognised
respected
foremost
outstanding
exem
plary
creative
artistic
worthy
ultim
ate
extraordinary
exceptional
enduring
genuine
famous
legendary
great
greatest
honored
dedicated
absolute
universal
ordinary
everyday
pure
satisfying
simple
straightforward
self
conscious
personal
physical
hispanic
american
native
latin
asian
african
side
cross
opposite
outside
inside
front
standing
prior
initial
last
earlier
early
later
completed set
full
complete
historic
marked
celebrated
converted
settled
relocated
incorporated
established
live
living
new
next
beginning
starting
moved
moving
working
spent
less far
even
much
well
made
just
one
another
travelled
traveled
attended
gathered
fewe
rleast
none
eight
nine
seven
five six
two
four
three
twenty
thirty
ten
twelve
fifteen
eleven
dozen
hundred
divided
together
arranged
jewish
affiliated
formed
joined
nationw
ide
coordinated
organised
organized
said
according
official
confirm
edreported
national
european
international
union
united
appointed
elected
representative
represented
senior
associate
assistant
executive
general
scheduled
planned
august
opened
held
sign
signed
approved
agreed
announced
joint
initiative
presidential
prem
ier
prime
indian
canadian
british
australian
english
irish
royal
colonial
victorian
newborn
teenage
teenaged
adult
female
male
young
older
younger
old
born
married
mexican
cuban
italian
brazilian
spanish
portuguese
japanese
greek
turkish
giant
owned
sold
based
firm
boss ex
former
swiss
french
belgian
dutch
danish
german
polish ii
roman
ottoman
prussian
metropolitan
industrial
residential
urban
capital
northeast
southeast
north
south
west
east
central
southern
northern
western
eastern
terminal
port
bay
base
busy
bustling
downtown
uptown
plain
bordered
overlooking
surrounding
adjacent
square
located
situated
peaceful
mass
expressed
voiced
forthcoming
upcoming
recent
latest
ongoing
continuing
civic
oriented
minded
friendly
cooperative
understanding
shared
sharing
diplom
atic
strategic
historical
cultural
many
various
several
numerous
popular
known
considered
local
private
public
independent
legal
center
complex
human
social
environm
ental
loose
thin
soft
smooth
transparent
relaxing
relaxed
accommodating
conditioned
festive
inviting
welcom
ing
welcom
ecustom
ary
routine
casual
intim
ate
correct
wrong
acceptable
reasonable
sensible
proper
appropriate
practical
objective
whatsoever
true
meaning
regardless
mean
whatever
knowing
proof
definite
geared
motivated
engaged
engaging
accustom
edalike
looking
thinking
interested
focused
reliable
accurate
careful
thorough
timely
prom
ptspeedy
intuitive
agile
econom
ical
suited
utilized apt
preferable
knowledgeable
reputable
responsive
honest
competent
growing
grown
fresh
mixed
mild
moderate
positive
strong
consistent
solid
cautious
modest
steady
overnight
quiet
calm
signal
alert
elaborate
detailed
extensive brief
lengthy
formal
informal
frequent
constant
plus
extra
additional
regular
special
accompanied
accompanying
placed
attached
summary
suprem
eunanimous
handled
addressed
resolved
fulfilled
relieved
assured
confident
pleased
satisfied
empowe
red
supervised
assisted
prom
ising
appealing
demanding
accepted
accepting
reverse
reversed
drawn
handed
taking
taken
given
giving
exclusive sole
granted
requested
authorized
permanent
partial
dated
listed
unknown
exact
actual
verified
confirm
ing
disclosed
topnotch
moderne
metal
iron
concrete
stone
votive
illuminated lit
lighted
carved
sculpted
painted
decorated
adorned
handmade
antique
stained
pastel
arch
arched
tiled
glazed
paneled
ornate
domed
immaculate
cathedral
outward
inland
farther
rounded
sheltered
twin tall
laid lay
nestled
tucked
shaped
etched
sacred
mythical
heavenly
divine
marian
elder
noble
hollywo
od star
swank
ruby
shining
stellar
glowing
faint
bygone
bust
gargantuan
gigantic
precious
priceless
rare
exotic
rustic
quaint
picturesque
secluded
idyllic
posh
upscale
upmarket
parisian
chic
fashionable
trendy
plush
luxe
palatial
luxurious
opulent
sumptuous
scenic
panoramic
sunset
sunrise
pleasant
breezy
sunny
gothic
baroque
architectural
decorative
interior
exterior
downhill
quadruple
double
triple
swimming
indoor
outdoor
world
olym
pic
golden
gold
silver
won
winning
round
final
finished
straight
open
grand
instrumental
solo
piano
bass
sound
familiar
sounding
singing
pop
master
magic
magical
twentieth
inspired
reminiscent
mirrored
reflected
evident
romantic
romance
adapted
novel
directed
directing
entitled
written
original
included
featured
straw
oval
identical
matching
matched
uniform fit
fitting
color
light
dark
bright
black
white
brown
gray
grey
yellow
pink
blue
green
curt
pitched
played
game
single
first
second
third
eleventh
ninth
fifth
sixth
seventh
capped
timed
opening
closing
stretch
midway
amicable
cordial
utmost
sincere
loving
righteous
virtuous
conscientious
compassionate
hearty
lukewarm
gracious
hospitable
respectful
courteous
polite
diligent
attentive
solicitous
considerate
sanitary
blanket
protective
botanical
landscaped
leafy
shaded
aerobic
drinkable
usable
plentiful
abundant
adequate
ample
satisfactory
workable
painless
tolerable
worthwhile
doable
inclusive
conducive
complem
entary
beneficial
flexible
effective
efficient cc
stereo
deluxe
compact cd
adjustable
trim
optional
groomed wed
educated
wealthy
healthy
healthier
vibrant
cosm
opolitan
tolerant
civilized
windward
virgin
coral
canary
tuscan
venetian
neapolitan
florentine
jamaican
hawaiian
creole
nigh
galore
presto
spellbound
pricy
homely
scrumptious
appetizing
alfresco
pampering
uncrowded
uncluttered
undisturbed
sparse
restful
sedate
unobtrusive
discreet
serene
tranquil
joyful
cheerful
cheery
encyclopedic
boundless
unparalleled
unsurpassed
unmatched
unbeatable
flawless
understated
effortless
impeccable
superlative
fluent
speaking
spoken
capable
powe
rful
aggressive
counter
smart
clever
sophisticated
innovative
slim
tame
marginal
meager
hefty
sizable
valuable
invaluable
handy
helpful
useful
spare
wasted
saved
saving
generous
paid
paying
brave
bold
fabulous
fantastic
wonderful
terrific
phenom
enal
astonishing
amazing
incredible
awesom
emarvelous
marvellous
glorious
majestic
magnificent
splendid
spectacular
stunning
breathtaking
dazzling
superb
brilliant
impressive
remarkable
deserved
decent
respectable
happy ok
comfortable
nice
perfect
seasoned
accomplished
talented
appreciative
enthusiastic
impressed
appreciated
fancied
entertained
adored
keen avid
curious
excited
amazed
delighted
thrilled
loved
liked
fortunate
lucky
grateful
glad
thankful hot
dry
salt
baking
cereal
crackers
peanut
fluffy
vanilla
caramel
blended
powdered
nuts
toasted
steamed
roast
chicken
fried
baked
grilled
cooked
roasted
olive mint
cherry
honey
ripe
juicy
tender
sweet
salty
savory
tasty
delicious
darling pat
ace
favourite
favorite
outback
super
mini
classic
homem
ade
takeout
complimentary
personalized
inflatable
plastic
rubber
disposable
cushioned
waterproof
removable
adagio
yummy
velvet
indigo
frosted
quilted
topless
nude
laughing
smiling
petite
polished
exquisite
elegant
graceful
crisp
sparkling
bubbly
lush
pristine
colorful
beautiful
gorgeous
lovely
tanned
refreshed
spotless
squeaky
room
ysnug
cozy
cosy
glossy
stylish
sleek
flashy
glam
orous
pricey
fancy
skinny
oversized
shiny
sparkly
finer
flowe
rylegible
airy
expansive
enclosed
furnished
spacious
interconnected
augm
ented
configured
elliptical
circular
rectangular
discrete
functional
corresponding
defined
preset
desired
optim
alapproximate
inferior
feminine
funny
humorous
ironic
amusing
hilarious
thoughtful
passionate
energetic
lively
spirited
mem
orable
unforgettable
exciting
fascinating
interesting
intriguing
informative
entertaining
enjoyable
authentic
timeless
fashioned
quintessential
funky
retro
eclectic
minimalist
pleasing
inspiring
alluring
enchanting
mesmerizing
charming
delightful
quirky
endearing
earthy
soothing
invigorating
refreshing
wholesome
homey
tasteful
classy
creepy
cute
adorable
chatty
personable
approachable
mannered
talkative
gritty
neat
tidy
humble
unassuming
unpretentious
Figure 9: Trees for 2,397 adjectives on the leaves with branches colored based on β estimated with the lasso (Top), L1-ag-h (Middle) and our method (Bottom), respectively. Red branch, blue branch and gray branch correspond to negative, positive and zero βj, respectively. Darker color indicates larger magnitude of βj and lighter color indicates smaller magnitude of βj. The horizontal line shown in the middle plot corresponds to the height, chosen from CV, at which L1-ag-h cuts the tree and merges the resulting branches.
Figure 10: Magnitudes of coefficient estimates |βj| versus feature density (on log scale) for our method (black circles), the lasso (red triangles) and L1-ag-h (green diamond).
between our method and L1-ag-h. Inspection of the merged branches from our method reveals words of similar meaning and sentiment being aggregated. In Figure 10, we plot |βj| against the percentage of reviews containing an adjective. We find that our method selects rare words more than the other two methods. The rarest word selected by the lasso is "filthy", which appears in 0.47% of reviews. By contrast, our method selects many words that are far more rare: at the extreme of rarest words, our method selects 797 words that appear in only 0.059% of reviews; L1-ag-h selects 31 out of these 797 rarest words. Our method is able to select more rare words through aggregation: it aggregates 2,244 words into 224 clusters, leaving the remaining 153 words as singletons. Over 70% of these singletons are dense words (where, for this discussion, we call a word "dense" if it appears in at least 1% of reviews and "rare" otherwise). This is four times higher than the percentage of dense words in the original training data. Of the 224 aggregated clusters, 42% are made up entirely of rare words. After aggregation, over half of these clusters become dense features. As a comparison, L1-ag-h aggregates 1,339 words into 506 clusters while keeping 1,058 words as singletons. Only 10% of these singletons from L1-ag-h are dense words. Of the 506 aggregated clusters from L1-ag-h, 68% are made up entirely of rare words, among which only 15% become dense clusters after aggregation.
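The dense/rare distinction above is a simple column-wise computation on the document-term count matrix. A minimal numpy sketch (the toy matrix and variable names are hypothetical, not from the paper's data):

```python
import numpy as np

# Toy document-term count matrix: rows = reviews, columns = adjectives.
# A word's "density" is the fraction of reviews in which it appears at all.
X = np.array([
    [2, 0, 0, 1],
    [1, 0, 0, 0],
    [3, 1, 0, 0],
    [0, 0, 0, 1],
])  # 4 reviews, 4 words

density = (X > 0).mean(axis=0)   # fraction of reviews containing each word
is_dense = density >= 0.01       # the 1% cutoff used in the text

print(density)
print(is_dense.sum())
```

On real data `X` would be the sparse training count matrix, and `density` is exactly the quantity plotted on the horizontal axis of Figure 10.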
Table 2 shows the density and estimated coefficient values for eight words falling in a particular subtree of T. The words "heard" and "loud" occur far more commonly than the other six words. We see that the lasso only selects these two words, whereas our method selects all eight words (assigning them negative coefficient values). In contrast, L1-ag-h, while also aggregating the two densest words in this branch into a group, does not select the six rare words. Examining the six rare words, it seems quite reasonable that they should receive negative coefficient values. This suggests to us that the lasso's exclusion of these words has to do with their rareness in the dataset rather than their irrelevance to predicting the hotel rating.
| adjectives | heard | loud | yelled | shouted | screaming | crying | blaring | banging |
|---|---|---|---|---|---|---|---|---|
| density⁴ | 0.0300 | 0.0235 | 0.0006 | 0.0006 | 0.0029 | 0.0006 | 0.0006 | 0.0041 |
| β^lasso_λ | -0.057 | -0.147 | 0 | 0 | 0 | 0 | 0 | 0 |
| β^{L1-ag-h}_{λ,height} | -0.174 | -0.174 | 0 | 0 | 0 | 0 | 0 | 0 |
| β^ours_λ | -0.128 | -0.128 | -0.039 | -0.039 | -0.039 | -0.039 | -0.039 | -0.039 |
Table 2: Term density and estimated coefficient for adjectives in the selected group
Existing work in sentiment analysis uses side information in other ways to improve predictive performance (Thelwall et al., 2010). For example, Wang et al. (2016), working in an SVM framework, forms a directed graph over words that expresses their "sentiment strength" and then requires that the coefficients corresponding to these words honor the ordering that is implied by the directed graph. For example, if word A is stronger (in expressing a certain sentiment) than word B, which in turn is stronger than word C, then their method enforces βA ≥ βB ≥ βC. Such constraints can in some situations have a regularizing effect that may be of use in rare feature settings: for example, if βA ≈ βC and B is a rare word, this constraint would help pin down βB's value. However, if βA and βC are very different from each other, the constraint may offer little help in reducing the variance of the estimate of βB. Another difference is that the method uses hard constraints that are not controlled by a tuning parameter, so that even when there is strong evidence in the data that a constraint should be violated, this will not be allowed. By contrast, our method shrinks toward the constant-on-subtrees structure without forcing this to be the case.

⁴The term density is computed over the training set.
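The variance-reduction intuition can be illustrated numerically. When βA ≈ βC, forcing βB into the interval [βC, βA] pins a rare word's otherwise noisy estimate near the truth. The sketch below uses a plain interval projection (clipping) as a stand-in for the full constrained fit, with all values hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

beta_A, beta_B, beta_C = 1.05, 1.0, 0.95   # true ordering: beta_A >= beta_B >= beta_C
sigma_B = 2.0                              # rare word B => very noisy unconstrained estimate

# Unconstrained estimates of beta_B (e.g., from a nearly all-zero feature column)
est_raw = beta_B + sigma_B * rng.standard_normal(10_000)

# The hard ordering constraint projects each estimate into [beta_C, beta_A]
est_constrained = np.clip(est_raw, beta_C, beta_A)

mse_raw = np.mean((est_raw - beta_B) ** 2)
mse_con = np.mean((est_constrained - beta_B) ** 2)
print(mse_raw, mse_con)
```

If βA and βC were far apart, the interval would be wide and the clipping would barely move the estimates, matching the caveat in the text.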
Mean Squared Prediction Error

| tree setting | our method | L1 | L1-dense | L1-ag-h | L1-ag-d |
|---|---|---|---|---|---|
| GloVe-50d | 0.748 | 0.749 | 0.773 | 0.752 | 0.775 |
| GloVe-100d | 0.742 | 0.749 | 0.773 | 0.747 | 1.173 |
| GloVe-200d | 0.741 | 0.749 | 0.773 | 0.747 | 1.140 |
| ELMo | 0.669 | 0.676 | 0.732 | 0.685 | 0.783 |
Table 3: Performance of five methods under various tree settings on the held-out test set. Among the tree settings, GloVe-50d, GloVe-100d and GloVe-200d correspond to hierarchical clustering trees generated with GloVe embeddings of differing dimensions. We collapse over two million ELMo embedding vectors of adjectives into 6,001 clusters using mini-batch K-Means clustering (Sculley, 2010), then generate a hierarchical tree upon the 6,001 cluster centroids. The performance is based on 33,997 training reviews (i.e., corresponding to the 20% row of Table 1).
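The two-stage tree construction in the caption — vector quantization of the embeddings, then hierarchical clustering of the centroids — can be sketched with a small self-contained toy. The paper's pipeline uses ELMo vectors, mini-batch K-Means (Sculley, 2010), and standard hierarchical clustering; the crude numpy implementations and function names below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

def mini_batch_kmeans(X, k, n_iter=50, batch=32):
    """Crude mini-batch K-Means: nudge the nearest centroid toward each batch point."""
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    counts = np.zeros(k)
    for _ in range(n_iter):
        B = X[rng.choice(len(X), batch)]
        d = ((B[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        for x, j in zip(B, d.argmin(1)):
            counts[j] += 1
            centroids[j] += (x - centroids[j]) / counts[j]  # running-mean update
    return centroids

def single_linkage_merges(C):
    """Greedy agglomeration over centroids; returns the list of merge pairs (the tree)."""
    clusters = {i: C[[i]] for i in range(len(C))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best, pair = np.inf, None
        for a in range(len(keys)):
            for b in range(a + 1, len(keys)):
                i, j = keys[a], keys[b]
                d = ((clusters[i][:, None] - clusters[j][None]) ** 2).sum(-1).min()
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        clusters[i] = np.vstack([clusters[i], clusters.pop(j)])
        merges.append(pair)
    return merges

# Toy "embedding" vectors: two well-separated word groups in 5 dimensions
X = np.vstack([rng.normal(0, 0.1, (40, 5)), rng.normal(3, 0.1, (40, 5))])
centroids = mini_batch_kmeans(X, k=4)
tree = single_linkage_merges(centroids)
print(len(tree))  # k - 1 = 3 merges define the hierarchy
```

At the paper's scale one would use an off-the-shelf implementation (e.g., scikit-learn's `MiniBatchKMeans` and scipy's `linkage`) rather than these loops.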
Section 5 investigated the effect of using a distorted tree (see the bottom left panel of Figure 5). In the context of words there are multiple choices of trees one could use. Table 3 shows the effect of applying our method to different types of trees. Here we focus on one situation with 33,997 training reviews. We first consider the effect of changing the dimension of the GloVe embedding. One might suppose that as the embedding dimension increases the tree becomes more informative. Indeed, one finds that our method and L1-ag-h achieve improved performance as this dimension increases. The last method, L1-ag-d, has more variability in its performance as the GloVe tree varies, making it the poorest performing method. Both L1 and L1-dense do not make use of the tree and thus they are unaffected by changes to the embedding dimension. But a limitation of these methods is that they are all based on the bag-of-words model, and therefore do not take word context into consideration. For example, the models do not differentiate between the use of "bad" in "the hotel is bad" versus "the hotel is not that bad". To bring context into consideration, we leverage deep contextualized word representations from ELMo (Peters et al., 2018) to generate an auxiliary tree (see Appendix K for details). Unlike traditional word embeddings such as GloVe that associate one embedding per word, deep word embeddings can capture the meaning of each word based on the surrounding context. From Table 3 we find test errors improve substantially with the ELMo tree for our method, L1 and L1-ag-h. Among all methods and all tree settings, our method with the ELMo tree performs the best. This suggests our method can better leverage the power of contextualized word embeddings than competing methods.
7 Conclusion
In this paper, we focus on the challenge posed by highly sparse data matrices, which have become increasingly common in many areas, including biology and text mining. While much work has focused on addressing the challenges of high-dimensional data, relatively little attention has been given to the challenges of sparsity in the data. We show, both theoretically and empirically, that not explicitly accounting for the sparsity in the data hurts one's prediction errors and one's ability to perform feature selection. Our proposed method is able to make productive use of highly sparse features by creating new aggregated features based on side information about the original features. In contrast to simpler tree-based aggregation strategies that are occasionally used as a pre-processing step in biological applications, our method adaptively learns the feature aggregation in a supervised manner. In doing so, our methodology not only overcomes the challenges of data sparsity but also produces features that may be of greater relevance to the particular prediction task of interest.
Acknowledgments
The authors thank Andy Clark for calling our attention to the challenge of rare features. This work was supported by NSF CAREER grant DMS-1653017.
Appendices
A Failure of OLS in the Presence of A Rare Feature
Theorem 5. Consider the linear model (1) with $X \in \mathbb{R}^{n \times p}$ having full column rank. Further suppose that $X_j$ is a binary vector having $k$ nonzeros. It follows that

$$\mathbb{P}\left(\left|\hat\beta_j^{OLS}(n) - \beta_j^*\right| > \eta\right) \ge 2\Phi\left(-\eta k^{1/2}/\sigma\right) \quad \text{for any } \eta > 0, \tag{7}$$

where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal variable.

Proof. The distribution of the OLS estimator is $\hat\beta_j^{OLS}(n) \sim \mathcal{N}(\beta_j^*, \sigma^2[(X^TX)^{-1}]_{jj})$. By applying blockwise inversion (see, e.g., Bernstein 2009), with the $j$th row/column of $X^TX$ in its own "block", we get

$$[(X^TX)^{-1}]_{jj} = \left[X_j^TX_j - X_j^TX_{-j}(X_{-j}^TX_{-j})^{-1}X_{-j}^TX_j\right]^{-1} = \left[\|X_j\|^2 - \|(X_{-j}^TX_{-j})^{-1/2}X_{-j}^TX_j\|^2\right]^{-1} \ge \|X_j\|^{-2} = k^{-1}.$$

Thus,

$$\mathbb{P}\left(\left|\hat\beta_j^{OLS}(n) - \beta_j^*\right| > \eta\right) = 2\Phi\left(-\frac{\eta}{\sigma\sqrt{[(X^TX)^{-1}]_{jj}}}\right) \ge 2\Phi\left(-\eta k^{1/2}/\sigma\right),$$

where $\Phi(\cdot)$ is the distribution function of a standard normal variable.
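As a quick numerical illustration of the key inequality above (our own sketch, not part of the paper; the dimensions and seed are arbitrary), one can check that a binary feature with $k$ nonzeros forces $[(X^TX)^{-1}]_{jj} \ge 1/k$, so the OLS variance for that coefficient is at least $\sigma^2/k$:

```python
import numpy as np

# Check the blockwise-inversion bound [(X^T X)^{-1}]_{jj} >= 1/k for a design
# matrix whose j-th column is binary with exactly k ones (illustrative sketch).
rng = np.random.default_rng(0)
n, p, k, j = 200, 10, 5, 0

X = rng.normal(size=(n, p))
col = np.zeros(n)
col[rng.choice(n, size=k, replace=False)] = 1.0  # rare binary feature
X[:, j] = col

diag_jj = np.linalg.inv(X.T @ X)[j, j]
assert diag_jj >= 1.0 / k - 1e-12                # variance >= sigma^2 / k
```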
B Proof of Theorem 2
In the setting of Theorem 2, we have $X\beta^* = \tilde X\tilde\beta^*$, where $\tilde X = X(I_k \otimes 1_{p/k}) \in \mathbb{R}^{n \times k}$. The two estimators, the oracle lasso on the aggregated data ($\tilde X$) and the lasso on the original data ($X$), are defined below.

- Oracle lasso estimator $\hat\beta_\lambda^{oracle} = \tilde\beta_\lambda^{oracle} \otimes 1_{p/k}$, where $\tilde\beta_\lambda^{oracle}$ is the unique solution to
$$\min_{\tilde\beta \in \mathbb{R}^k}\ \frac{1}{2n}\|y - \tilde X\tilde\beta\|_2^2 + \lambda\|\tilde\beta\|_1.$$
- Lasso estimator $\hat\beta_\lambda^{lasso}$ is defined in (2).

We begin by establishing that the interval $I$ is nonempty, which is ensured by the constraint $k < p/(36\log n)$. In particular, the lower bound of the interval is below the upper bound if
$$12k\log(k^2n) < p\log(2cp/k).$$
Now, $\log(k^2n) \le 3\log n$, and $\log(2cp/k) \ge 1$ as long as $2cp/k > e$. Since $2c > 0.6$, the latter holds whenever $p/k \ge 5$, so it suffices to show that $36k\log n < p$; and this constraint does imply $p/k \ge 5$. The two parts of the theorem follow from the following two propositions.
Proposition 2 (Support recovery of oracle lasso). Suppose $\min_{i=1,\dots,k-1}|\tilde\beta_i^*| > \sigma\sqrt{\frac{4k\log(k^2n)}{n}}$. With $\lambda = \sigma\sqrt{\frac{\log(k^2n)}{kn}}$, the oracle lasso recovers the correct signed support successfully:

$$\lim_{n\to\infty}\mathbb{P}\left(S_\pm(\hat\beta_\lambda^{oracle}) = S_\pm(\beta^*)\right) = 1.$$

Proof. The scaled matrix $\sqrt{k/n}\,\tilde X$ is orthogonal since

$$\tilde X^T\tilde X = (I_k \otimes 1_{p/k})^TX^TX(I_k \otimes 1_{p/k}) = \frac{n}{p}\cdot\frac{p}{k}I_k = \frac{n}{k}I_k.$$

Orthogonality implies that

$$\tilde\beta_\lambda^{oracle} = S\left(\Big(\sqrt{\tfrac{k}{n}}\tilde X\Big)^T\Big(\sqrt{\tfrac{k}{n}}y\Big),\ \lambda k\right) = S\left(\tfrac{k}{n}\tilde X^Ty,\ \lambda k\right), \tag{8}$$

where $\frac{k}{n}\tilde X^Ty = \frac{k}{n}\tilde X^T\tilde X\tilde\beta^* + \frac{k}{n}\tilde X^T\varepsilon = \tilde\beta^* + \frac{k}{n}\tilde X^T\varepsilon \sim \mathcal N_k\left(\tilde\beta^*, \frac{k\sigma^2}{n}I_k\right)$. By the Chernoff bound for normal variables, for any $t > 0$,

$$\mathbb{P}\left(\left|\tfrac{k}{n}(\tilde X_j)^Ty - \tilde\beta_j^*\right| > t\right) \le 2\exp\left(-\frac{t^2}{2k\sigma^2/n}\right) \quad\text{for } j = 1,\dots,k.$$
Choosing $t = \sigma\sqrt{\frac{k\log(k^2n)}{n}}$ and applying a union bound yields

$$\mathbb{P}\left(\left\|\tfrac{k}{n}\tilde X^Ty - \tilde\beta^*\right\|_\infty > \sigma\sqrt{\frac{k\log(k^2n)}{n}}\right) \le 2k\exp\left(-\frac{\sigma^2k\log(k^2n)/n}{2k\sigma^2/n}\right) = \frac{2}{\sqrt n}.$$

Hence, with probability at least $1 - \frac{2}{\sqrt n}$, we have $\left\|\frac{k}{n}\tilde X^Ty - \tilde\beta^*\right\|_\infty \le \sigma\sqrt{\frac{k\log(k^2n)}{n}} = \lambda k$, due to our choice of $\lambda = \sigma\sqrt{\frac{\log(k^2n)}{kn}}$. Under the event $\left\|\frac{k}{n}\tilde X^Ty - \tilde\beta^*\right\|_\infty \le \lambda k$, the following results hold.

- By $\tilde\beta_k^* = 0$ and
$$\left|\tfrac{k}{n}(\tilde X_k)^Ty\right| = \left|\tfrac{k}{n}(\tilde X_k)^Ty - \tilde\beta_k^*\right| \le \left\|\tfrac{k}{n}\tilde X^Ty - \tilde\beta^*\right\|_\infty \le \lambda k,$$
we have $\tilde\beta_{\lambda,k}^{oracle} = S\left(\tfrac{k}{n}(\tilde X_k)^Ty, \lambda k\right) = 0$ and $\hat\beta_{\lambda,\ell}^{oracle} = \tilde\beta_{\lambda,k}^{oracle} = \tilde\beta_k^* = \beta_\ell^*$ for all $\ell > \frac{k-1}{k}p$.

- For $j = 1,\dots,k-1$: since $\left|\tfrac{k}{n}(\tilde X_j)^Ty - \tilde\beta_j^*\right| \le \lambda k$ and $|\tilde\beta_j^*| \ge \min_{i=1,\dots,k-1}|\tilde\beta_i^*| > 2\lambda k$, $\tfrac{k}{n}(\tilde X_j)^Ty$ and $\tilde\beta_j^*$ must share the same sign. Moreover, we either have $\left|\tfrac{k}{n}(\tilde X_j)^Ty\right| \ge |\tilde\beta_j^*| > \lambda k$, or $\left|\tfrac{k}{n}(\tilde X_j)^Ty\right| < |\tilde\beta_j^*|$, in which case
$$\left|\tfrac{k}{n}(\tilde X_j)^Ty - \tilde\beta_j^*\right| = \Big|\,\big|\tfrac{k}{n}(\tilde X_j)^Ty\big| - |\tilde\beta_j^*|\,\Big| = |\tilde\beta_j^*| - \left|\tfrac{k}{n}(\tilde X_j)^Ty\right| \le \lambda k,$$
and therefore $\left|\tfrac{k}{n}(\tilde X_j)^Ty\right| \ge |\tilde\beta_j^*| - \lambda k > 2\lambda k - \lambda k = \lambda k$. Thus $\left|\tfrac{k}{n}(\tilde X_j)^Ty\right| > \lambda k$ for $j = 1,\dots,k-1$. By the definition of $\hat\beta_\lambda^{oracle}$ and (8), for $\frac{j-1}{k}p < \ell \le \frac{j}{k}p$,
$$\hat\beta_{\lambda,\ell}^{oracle} = \tilde\beta_{\lambda,j}^{oracle} = S\left(\tfrac{k}{n}(\tilde X_j)^Ty,\ \lambda k\right) = \tfrac{k}{n}(\tilde X_j)^Ty\left(1 - \frac{\lambda k}{\left|\tfrac{k}{n}(\tilde X_j)^Ty\right|}\right),$$
which has the same sign as $\tilde\beta_j^*$ (and the same sign as $\beta_\ell^*$).

In the above two bullet points, we have shown that $S_\pm(\hat\beta_\lambda^{oracle}) = S_\pm(\beta^*)$ holds with probability at least $1 - \frac{2}{\sqrt n}$. Hence,

$$\liminf_{n\to\infty}\mathbb{P}\left(S_\pm(\hat\beta_\lambda^{oracle}) = S_\pm(\beta^*)\right) \ge \lim_{n\to\infty}\left(1 - \frac{2}{\sqrt n}\right) = 1.$$

Since $\limsup_{n\to\infty}\mathbb{P}\left(S_\pm(\hat\beta_\lambda^{oracle}) = S_\pm(\beta^*)\right) \le 1$, the limit exists and equals 1.
Lemma 2. Suppose $\varepsilon \sim \mathcal N_n(0, \sigma^2I_n)$ and $c = \frac{1}{3}e^{(\pi/2+2)^{-1}}\sqrt{\frac{1}{4}+\frac{1}{\pi}}$. Then

$$\mathbb{P}\left(\max_{\ell=1,\dots,p}|X_\ell^T\varepsilon| \le 2\sigma\sqrt{\frac{n}{3p}\log(2cp)}\right) \le \left(1-\frac{1}{p}\right)^p.$$

Proof. Let $Z$ be a standard Gaussian variable. Theorem 2.1 of Cote et al. (2012) provides a lower bound for the Gaussian Q-function (i.e., $\mathbb{P}(Z > z)$). Choosing $\kappa = \frac{3}{2}$ in their Theorem 2.1 yields

$$\mathbb{P}(Z > z) \ge \underbrace{\frac{1}{3}e^{(\pi/2+2)^{-1}}\sqrt{\frac{1}{4}+\frac{1}{\pi}}}_{c}\ e^{-\frac{3z^2}{4}},$$

where $c$ is independent of $z$. Given that $X_\ell$ has $n/p$ ones and $X1_p = 1_n$, we have $X_\ell^T\varepsilon \overset{iid}{\sim} \mathcal N(0, \frac{n}{p}\sigma^2)$ for $\ell = 1,\dots,p$. By expressing $X_1^T\varepsilon = \sqrt{\frac{n}{p}}\sigma Z$, we have for any $\eta > 0$,

$$\mathbb{P}(X_1^T\varepsilon > \eta) \ge c\,e^{-\frac{3\eta^2p}{4\sigma^2n}} \;\Rightarrow\; \mathbb{P}(|X_1^T\varepsilon| > \eta) \ge 2c\,e^{-\frac{3\eta^2p}{4\sigma^2n}}.$$

Moreover,

$$\mathbb{P}\left(\max_{\ell=1,\dots,p}|X_\ell^T\varepsilon| \le \eta\right) = \left(\mathbb{P}(|X_1^T\varepsilon| \le \eta)\right)^p = \left(1 - \mathbb{P}(|X_1^T\varepsilon| > \eta)\right)^p \le \left(1 - 2c\,e^{-\frac{3\eta^2p}{4\sigma^2n}}\right)^p.$$

Plugging $\eta = 2\sigma\sqrt{\frac{n}{3p}\log(2cp)}$ into the above inequality yields

$$\mathbb{P}\left(\max_{\ell=1,\dots,p}|X_\ell^T\varepsilon| \le 2\sigma\sqrt{\frac{n}{3p}\log(2cp)}\right) \le \left(1-\frac{1}{p}\right)^p.$$
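The Chernoff-type lower bound underlying Lemma 2 is easy to check numerically (an illustrative sketch of ours, not from the paper; the grid of z values is arbitrary), comparing $\mathbb{P}(Z > z)$ against $c\,e^{-3z^2/4}$:

```python
import math

# Constant c from Lemma 2: (1/3) e^{(pi/2+2)^{-1}} sqrt(1/4 + 1/pi)
c = (1.0 / 3.0) * math.exp(1.0 / (math.pi / 2 + 2)) * math.sqrt(0.25 + 1.0 / math.pi)

def gaussian_q(z):
    # Q(z) = P(Z > z) for standard normal Z, via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# The lower bound Q(z) >= c * exp(-3 z^2 / 4) holds on this grid of z values
for i in range(1, 50):
    z = 0.1 * i
    assert gaussian_q(z) >= c * math.exp(-0.75 * z * z)
```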
Proposition 3 (Failure of support recovery of lasso). Suppose $\min_{i=1,\dots,k-1}|\tilde\beta_i^*| \le \sigma\sqrt{\frac{p}{3n}\log(2cp/k)}$, where $c = \frac{1}{3}e^{(\pi/2+2)^{-1}}\sqrt{\frac{1}{4}+\frac{1}{\pi}}$. The lasso fails to achieve high-probability signed support recovery:

$$\limsup_{p\to\infty}\ \sup_{\lambda\ge0}\ \mathbb{P}\left(S_\pm(\hat\beta_\lambda^{lasso}) = S_\pm(\beta^*)\right) \le \frac{1}{e}.$$

Proof. The lasso solution can be simplified to $\hat\beta_\lambda^{lasso} = S\left(\frac{p}{n}X^Ty, \lambda p\right)$. Since $\beta_\ell^* \ne 0$ for $\ell \le \frac{k-1}{k}p$ and $\beta_\ell^* = 0$ for $\ell > \frac{k-1}{k}p$, the following is a necessary condition for $\hat\beta_\lambda^{lasso}$ to recover the correct signed support:

$$\exists\,\lambda\ \text{s.t.}\ \left|\tfrac{p}{n}X_\ell^Ty\right| > \lambda p\ \text{for } \ell \le \tfrac{k-1}{k}p\ \text{ and }\ \left|\tfrac{p}{n}X_\ell^Ty\right| \le \lambda p\ \text{for } \ell > \tfrac{k-1}{k}p\ \Leftrightarrow\ \min_{\ell \le \frac{k-1}{k}p}|X_\ell^Ty| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^Ty|.$$
Furthermore, we have

$$X_\ell^Ty = X_\ell^T\left(\sum_{j=1}^p X_j\beta_j^* + \varepsilon\right) = \frac{n}{p}\beta_\ell^* + X_\ell^T\varepsilon.$$

Define $\hat i := \arg\min_{i=1,\dots,k-1}|\tilde\beta_i^*|$ and $A := \left\{\max_{\ell > \frac{k-1}{k}p}|X_\ell^T\varepsilon| \le 2\sigma\sqrt{\frac{n}{3p}\log(2cp/k)}\right\}$. Then

$$\mathbb{P}\left(S_\pm(\hat\beta_\lambda^{lasso}) = S_\pm(\beta^*)\right) \le \mathbb{P}\left(\min_{\ell \le \frac{k-1}{k}p}|X_\ell^Ty| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^Ty|\right) \le \mathbb{P}\left(\min_{\frac{\hat i-1}{k}p < \ell \le \frac{\hat i}{k}p}|X_\ell^Ty| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^Ty|\right)$$

$$\le \mathbb{P}\left(\frac{n}{p}|\tilde\beta_{\hat i}^*| + \min_{\frac{\hat i-1}{k}p < \ell \le \frac{\hat i}{k}p}|X_\ell^T\varepsilon| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^T\varepsilon|\right)$$

$$= \mathbb{P}\left(\frac{n}{p}|\tilde\beta_{\hat i}^*| + \min_{\frac{\hat i-1}{k}p < \ell \le \frac{\hat i}{k}p}|X_\ell^T\varepsilon| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^T\varepsilon|\ \text{and}\ A^c\right) + \mathbb{P}\left(\frac{n}{p}|\tilde\beta_{\hat i}^*| + \min_{\frac{\hat i-1}{k}p < \ell \le \frac{\hat i}{k}p}|X_\ell^T\varepsilon| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^T\varepsilon|\ \Big|\ A\right)\cdot\mathbb{P}(A)$$

$$\le \mathbb{P}\left(\frac{n}{p}|\tilde\beta_{\hat i}^*| + \min_{\frac{\hat i-1}{k}p < \ell \le \frac{\hat i}{k}p}|X_\ell^T\varepsilon| > 2\sigma\sqrt{\frac{n}{3p}\log(2cp/k)}\right) + \mathbb{P}(A)$$

$$= \left[\mathbb{P}\left(|X_1^T\varepsilon| > 2\sigma\sqrt{\frac{n}{3p}\log(2cp/k)} - \frac{n}{p}|\tilde\beta_{\hat i}^*|\right)\right]^{p/k} + \mathbb{P}(A) \quad\text{(by the $X_\ell^T\varepsilon$ being i.i.d.)}$$

$$\le \left[\mathbb{P}\left(|X_1^T\varepsilon| > \sigma\sqrt{\frac{n}{3p}\log(2cp/k)}\right)\right]^{p/k} + \mathbb{P}(A) \quad\left(\text{by } |\tilde\beta_{\hat i}^*| \le \sigma\sqrt{\frac{p}{3n}\log(2cp/k)}\right)$$

$$\le \left[2\exp\left(-\frac{p}{2\sigma^2n}\cdot\frac{\sigma^2n}{3p}\log(2cp/k)\right)\right]^{p/k} + \left(1-\frac{k}{p}\right)^{p/k} \quad\text{(by the Chernoff inequality and Lemma 2)}$$

$$\le 2^{p/k}\exp\left(-\frac{p}{6k}\log(2cp/k)\right) + \left(1-\frac{k}{p}\right)^{p/k} = 2^{p/k}\left(\frac{2cp}{k}\right)^{-\frac{p}{6k}} + \left(1-\frac{k}{p}\right)^{p/k} = \left(\frac{cp}{32k}\right)^{-\frac{p}{6k}} + \left(1-\frac{k}{p}\right)^{p/k},$$

which holds for all $\lambda \ge 0$. In particular,

$$\sup_{\lambda\ge0}\mathbb{P}\left(S_\pm(\hat\beta_\lambda^{lasso}) = S_\pm(\beta^*)\right) \le \left(\frac{cp}{32k}\right)^{-\frac{p}{6k}} + \left(1-\frac{k}{p}\right)^{p/k}.$$
Taking lim sup on both sides yields

$$\limsup_{p\to\infty}\ \sup_{\lambda\ge0}\ \mathbb{P}\left(S_\pm(\hat\beta_\lambda^{lasso}) = S_\pm(\beta^*)\right) \le \lim_{p\to\infty}\left(\frac{cp}{32k}\right)^{-\frac{p}{6k}} + \lim_{p\to\infty}\left(1-\frac{k}{p}\right)^{p/k} = 0 + \frac{1}{e} = \frac{1}{e}.$$
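The last limit, $(1-k/p)^{p/k} \to e^{-1}$, is easy to confirm numerically (a quick check of ours, with an arbitrary fixed $k$):

```python
import math

# (1 - k/p)^{p/k} -> 1/e as p grows with k fixed
k = 5
for p in [10**3, 10**5, 10**7]:
    val = (1 - k / p) ** (p / k)

assert abs(val - math.exp(-1)) < 1e-6  # at p = 10^7 the limit 1/e is reached to high accuracy
```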
C Proof of Theorem 3
Let $A \in \{0,1\}^{p\times|\mathcal C|}$ denote the aggregation matrix corresponding to the partition $\mathcal C$. Aggregated least squares uses the design matrix $\tilde X = XA$, where each column is the sum of the features in a group of $\mathcal C$. Left-multiplication by $A$ maps this back to $p$-dimensional space:

$$\hat\beta_{\mathcal C} = A\tilde X^+y.$$

Its fitted values are

$$X\hat\beta_{\mathcal C} = \tilde X\tilde X^+y = \tilde X\tilde X^+(X\beta^* + \varepsilon).$$

Now, $\tilde X$ is full-rank and

$$X\hat\beta_{\mathcal C} \sim \mathcal N_n\left(\tilde X\tilde X^+X\beta^*,\ \sigma^2\tilde X[\tilde X^T\tilde X]^{-1}\tilde X^T\right).$$

Now,

$$\mathbb{E}[X\hat\beta_{\mathcal C}] = \tilde X\tilde X^+X\beta^* = \tilde X\tilde X^+XAA^+\beta^* + \tilde X\tilde X^+X(I_p - AA^+)\beta^* = XAA^+\beta^* + \tilde X\tilde X^+X(I_p - AA^+)\beta^*,$$

since $XAA^+\beta^*$ is in the column space of $\tilde X$, and

$$\mathbb{E}[X\hat\beta_{\mathcal C}] - X\beta^* = X(AA^+ - I_p)\beta^* + \tilde X\tilde X^+X(I_p - AA^+)\beta^* = (\tilde X\tilde X^+ - I_n)X(I_p - AA^+)\beta^*.$$

Therefore

$$\mathbb{E}\|X\hat\beta_{\mathcal C} - X\beta^*\|^2 = \|(\tilde X\tilde X^+ - I_n)X(I_p - AA^+)\beta^*\|^2 + \sigma^2\mathrm{tr}(\tilde X[\tilde X^T\tilde X]^{-1}\tilde X^T) = \|(\tilde X\tilde X^+ - I_n)X(I_p - AA^+)\beta^*\|^2 + \sigma^2|\mathcal C| \le \|X\|_{\mathrm{op}}^2\|(I_p - AA^+)\beta^*\|^2 + \sigma^2|\mathcal C|,$$

since $\|\tilde X\tilde X^+ - I_n\|_{\mathrm{op}} = 1$ and the trace of a projection matrix is the rank of the target space. Finally, observe that

$$\|(I_p - AA^+)\beta^*\|^2 = \sum_{\ell=1}^{|\mathcal C|}\sum_{j\in C_\ell}\left(\beta_j^* - |C_\ell|^{-1}\sum_{j'\in C_\ell}\beta_{j'}^*\right)^2.$$
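The aggregation matrix and the aggregated least-squares fit can be sketched concretely (our own illustration; the partition, dimensions, and function names are made up for the example):

```python
import numpy as np

# Build the aggregation matrix A in {0,1}^{p x |C|} for a partition C of the
# p features; the aggregated design is then X_tilde = X @ A (each column sums
# the features in one group).
def aggregation_matrix(partition, p):
    A = np.zeros((p, len(partition)))
    for ell, group in enumerate(partition):
        A[group, ell] = 1.0
    return A

rng = np.random.default_rng(0)
n, p = 8, 6
X = rng.normal(size=(n, p))
partition = [[0, 1, 2], [3, 4], [5]]          # a partition C with three groups
A = aggregation_matrix(partition, p)
X_tilde = X @ A

# Aggregated least squares mapped back to p dimensions: beta_C = A X_tilde^+ y,
# whose fitted values equal the projection X_tilde X_tilde^+ y.
y = rng.normal(size=n)
beta_C = A @ np.linalg.pinv(X_tilde) @ y
assert np.allclose(X @ beta_C, X_tilde @ np.linalg.pinv(X_tilde) @ y)
```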
D Consensus ADMM for Solving Problem (5)
The ADMM algorithm is given in Algorithm 1. Let $X = \mathrm{SVD}_{\mathrm{compact}}(\bar U, \bar D, \bar V)$ denote the compact singular value decomposition of $X$, where $\bar D \in \mathbb{R}^{\min(n,p)\times\min(n,p)}$ is a diagonal matrix with the non-zero singular values on its diagonal, and $\bar U \in \mathbb{R}^{n\times\min(n,p)}$ and $\bar V \in \mathbb{R}^{p\times\min(n,p)}$ contain in their columns the left and right singular vectors corresponding to the non-zero singular values, respectively. Similarly, we write $(I_p : -A) = \mathrm{SVD}_{\mathrm{compact}}(\cdot, \cdot, \bar Q)$, where $\bar Q \in \mathbb{R}^{(p+|T|)\times p}$ contains the $p$ right singular vectors corresponding to the non-zero singular values.

Algorithm 1 Consensus ADMM for Solving Problem (5)

Input: $y, X, A, n, p, |T|, \lambda, \alpha, \rho, \varepsilon^{\mathrm{abs}}, \varepsilon^{\mathrm{rel}}$, maxite.
1: $X = \mathrm{SVD}_{\mathrm{compact}}(\cdot, \bar D, \bar V)$
2: $(I_p : -A) = \mathrm{SVD}_{\mathrm{compact}}(\cdot, \cdot, \bar Q)$
3: $\beta^0 \leftarrow \beta^{(i)0} \leftarrow v^{(i)0} \leftarrow 0 \in \mathbb{R}^p$ for $i = 1,2,3$
4: $\gamma^0 \leftarrow \gamma^{(j)0} \leftarrow u^{(j)0} \leftarrow 0 \in \mathbb{R}^{|T|}$ for $j = 1,2$
5: continue $\leftarrow$ true
6: $k \leftarrow 0$
7: while $k <$ maxite and continue do
8:   $k \leftarrow k+1$
9:   $\beta^{(1)k} \leftarrow \left[\bar V\,\mathrm{diag}\left(\frac{1}{[\bar D^T\bar D]_{ii}+n\rho}\right)\bar V^T + \frac{1}{n\rho}(I_p - \bar V\bar V^T)\right]\left(X^Ty + n\rho\beta^{k-1} - nv^{(1)k-1}\right)$
10:  $\beta_j^{(2)k} \leftarrow S\left(\beta_j^{k-1} - \frac{1}{\rho}v_j^{(2)k-1},\ \frac{\lambda(1-\alpha)w_j}{\rho}\right)$ for $j = 1,\dots,p$
11:  $\gamma_\ell^{(1)k} \leftarrow S\left(\gamma_\ell^{k-1} - \frac{1}{\rho}u_\ell^{(1)k-1},\ \frac{\lambda\alpha w_\ell}{\rho}\right)$ if $w_\ell > 0$; $\ \gamma_\ell^{(1)k} \leftarrow \gamma_\ell^{k-1} - \frac{1}{\rho}u_\ell^{(1)k-1}$ if $w_\ell = 0$
12:  $\begin{pmatrix}\beta^{(3)k}\\ \gamma^{(2)k}\end{pmatrix} \leftarrow \left(I_{p+|T|} - \bar Q\bar Q^T\right)\left[\begin{pmatrix}\beta^{k-1}\\ \gamma^{k-1}\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k-1}\\ u^{(2)k-1}\end{pmatrix}\right]$
13:  $\beta^k \leftarrow (\beta^{(1)k} + \beta^{(2)k} + \beta^{(3)k})/3$
14:  $\gamma^k \leftarrow (\gamma^{(1)k} + \gamma^{(2)k})/2$
15:  $v^{(i)k} \leftarrow v^{(i)k-1} + \rho(\beta^{(i)k} - \beta^k)$ for $i = 1,2,3$
16:  $u^{(j)k} \leftarrow u^{(j)k-1} + \rho(\gamma^{(j)k} - \gamma^k)$ for $j = 1,2$
17:  if $\sqrt{\sum_{i=1}^3\|\beta^{(i)k} - \beta^k\|_2^2 + \sum_{j=1}^2\|\gamma^{(j)k} - \gamma^k\|_2^2} \le \varepsilon^{\mathrm{abs}}\sqrt{3p+2|T|} + \varepsilon^{\mathrm{rel}}\max\left\{\sqrt{\sum_{i=1}^3\|\beta^{(i)k}\|_2^2 + \sum_{j=1}^2\|\gamma^{(j)k}\|_2^2},\ \sqrt{3\|\beta^k\|_2^2 + 2\|\gamma^k\|_2^2}\right\}$ and $\rho\sqrt{3\|\beta^k - \beta^{k-1}\|_2^2 + 2\|\gamma^k - \gamma^{k-1}\|_2^2} \le \varepsilon^{\mathrm{abs}}\sqrt{3p+2|T|} + \varepsilon^{\mathrm{rel}}\sqrt{\sum_{i=1}^3\|v^{(i)k}\|_2^2 + \sum_{j=1}^2\|u^{(j)k}\|_2^2}$ then
18:    continue $\leftarrow$ false
19:  end if
20: end while
Output: $\beta^k, \gamma^k$
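Lines 10-11 of Algorithm 1 use the elementwise soft-thresholding operator $S(x,t) = \mathrm{sign}(x)\max(|x|-t,0)$, a standard operator; a minimal sketch (ours, with an arbitrary example vector):

```python
import numpy as np

# Elementwise soft-thresholding S(x, t) = sign(x) * max(|x| - t, 0),
# as used in the beta^(2) and gamma^(1) updates of Algorithm 1.
def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
# entries within t of zero are set to zero; the rest are shrunk toward zero
print(soft_threshold(x, 1.0))
```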
D.1 Derivation of Algorithm 1
The ADMM updates involve minimizing the augmented Lagrangian of the global consensus problem (6),

$$L_\rho(\beta^{(1)},\beta^{(2)},\beta^{(3)},\gamma^{(1)},\gamma^{(2)},\beta,\gamma;\ v^{(1)},v^{(2)},v^{(3)},u^{(1)},u^{(2)}) = \frac{1}{2n}\left\|y - X\beta^{(1)}\right\|_2^2 + \lambda\left(\alpha\sum_{\ell=1}^{|T|}w_\ell\left|\gamma_\ell^{(1)}\right| + (1-\alpha)\sum_{j=1}^p w_j\left|\beta_j^{(2)}\right|\right) + 1_\infty\{\beta^{(3)} = A\gamma^{(2)}\} + \sum_{i=1}^3\left(v^{(i)T}(\beta^{(i)} - \beta) + \frac{\rho}{2}\|\beta^{(i)} - \beta\|_2^2\right) + \sum_{j=1}^2\left(u^{(j)T}(\gamma^{(j)} - \gamma) + \frac{\rho}{2}\|\gamma^{(j)} - \gamma\|_2^2\right).$$

1. Update $\beta^{(1)}$:

$$\beta^{(1)k+1} := \arg\min_{\beta^{(1)}\in\mathbb{R}^p}\left\{\frac{1}{2n}\left\|y - X\beta^{(1)}\right\|_2^2 + \left\langle v^{(1)k}, \beta^{(1)} - \beta^k\right\rangle + \frac{\rho}{2}\|\beta^{(1)} - \beta^k\|_2^2\right\}.$$

Let $X = \mathrm{SVD}(U, D, V)$ be the full singular value decomposition of $X$, where $U \in \mathbb{R}^{n\times n}$ contains the left singular vectors in its columns, $V \in \mathbb{R}^{p\times p}$ contains the right singular vectors in its columns, and $D \in \mathbb{R}^{n\times p}$ is a rectangular diagonal matrix with decreasing singular values on the diagonal. The first-order condition for the above problem gives

$$(X^TX + n\rho I_p)\beta^{(1)k+1} = X^Ty + n\rho\beta^k - nv^{(1)k} \;\Rightarrow\; V(D^TD + n\rho I_p)V^T\beta^{(1)k+1} = X^Ty + n\rho\beta^k - nv^{(1)k} \;\Rightarrow\; \beta^{(1)k+1} = V\,\mathrm{diag}\left(([D^TD]_{ii} + n\rho)^{-1}\right)V^T\left(X^Ty + n\rho\beta^k - nv^{(1)k}\right).$$

When $n \ge p$, we have

$$\beta^{(1)k+1} = V\,\mathrm{diag}\left(([D^TD]_{ii} + n\rho)^{-1}\right)V^T\left(X^Ty + n\rho\beta^k - nv^{(1)k}\right). \tag{9}$$

When $n < p$, the SVD can be expressed in a compact form: $D = (\bar D : 0)$ and $V = (\bar V : \bar V_\perp)$, where $\bar D \in \mathbb{R}^{n\times n}$ and $\bar V \in \mathbb{R}^{p\times n}$ are from the compact SVD of $X$, and $\bar V_\perp \in \mathbb{R}^{p\times(p-n)}$. Thus,

$$V\,\mathrm{diag}\left(([D^TD]_{ii} + n\rho)^{-1}\right)V^T = (\bar V : \bar V_\perp)\,\mathrm{diag}\left(([D^TD]_{ii} + n\rho)^{-1}\right)\begin{pmatrix}\bar V^T\\ \bar V_\perp^T\end{pmatrix} = \bar V\,\mathrm{diag}\left(([\bar D^T\bar D]_{ii} + n\rho)^{-1}\right)\bar V^T + \bar V_\perp\bar V_\perp^T/(n\rho) = \bar V\,\mathrm{diag}\left(([\bar D^T\bar D]_{ii} + n\rho)^{-1}\right)\bar V^T + (I_p - \bar V\bar V^T)/(n\rho).$$

So when $n < p$,

$$\beta^{(1)k+1} = \left[\bar V\,\mathrm{diag}\left(([\bar D^T\bar D]_{ii} + n\rho)^{-1}\right)\bar V^T + (I_p - \bar V\bar V^T)/(n\rho)\right]\left(X^Ty + n\rho\beta^k - nv^{(1)k}\right). \tag{10}$$

Since $\bar V = V$ and $\bar V\bar V^T = I_p$ when $n \ge p$, (10) reduces to (9) in that case.
2. Update $\beta^{(2)}$:

$$\beta^{(2)k+1} := \arg\min_{\beta^{(2)}\in\mathbb{R}^p}\left\{\frac{\rho}{2}\left\|\beta^{(2)} - \left(\beta^k - \frac{1}{\rho}v^{(2)k}\right)\right\|_2^2 + \lambda(1-\alpha)\sum_{j=1}^p w_j\left|\beta_j^{(2)}\right|\right\}.$$

The solution is simply elementwise soft-thresholding:

$$\beta_j^{(2)k+1} = S\left(\beta_j^k - \frac{1}{\rho}v_j^{(2)k},\ \frac{\lambda(1-\alpha)w_j}{\rho}\right) \quad\text{for } j = 1,\dots,p.$$

3. Update $\gamma^{(1)}$:

$$\gamma^{(1)k+1} := \arg\min_{\gamma^{(1)}\in\mathbb{R}^{|T|}}\left\{\frac{\rho}{2}\left\|\gamma^{(1)} - \left(\gamma^k - \frac{1}{\rho}u^{(1)k}\right)\right\|_2^2 + \lambda\alpha\sum_{\ell=1}^{|T|}w_\ell\left|\gamma_\ell^{(1)}\right|\right\}.$$

Since we will sometimes choose $w_\ell = 0$ for the root, we break the solution into cases:

$$\gamma_\ell^{(1)k+1} = \begin{cases} S\left(\gamma_\ell^k - \frac{1}{\rho}u_\ell^{(1)k},\ \frac{\lambda\alpha w_\ell}{\rho}\right) & \text{if } w_\ell > 0,\\ \gamma_\ell^k - \frac{1}{\rho}u_\ell^{(1)k} & \text{if } w_\ell = 0.\end{cases}$$

4. Joint update of $\beta^{(3)}$ and $\gamma^{(2)}$:

$$\begin{pmatrix}\beta^{(3)k+1}\\ \gamma^{(2)k+1}\end{pmatrix} := \arg\min_{\beta^{(3)}\in\mathbb{R}^p,\,\gamma^{(2)}\in\mathbb{R}^{|T|}}\left\{\left\|\beta^{(3)} - \left(\beta^k - \frac{1}{\rho}v^{(3)k}\right)\right\|_2^2 + \left\|\gamma^{(2)} - \left(\gamma^k - \frac{1}{\rho}u^{(2)k}\right)\right\|_2^2\right\} \quad\text{s.t.}\quad (I_p : -A)\begin{pmatrix}\beta^{(3)}\\ \gamma^{(2)}\end{pmatrix} = 0.$$

The solution is the projection of $\begin{pmatrix}\beta^k\\ \gamma^k\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k}\\ u^{(2)k}\end{pmatrix}$ onto the null space of $(I_p : -A)$. Let $(I_p : -A) = \mathrm{SVD}(\cdot, \cdot, Q)$, where $Q = (\bar Q : \bar Q_\perp) \in \mathbb{R}^{(p+|T|)\times(p+|T|)}$ contains all the right singular vectors in its columns, so that $I_{p+|T|} = QQ^T = \bar Q\bar Q^T + \bar Q_\perp\bar Q_\perp^T$. Since $\bar Q$ corresponds to the non-zero singular values of $(I_p : -A)$ by construction, $\bar Q_\perp$ corresponds to the zero singular values, making it an orthonormal basis for the null space of $(I_p : -A)$. Thus,

$$\begin{pmatrix}\beta^{(3)k+1}\\ \gamma^{(2)k+1}\end{pmatrix} = \bar Q_\perp(\bar Q_\perp^T\bar Q_\perp)^{-1}\bar Q_\perp^T\left[\begin{pmatrix}\beta^k\\ \gamma^k\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k}\\ u^{(2)k}\end{pmatrix}\right] = \bar Q_\perp\bar Q_\perp^T\left[\begin{pmatrix}\beta^k\\ \gamma^k\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k}\\ u^{(2)k}\end{pmatrix}\right] = \left(I_{p+|T|} - \bar Q\bar Q^T\right)\left[\begin{pmatrix}\beta^k\\ \gamma^k\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k}\\ u^{(2)k}\end{pmatrix}\right].$$
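The null-space projection in this joint update can be sketched as follows (our own illustration with arbitrary small dimensions; the binary matrix standing in for $A$ is random):

```python
import numpy as np

# Project a vector onto the null space of M = (I_p : -A) via the SVD, as in
# the joint (beta^(3), gamma^(2)) update: proj = I - Q_bar Q_bar^T, where
# Q_bar holds the right singular vectors for the non-zero singular values.
rng = np.random.default_rng(0)
p, T = 4, 6
A = (rng.random((p, T)) < 0.3).astype(float)
M = np.hstack([np.eye(p), -A])              # (I_p : -A), shape p x (p+T), rank p

_, _, Vt = np.linalg.svd(M)
Q_bar = Vt[:p].T                            # right singular vectors, non-zero singular values
proj = np.eye(p + T) - Q_bar @ Q_bar.T      # projector onto the null space of M

z = rng.normal(size=p + T)
z_proj = proj @ z
assert np.allclose(M @ z_proj, 0.0)         # projected point satisfies beta = A gamma
```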
5. Update the global variables $\beta$ and $\gamma$:

$$\beta^{k+1} := \arg\min_{\beta\in\mathbb{R}^p}\sum_{i=1}^3\left\|\beta - \left(\beta^{(i)k+1} + \frac{1}{\rho}v^{(i)k}\right)\right\|_2^2 = \bar\beta^{k+1} + \frac{1}{\rho}\bar v^k \tag{11}$$

$$\gamma^{k+1} := \arg\min_{\gamma\in\mathbb{R}^{|T|}}\sum_{j=1}^2\left\|\gamma - \left(\gamma^{(j)k+1} + \frac{1}{\rho}u^{(j)k}\right)\right\|_2^2 = \bar\gamma^{k+1} + \frac{1}{\rho}\bar u^k \tag{12}$$

where $\bar\beta^k := \frac{\beta^{(1)k}+\beta^{(2)k}+\beta^{(3)k}}{3}$, $\bar v^k := \frac{v^{(1)k}+v^{(2)k}+v^{(3)k}}{3}$, $\bar\gamma^k := \frac{\gamma^{(1)k}+\gamma^{(2)k}}{2}$, and $\bar u^k := \frac{u^{(1)k}+u^{(2)k}}{2}$.

6. Update the dual variables:

$$v^{(i)k+1} := v^{(i)k} + \rho(\beta^{(i)k+1} - \beta^{k+1}) \quad\text{for } i = 1,2,3,$$
$$u^{(j)k+1} := u^{(j)k} + \rho(\gamma^{(j)k+1} - \gamma^{k+1}) \quad\text{for } j = 1,2.$$

Averaging the updates for $v$ and the updates for $u$ gives

$$\bar v^{k+1} = \bar v^k + \rho(\bar\beta^{k+1} - \beta^{k+1}) \tag{13}$$
$$\bar u^{k+1} = \bar u^k + \rho(\bar\gamma^{k+1} - \gamma^{k+1}) \tag{14}$$

Substituting (11) and (12) into (13) and (14) yields $\bar v^{k+1} = \bar u^{k+1} = 0$ after the first iteration.

Using $\beta^k = \bar\beta^k$ and $\gamma^k = \bar\gamma^k$ in the above updates, the updates become Lines 9-16 of Algorithm 1. Next, we follow Section 3.3.1 of Boyd et al. (2011) to determine the termination criteria. We first write Problem (6) in the same form as Problem (3.1) in Boyd et al. (2011),

$$\min_{x,z}\ f(x) + g(z) \quad\text{s.t.}\quad Ax + Bz = c,$$

where

$$A = I_{3p+2|T|},\quad B = -\begin{pmatrix}I_p & 0\\ I_p & 0\\ I_p & 0\\ 0 & I_{|T|}\\ 0 & I_{|T|}\end{pmatrix},\quad c = 0,\quad x = \begin{pmatrix}\beta^{(1)}\\ \beta^{(2)}\\ \beta^{(3)}\\ \gamma^{(1)}\\ \gamma^{(2)}\end{pmatrix},\quad z = \begin{pmatrix}\beta\\ \gamma\end{pmatrix}.$$

The primal and dual residuals are

$$r^{k+1} = Ax^{k+1} + Bz^{k+1} - c = \begin{pmatrix}\beta^{(1)k+1} - \beta^{k+1}\\ \beta^{(2)k+1} - \beta^{k+1}\\ \beta^{(3)k+1} - \beta^{k+1}\\ \gamma^{(1)k+1} - \gamma^{k+1}\\ \gamma^{(2)k+1} - \gamma^{k+1}\end{pmatrix} \quad\text{and}\quad s^{k+1} = \rho A^TB(z^{k+1} - z^k) = -\rho\begin{pmatrix}\beta^{k+1} - \beta^k\\ \beta^{k+1} - \beta^k\\ \beta^{k+1} - \beta^k\\ \gamma^{k+1} - \gamma^k\\ \gamma^{k+1} - \gamma^k\end{pmatrix}.$$

By Condition (3.12) in Boyd et al. (2011), the ADMM algorithm stops when both residuals are small. In our case, the termination criteria are the following.
1. The primal residual is small:

$$\sqrt{\sum_{i=1}^3\|\beta^{(i)k} - \beta^k\|_2^2 + \sum_{j=1}^2\|\gamma^{(j)k} - \gamma^k\|_2^2} \le \sqrt{3p + 2|T|}\cdot\varepsilon^{\mathrm{abs}} + \varepsilon^{\mathrm{rel}}\cdot\max\left\{\sqrt{\sum_{i=1}^3\|\beta^{(i)k}\|_2^2 + \sum_{j=1}^2\|\gamma^{(j)k}\|_2^2},\ \sqrt{3\|\beta^k\|_2^2 + 2\|\gamma^k\|_2^2}\right\}.$$

2. The dual residual is small:

$$\rho\cdot\sqrt{3\|\beta^k - \beta^{k-1}\|_2^2 + 2\|\gamma^k - \gamma^{k-1}\|_2^2} \le \sqrt{3p + 2|T|}\cdot\varepsilon^{\mathrm{abs}} + \varepsilon^{\mathrm{rel}}\cdot\sqrt{\sum_{i=1}^3\|v^{(i)k}\|_2^2 + \sum_{j=1}^2\|u^{(j)k}\|_2^2}.$$
D.2 Treatment of the Intercept in Problem (5)

When an intercept $\beta_0$ is included in the least squares term, Problem (5) becomes:

$$\min_{\beta_0\in\mathbb{R},\,\beta\in\mathbb{R}^p,\,\gamma\in\mathbb{R}^{|T|}\,:\,\beta = A\gamma}\ \frac{1}{2n}\|y - X\beta - \beta_01_n\|_2^2 + \lambda\left(\alpha\sum_{\ell=1}^{|T|}w_\ell|\gamma_\ell| + (1-\alpha)\sum_{j=1}^p w_j|\beta_j|\right). \tag{15}$$

The first-order condition for the solution $(\hat\beta_0, \hat\beta)$ yields

$$\left.\frac{\partial\,\frac{1}{2n}\|y - X\beta - \beta_01_n\|_2^2}{\partial\beta_0}\right|_{(\beta_0,\beta)=(\hat\beta_0,\hat\beta)} = \frac{1}{n}1_n^T\left(1_n\hat\beta_0 - (y - X\hat\beta)\right) = \frac{1}{n}\left(n\hat\beta_0 - 1_n^T(y - X\hat\beta)\right) = 0.$$

So $\hat\beta_0 = \frac{1}{n}1_n^T(y - X\hat\beta)$. Plugging $\hat\beta_0$ into Problem (15) and letting $H = I_n - \frac{1}{n}1_n1_n^T$ yields

$$\min_{\beta\in\mathbb{R}^p,\,\gamma\in\mathbb{R}^{|T|}\,:\,\beta = A\gamma}\ \frac{1}{2n}\|Hy - HX\beta\|_2^2 + \lambda\left(\alpha\sum_{\ell=1}^{|T|}w_\ell|\gamma_\ell| + (1-\alpha)\sum_{j=1}^p w_j|\beta_j|\right),$$

which can now be solved using our consensus ADMM algorithm.
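A minimal sketch (ours, with made-up data and an unpenalized least-squares fit for simplicity) of how centering by $H$ profiles out the intercept, which is then recovered as $\hat\beta_0 = \frac{1}{n}1_n^T(y - X\hat\beta)$:

```python
import numpy as np

# Centering-matrix treatment of the intercept: fit on (Hy, HX), then recover
# beta0 as the mean of the residual y - X beta. True intercept here is 2.0.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

H = np.eye(n) - np.ones((n, n)) / n        # H = I_n - (1/n) 1_n 1_n^T
Hy, HX = H @ y, H @ X

beta = np.linalg.lstsq(HX, Hy, rcond=None)[0]
beta0 = np.mean(y - X @ beta)
assert abs(beta0 - 2.0) < 0.1              # close to the true intercept
```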
E Proof of Lemma 1
We first show the existence of such a $B^*$ by providing a feasible procedure to find it. Suppose $\beta^*$ has at least two distinct values (otherwise $B^* = \{r\}$ trivially). Start with $B = L(T)$ so that the first constraint is satisfied. If there are siblings $u, v$ in $B$ for which the second constraint is violated, then by construction $\beta_j^* = \beta_k^*$ for $j \in L(T_u)$ and $k \in L(T_v)$, so we replace $u, v$ in $B$ with their parent node. We repeat this step until the second constraint is satisfied, while maintaining the first constraint. Thus, $B$ satisfies the two requirements for $B^*$.

For uniqueness, suppose $B^*$ and $\tilde B^*$ are two different aggregating sets for $\beta^*$. Without loss of generality, suppose there exists $u \in \tilde B^*$ with $u \notin B^*$. Then $u$ is a descendant or an ancestor of some node in $B^*$; in either case the second constraint is violated. Thus, no such $u$ exists and $\tilde B^* = B^*$.

The existence and uniqueness of $A^*$ follow from the definition of the support of $\beta^*$.
F Proof of Theorem 4
Denote our penalty by $\Omega(\beta,\gamma) := \alpha\sum_{\ell=1}^{|T|}w_\ell|\gamma_\ell| + (1-\alpha)\sum_{j=1}^p w_j|\beta_j|$. We follow the proof strategy used in Theorem 1 of Lou et al. (2016). If $(\hat\beta, \hat\gamma)$ is a solution to Problem (5), then we have

$$\frac{1}{2n}\left\|y - X\hat\beta\right\|_2^2 + \lambda\Omega(\hat\beta, \hat\gamma) \le \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\Omega(\beta, \gamma)$$

for any $(\beta, \gamma)$ such that $\beta = A\gamma$. Let $(\beta^*, \gamma^*)$ be such that $\gamma_\ell^* = \beta_\ell^*$ if $\ell \in B^*$ and $\gamma_\ell^* = 0$ otherwise, so that $\beta^* = A\gamma^*$. Plugging in $y = X\beta^* + \varepsilon$ and $(\beta, \gamma) = (\beta^*, \gamma^*)$, with some algebra we have

$$\frac{1}{2n}\left\|X\hat\beta - X\beta^*\right\|_2^2 + \lambda\Omega(\hat\beta, \hat\gamma) \le \lambda\Omega(\beta^*, \gamma^*) + \frac{1}{n}\varepsilon^TX\hat\Delta(\beta^*), \tag{16}$$

where $\hat\Delta(\beta^*) = \hat\beta - \beta^*$. By $\hat\beta = A\hat\gamma$ and $\beta^* = A\gamma^*$ (writing $\hat\Delta(\gamma^*) = \hat\gamma - \gamma^*$),

$$\frac{1}{n}\varepsilon^TX\hat\Delta(\beta^*) = \frac{1}{n}\varepsilon^TXA\hat\Delta(\gamma^*).$$

We next bound $\frac{1}{n}\varepsilon^TX\hat\Delta(\beta^*) = (1-\alpha)\frac{1}{n}\varepsilon^TX\hat\Delta(\beta^*) + \alpha\frac{1}{n}\varepsilon^TXA\hat\Delta(\gamma^*)$ in absolute value. Define $V_j := n^{-1/2}X_j^T\varepsilon$ for $j = 1,\dots,p$ and $U_\ell := n^{-1/2}A_\ell^TX^T\varepsilon$ for $\ell = 1,\dots,|T|$. With the choice of weights $w_j = \|X_j\|_2/\sqrt n$ for $1 \le j \le p$ and $w_\ell = \|XA_\ell\|_2/\sqrt n$ for $1 \le \ell \le |T|$,

$$\left|\frac{1}{n}\varepsilon^TX\hat\Delta(\beta^*)\right| = \left|(1-\alpha)\sum_{j=1}^p\frac{X_j^T\varepsilon}{\sqrt n\|X_j\|_2}\cdot\frac{\|X_j\|_2}{\sqrt n}\hat\Delta(\beta^*)_j + \alpha\sum_{\ell=1}^{|T|}\frac{A_\ell^TX^T\varepsilon}{\sqrt n\|XA_\ell\|_2}\cdot\frac{\|XA_\ell\|_2}{\sqrt n}\hat\Delta(\gamma^*)_\ell\right| = \left|(1-\alpha)\sum_{j=1}^p\frac{V_j}{\|X_j\|_2}\,w_j\hat\Delta(\beta^*)_j + \alpha\sum_{\ell=1}^{|T|}\frac{U_\ell}{\|XA_\ell\|_2}\,w_\ell\hat\Delta(\gamma^*)_\ell\right|$$

$$\le (1-\alpha)\sum_{j=1}^p\frac{|V_j|}{\|X_j\|_2}\,w_j\left|\hat\Delta(\beta^*)_j\right| + \alpha\sum_{\ell=1}^{|T|}\frac{|U_\ell|}{\|XA_\ell\|_2}\,w_\ell\left|\hat\Delta(\gamma^*)_\ell\right| \le (1-\alpha)\max_{1\le j\le p}\frac{|V_j|}{\|X_j\|_2}\cdot\sum_{j=1}^p w_j\left|\hat\Delta(\beta^*)_j\right| + \alpha\max_{1\le\ell\le|T|}\frac{|U_\ell|}{\|XA_\ell\|_2}\cdot\sum_{\ell=1}^{|T|}w_\ell\left|\hat\Delta(\gamma^*)_\ell\right|, \tag{17}$$

where the first inequality follows from the triangle inequality.
Since $\varepsilon \sim \mathcal N_n(0, \sigma^2I_n)$, $V_j \sim \mathcal N(0, \|X_j\|_2^2\sigma^2/n)$ for $j = 1,\dots,p$ and $U_\ell \sim \mathcal N(0, \|XA_\ell\|_2^2\sigma^2/n)$ for $\ell = 1,\dots,|T|$. By Lemma 6.2 of Buhlmann and van de Geer (2011), we have for $x > 0$

$$\mathbb{P}\left(\max_{1\le j\le p}\frac{|V_j|}{\|X_j\|_2} > \sigma\sqrt{\frac{2(x + \log p)}{n}}\right) \le 2e^{-x} \quad\text{and}\quad \mathbb{P}\left(\max_{1\le\ell\le|T|}\frac{|U_\ell|}{\|XA_\ell\|_2} > \sigma\sqrt{\frac{2(x + \log|T|)}{n}}\right) \le 2e^{-x}.$$

By the union bound,

$$\mathbb{P}\left(\max_{1\le j\le p}\frac{|V_j|}{\|X_j\|_2} > \sigma\sqrt{\frac{2(x + \log p)}{n}}\ \text{ or }\ \max_{1\le\ell\le|T|}\frac{|U_\ell|}{\|XA_\ell\|_2} > \sigma\sqrt{\frac{2(x + \log|T|)}{n}}\right) \le 4e^{-x}.$$

By the construction of $T$, each internal node has at least 2 child nodes. To go up one level from the leaf nodes, only one node "survives" among its siblings. For $T$ with $p$ leaf nodes, there can therefore be at most $p - 1$ internal nodes, and the maximum is achieved when $T$ is a full binary tree. So $|T| \le 2p$. Choosing $x = \log(2p)$, the following bounds hold with probability at least $1 - 2/p$:

$$\max_{1\le j\le p}\frac{|V_j|}{\|X_j\|_2} \le 2\sigma\sqrt{\frac{\log(2p)}{n}} \quad\text{and}\quad \max_{1\le\ell\le|T|}\frac{|U_\ell|}{\|XA_\ell\|_2} \le 2\sigma\sqrt{\frac{\log(2p)}{n}}.$$

Plugging these upper bounds into (17), the following inequality holds with high probability:

$$\left|\frac{1}{n}\varepsilon^TX\hat\Delta(\beta^*)\right| \le 2\sigma\sqrt{\frac{\log(2p)}{n}}\left((1-\alpha)\sum_{j=1}^p w_j\left|\hat\Delta(\beta^*)_j\right| + \alpha\sum_{\ell=1}^{|T|}w_\ell\left|\hat\Delta(\gamma^*)_\ell\right|\right) = 2\sigma\sqrt{\frac{\log(2p)}{n}}\,\Omega(\hat\Delta(\beta^*), \hat\Delta(\gamma^*)). \tag{18}$$

Let $\lambda \ge 4\sigma\sqrt{\log(2p)/n}$ and $0 \le \alpha \le 1$. By (16) and (18), the following holds with probability at least $1 - 2/p$:

$$\frac{1}{2n}\left\|X\hat\beta - X\beta^*\right\|_2^2 \le \frac{1}{2}\lambda\,\Omega(\hat\Delta(\beta^*), \hat\Delta(\gamma^*)) - \lambda\Omega(\hat\beta, \hat\gamma) + \lambda\Omega(\beta^*, \gamma^*) \le \frac{1}{2}\left(\lambda\Omega(\hat\beta, \hat\gamma) + \lambda\Omega(\beta^*, \gamma^*)\right) - \lambda\Omega(\hat\beta, \hat\gamma) + \lambda\Omega(\beta^*, \gamma^*) \quad\text{(by the triangle inequality)}$$

$$\le \frac{3}{2}\lambda\,\Omega(\beta^*, \gamma^*) = \frac{3}{2}\lambda\left((1-\alpha)\sum_{j=1}^p w_j|\beta_j^*| + \alpha\sum_{\ell=1}^{|T|}w_\ell|\gamma_\ell^*|\right) = \frac{3}{2}\lambda\left((1-\alpha)\sum_{j\in A^*}w_j|\beta_j^*| + \alpha\sum_{\ell\in B^*}w_\ell|\beta_\ell^*|\right),$$

where $A^*$ is the support of $\beta^*$.
G Proof of Corollary 1
With $\|X_j\|_2 \le \sqrt n$, the weights in Theorem 4 satisfy:

- For $1 \le j \le p$: $w_j = \|X_j\|_2/\sqrt n \le 1$.
- For $1 \le \ell \le |T|$: $w_\ell = \|XA_\ell\|_2/\sqrt n \le \frac{1}{\sqrt n}\sum_{j\in L(T_{u_\ell})}\|X_j\|_2 \le |L(T_{u_\ell})|$.

The inequality for $w_\ell$ follows from the triangle inequality and the definition of $A$: the $\ell$th column $A_\ell$ encodes the descendant leaves of the node $u_\ell$, i.e., $A_{j\ell} = 1\{j \in \mathrm{descendant}(u_\ell) \cup \{u_\ell\}\}$. By construction of $\beta^*$, all coefficients in the branch rooted at an aggregating node $u_\ell$ share the same value, i.e., $\beta_j^* = \beta_\ell^*$ for all $j \in L(T_{u_\ell})$, where $\ell \in B^*$. Thus, for $\ell \in B^*$,

$$|\beta_\ell^*| = \frac{\left\|\beta^*_{L(T_{u_\ell})}\right\|_1}{|L(T_{u_\ell})|} \;\Rightarrow\; \sum_{\ell\in B^*}w_\ell|\beta_\ell^*| \le \sum_{\ell\in B^*}|L(T_{u_\ell})|\cdot\frac{\left\|\beta^*_{L(T_{u_\ell})}\right\|_1}{|L(T_{u_\ell})|} = \|\beta^*\|_1.$$

Plugging the choice of $\lambda$ and the above upper bounds into Theorem 4 yields

$$\frac{1}{n}\left\|X\hat\beta - X\beta^*\right\|_2^2 \le 12\sigma\sqrt{\frac{\log(2p)}{n}}\left((1-\alpha)\sum_{j\in A^*}w_j|\beta_j^*| + \alpha\sum_{\ell\in B^*}w_\ell|\beta_\ell^*|\right) \le 12\sigma\sqrt{\frac{\log(2p)}{n}}\big((1-\alpha)\|\beta^*\|_1 + \alpha\|\beta^*\|_1\big) \lesssim \sigma\sqrt{\frac{\log p}{n}}\|\beta^*\|_1$$

(by $\log(2p) = \log 2 + \log p \le 2\log p$), which holds up to a multiplicative constant with probability at least $1 - 2/p$.
H Proof of Corollary 2
With $\beta_1^* = \cdots = \beta_p^*$, the aggregating set consists of the root, i.e., $B^* = \{r\}$. Thus, $\beta_r^* = \beta_1^* = \|\beta^*\|_1/p$ and $w_r = \|X1_p\|_2/\sqrt n$. With $\|X_j\|_2 \le \sqrt n$, we have $w_j = \|X_j\|_2/\sqrt n \le 1$ for $1 \le j \le p$.

Plugging $\lambda = 4\sigma\sqrt{\log(2p)/n}$ into Theorem 4 yields

$$\frac{1}{n}\left\|X\hat\beta - X\beta^*\right\|_2^2 \le 12\sigma\sqrt{\frac{\log(2p)}{n}}\left((1-\alpha)\sum_{j\in A^*}w_j|\beta_j^*| + \alpha w_r|\beta_r^*|\right) \le 12\sigma\sqrt{\frac{\log(2p)}{n}}\left((1-\alpha)\|\beta^*\|_1 + \alpha\frac{\|X1_p\|_2}{p\sqrt n}\|\beta^*\|_1\right) \lesssim \sigma\sqrt{\frac{\log p}{n}}\|\beta^*\|_1\left((1-\alpha) + \alpha\frac{\|X1_p\|_2}{p\sqrt n}\right)$$
which holds up to a multiplicative constant with probability at least $1 - 2/p$. Now, if $\alpha \ge \left[1 + \|X1_p\|_2/(p\sqrt n)\right]^{-1}$, then

$$1 - \alpha \le \alpha\frac{\|X1_p\|_2}{p\sqrt n},$$

and so

$$\frac{1}{n}\left\|X\hat\beta - X\beta^*\right\|_2^2 \lesssim \sigma\sqrt{\frac{\log p}{n}}\|\beta^*\|_1\cdot\alpha\frac{\|X1_p\|_2}{p\sqrt n}.$$

It follows that if $\|X1_p\|_2 = o(p\sqrt n)$, then our method achieves a better rate than the lasso rate of $\sigma\sqrt{\log p/n}\,\|\beta^*\|_1$.
I Proof of Proposition 1
Let $\tilde X^{(1)},\dots,\tilde X^{(p)} \in \{0,1\}^n$ be iid copies of the random vector $\tilde X \in \{0,1\}^n$ with $1_n^T\tilde X = r$ and $\mathbb{P}(\tilde X_i = 1) = r/n$ for all $i = 1,\dots,n$. By construction, $\tilde X^{(1)},\dots,\tilde X^{(p)}$ each have $r$ nonzero entries. For the $j$th column of $X$, let $X_j = \sqrt{n/r}\,\tilde X^{(j)}$ so that its nonzero entries equal $\sqrt{n/r}$ and $\|X_j\|_2 = \sqrt n$.

For the random vector $\tilde X$, its expectation and the diagonal entries of its covariance $\Sigma$ are

$$\mathbb{E}\tilde X = \frac{r}{n}1_n \quad\text{and}\quad \Sigma_{ii} = \mathrm{Var}(\tilde X_i) = \frac{r}{n}\left(1 - \frac{r}{n}\right) =: \tau^2.$$

Suppose $\|\tilde X - \mathbb{E}\tilde X\|_2 \le L$ for some $L > 0$. Let $Z := \sum_{j=1}^p\tilde X^{(j)}$ and $v(Z) := \max\left\{\|\mathbb{E}[(Z - \mathbb{E}Z)(Z - \mathbb{E}Z)^T]\|_{\mathrm{op}},\ \mathbb{E}[\|Z - \mathbb{E}Z\|_2^2]\right\}$. Corollary 6.1.2 of Tropp (2015) gives the concentration inequality for $Z$: for all $t \ge 0$,

$$\mathbb{P}(\|Z - \mathbb{E}Z\|_2 \ge t) \le (n+1)\cdot\exp\left(\frac{-t^2/2}{v(Z) + Lt/3}\right). \tag{19}$$

First, we derive a closed form for $L$. By construction of $\tilde X$ (in particular $1_n^T\tilde X = r$),

$$\|\tilde X - \mathbb{E}\tilde X\|_2^2 = \left\|\tilde X - \frac{r}{n}1_n\right\|_2^2 = (n-r)\left(0 - \frac{r}{n}\right)^2 + r\left(1 - \frac{r}{n}\right)^2 = r\left(1 - \frac{r}{n}\right) = n\tau^2.$$

So choose $L = \|\tilde X - \mathbb{E}\tilde X\|_2 = \sqrt n\,\tau$. Next, we simplify the two components of $v(Z)$:

$$\|\mathbb{E}[(Z - \mathbb{E}Z)(Z - \mathbb{E}Z)^T]\|_{\mathrm{op}} = \|\mathrm{Cov}(Z)\|_{\mathrm{op}} = p\|\Sigma\|_{\mathrm{op}}, \qquad \mathbb{E}[\|Z - \mathbb{E}Z\|_2^2] = \sum_{i=1}^n\mathrm{Var}(Z_i) = \mathrm{tr}(\mathrm{Cov}(Z)) = p\cdot\mathrm{tr}(\Sigma).$$

Furthermore, $\max\{\|\Sigma\|_{\mathrm{op}}, \mathrm{tr}(\Sigma)\} = \mathrm{tr}(\Sigma)$ since $\Sigma$ is positive semidefinite. Combining the above yields

$$v(Z) = p\cdot\mathrm{tr}(\Sigma) = p\cdot n\cdot\tau^2 = pL^2.$$

Substituting $v(Z)$ and $\mathbb{E}Z = \frac{pr}{n}1_n$ into (19) yields

$$\mathbb{P}\left(\left\|Z - \frac{pr}{n}1_n\right\|_2 \ge t\right) \le (n+1)\cdot\exp\left(\frac{-t^2/2}{pL^2 + Lt/3}\right) \quad\text{for all } t \ge 0.$$
Since $X_j = \sqrt{n/r}\,\tilde X^{(j)}$, we have $X1_p = \sqrt{n/r}\,Z$ and, for all $t \ge 0$,

$$\mathbb{P}\left(\left\|X1_p - p\sqrt{\frac{r}{n}}1_n\right\|_2 \ge \sqrt{\frac{n}{r}}\,t\right) = \mathbb{P}\left(\left\|Z - \frac{pr}{n}1_n\right\|_2 \ge t\right) \le (n+1)\cdot\exp\left(\frac{-(t/L)^2/2}{p + (t/L)/3}\right) =: F(t).$$

Choosing $t = pr/\sqrt n$ and using the triangle inequality,

$$\|X1_p\|_2 \le \left\|X1_p - p\sqrt{r/n}\,1_n\right\|_2 + \left\|p\sqrt{r/n}\,1_n\right\|_2 \le \sqrt{n/r}\cdot t + p\sqrt r = 2p\sqrt r$$

holds with probability at least $1 - F(pr/\sqrt n)$. Also $t/L = pr/(n\tau)$. We next simplify $F(pr/\sqrt n)$:

$$F\left(\frac{pr}{\sqrt n}\right) = (n+1)\cdot\exp\left(\frac{-\frac{1}{2}\left(\frac{pr}{n\tau}\right)^2}{p + \frac{1}{3}\left(\frac{pr}{n\tau}\right)}\right) = (n+1)\cdot\exp\left(-\frac{\frac{pr^2}{2n^2\tau^2}}{1 + \frac{r}{3n\tau}}\right)$$

$$\Rightarrow\ \log F\left(\frac{pr}{\sqrt n}\right) = \log(n+1) - \frac{1}{2}\cdot\frac{p(r/n)^2}{\tau^2 + \frac{1}{3}\tau\cdot\frac{r}{n}} \le \log(n+1) - \frac{1}{2}\cdot\frac{p(r/n)^2}{r/n + \frac{1}{3}(r/n)^{3/2}} \quad\left(\text{by } \tau = \sqrt{\frac{r}{n}\left(1 - \frac{r}{n}\right)} \le \sqrt{\frac{r}{n}}\right)$$

$$\le \log(n+1) - \frac{1}{2}\cdot\frac{p(r/n)}{1 + \frac{1}{3}} \quad\left(\text{by } \frac{r}{n} \le 1\right) = \log(n+1) - \frac{3}{8}\cdot\frac{pr}{n}.$$

So, $\mathbb{P}(\|X1_p\|_2 \le 2p\sqrt r) \ge 1 - \exp\left(\log(n+1) - \frac{3pr}{8n}\right)$. Note that $\frac{r}{n} > \frac{8\log(n+1)}{3p}$ ensures the exponent diverges to negative infinity as $n$ and $p$ increase. Thus, if $r > \frac{8n\log(n+1)}{3p}$, then

$$\|X1_p\|_2 = O_p(p\sqrt r) \;\Leftrightarrow\; \frac{\|X1_p\|_2}{p\sqrt n} = O_p\left(\sqrt{\frac{r}{n}}\right).$$

So we have $\|X1_p\|_2 = o_p(p\sqrt n)$ if $r = o(n)$ and $r > \frac{8n\log(n+1)}{3p}$.
J Paired t-tests between our method and L1-ag-h
To compare the test errors in the first two columns of Table 1 in the main paper, we perform a paired t-test for each row. For each of the n = 40,000 test reviews, a pair of squared prediction errors is calculated, one from each method. Conditional on the training set, these n pairs can be thought of as independent. Table 4 shows the resulting p-values.

prop.     1%              5%       10%             20%             40%             60%      80%      100%
p-value   3.95×10^{-9}    0.086    7.21×10^{-7}    7.56×10^{-8}    2.2×10^{-16}    0.056    0.105    0.001

Table 4: Paired t-tests between squared prediction errors from our method and L1-ag-h on the held-out test set. The prop. values correspond to the different training sizes in Table 1.
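The paired t-test above reduces to a one-sample t-test on the per-review differences of squared errors; a minimal sketch (ours, with made-up error values standing in for the real per-review errors):

```python
import math
import statistics

# Paired t-statistic: mean of the differences divided by its standard error.
def paired_t_statistic(errors_a, errors_b):
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

# Toy squared prediction errors, one pair per held-out review (illustrative only)
errors_a = [1.2, 0.8, 2.5, 0.3, 1.9, 0.7]
errors_b = [1.0, 0.9, 2.0, 0.4, 1.5, 0.6]
t = paired_t_statistic(errors_a, errors_b)
# compare t against a t distribution with n-1 degrees of freedom for a p-value
```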
K Hierarchical Clustering Tree with ELMo
Embeddings from Language Models (ELMo) is a state-of-the-art natural language processing framework for representing words with deep contextualized embeddings (Peters et al., 2018). ELMo leverages a deep neural network that is pre-trained on a large text corpus and has multiple attractive properties. For our purpose, the most relevant property of ELMo is that its word embeddings account for the context in which the words appear. In particular, the ELMo embedding assigned to a word is a function of the entire sentence containing that word. This means that the same word in different contexts will have different embeddings, a great advantage over traditional word embeddings such as GloVe (Pennington et al., 2014).

Among the training sets listed in Table 1, we focus on the scenario with n = 33,997 and p = 5,621 (which corresponds to 20% of the entire set of training reviews). We use TensorFlow (Abadi et al., 2015) and an ELMo model pre-trained on the One Billion Word Language Model Benchmark5 to get 1,024-dimensional embeddings for all words in the 33,997 training reviews and the 40,000 held-out testing reviews. To establish a fair comparison with the existing experiments, we keep only adjective embeddings in the reviews, which leaves us 952,974 adjective embeddings for training and 1,122,030 adjective embeddings for testing. Again, the same adjective in various sentences has different ELMo embeddings, whose proximities encode how the word is used in each context.

We use mini-batch K-Means clustering (Sculley, 2010) to cluster the 952,974 adjective embeddings from training into 6,001 clusters. This choice of the number of K-Means clusters ensures that the comparison between ELMo and GloVe (which is based on p = 5,621) is roughly on the same scale of features. After mini-batch K-Means clustering, we update each cluster centroid to the average of the embeddings of the training adjectives assigned to the cluster. We also associate testing adjectives to the clusters by Euclidean distance. After associating adjectives to clusters, we construct a document-cluster matrix for training and testing, respectively. The ij-entry of the document-cluster matrix is the number of occurrences of the jth cluster in the ith review. Each cluster represents a collection of occurrences of adjectives used in a similar context within a review. We next generate a tree by performing hierarchical clustering on the cluster centroids. The tree and document-cluster matrix are then used in fitting our model along with others such as L1-ag-h and L1-ag-d.
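The two-stage pipeline described above (mini-batch K-Means on the embeddings, then a hierarchical tree on the centroids) can be sketched on toy data as follows (our own illustration; the tiny dimensions stand in for the paper's roughly 950k ELMo vectors, 6,001 clusters, and 1,024 dimensions):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from scipy.cluster.hierarchy import linkage

# Stage 1: mini-batch K-Means collapses many embedding vectors into clusters.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 16))          # stand-in for contextual embeddings

km = MiniBatchKMeans(n_clusters=20, random_state=0, n_init=3).fit(embeddings)
centroids = km.cluster_centers_                  # one row per cluster

# Stage 2: hierarchical clustering on the centroids yields the auxiliary tree.
tree = linkage(centroids, method="average")      # (k-1) x 4 linkage matrix
assert tree.shape == (19, 4)                     # k-1 merges for k = 20 centroids
```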
References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
5 http://www.statmt.org/lm-benchmark/
Arnold, T. B. and Tibshirani, R. J. (2014). genlasso: Path algorithm for generalized lasso problems. R package version 1.3.

Bernstein, D. S. (2009). Matrix Mathematics: Theory, Facts, and Formulas (Second Edition). Princeton University Press.

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122.

Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Publishing Company, Incorporated, 1st edition.

Cao, Y., Zhang, A., and Li, H. (2017). Microbial Composition Estimation from Sparse Count Data. ArXiv e-prints.

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Peña, A. G., Goodrich, J. K., Gordon, J. I., Huttley, G. A., Kelley, S. T., Knights, D., Koenig, J. E., Ley, R. E., Lozupone, C. A., McDonald, D., Muegge, B. D., Pirrung, M., Reeder, J., Sevinsky, J. R., Turnbaugh, P. J., Walters, W. A., Widmann, J., Yatsunenko, T., Zaneveld, J., and Knight, R. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat. Methods, 7(5):335–336.

Chen, J., Bushman, F. D., Lewis, J. D., Wu, G. D., and Li, H. (2013). Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 14(2):244–258.

Cote, F. D., Psaromiligkos, I. N., and Gross, W. J. (2012). A Chernoff-type Lower Bound for the Gaussian Q-function. ArXiv e-prints.

Feinerer, I. and Hornik, K. (2016). wordnet: WordNet Interface. R package version 0.1-11.

Feinerer, I. and Hornik, K. (2017). tm: Text Mining Package. R package version 0.7-1.

Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. Bradford Books.

Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res., 3:1289–1305.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22.

Guinot, F., Szafranski, M., Ambroise, C., and Samson, F. (2017). Learning the optimal scale for GWAS through hierarchical SNP aggregation. ArXiv e-prints.

Huang, A. (2008). Similarity measures for text document clustering. In Proceedings of the sixth New Zealand computer science research student conference (NZCSRSC2008), Christchurch, New Zealand, pages 49–56.
Ke, T., Fan, J., and Wu, Y. (2015). Homogeneity Pursuit. J Am Stat Assoc, 110(509):175–194.

Khabbazian, M., Kriebel, R., Rohe, K., and Ané, C. (2016). Fast and accurate detection of evolutionary shifts in Ornstein-Uhlenbeck models. Methods in Ecology and Evolution, 7(7):811–824.

Kim, S., Xing, E. P., et al. (2012). Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. The Annals of Applied Statistics, 6(3):1095–1117.

Li, C. and Li, H. (2010). Variable selection and regression analysis for graph-structured covariates with an application to genomics. Ann. Appl. Stat., 4(3):1498–1516.

Li, Y., Raskutti, G., and Willett, R. (2018). Graph-based regularization for regression problems with highly-correlated designs. ArXiv e-prints.

Lin, W., Shi, P., Feng, R., and Li, H. (2014). Variable selection in regression with compositional covariates. Biometrika, 101:785–797.

Liu, X., Yu, S., Janssens, F., Glänzel, W., Moreau, Y., and De Moor, B. (2010). Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database. J. Am. Soc. Inf. Sci. Technol., 61(6):1105–1119.

Lou, Y., Bien, J., Caruana, R., and Gehrke, J. (2016). Sparse partially linear additive models. Journal of Computational and Graphical Statistics, 25(4):1126–1140.

Matsen, F. A., Kodner, R. B., and Armbrust, E. V. (2010). pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11:538.

McMurdie, P. J. and Holmes, S. (2013). phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLOS ONE, 8(4):1–11.

Mohammad, S. M. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3):436–465.

Mukherjee, R., Pillai, N. S., and Lin, X. (2015). Hypothesis testing for high-dimensional sparse binary regression. Ann. Statist., 43(1):352–381.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proc. of NAACL.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Randolph, T. W., Zhao, S., Copeland, W., Hullar, M., and Shojaie, A. (2015). Kernel-Penalized Regression for Analysis of Microbiome Data. ArXiv e-prints.

Ridenhour, B. J., Brooker, S. L., Williams, J. E., Van Leuven, J. T., Miller, A. W., Dearing, M. D., and Remien, C. H. (2017). Modeling time-series data from microbial communities. ISME J, 11(11):2526–2537.

Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E. B., Lesniewski, R. A., Oakley, B. B., Parks, D. H., Robinson, C. J., Sahl, J. W., Stres, B., Thallinger, G. G., Van Horn, D. J., and Weber, C. F. (2009). Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75(23):7537–7541.

Sculley, D. (2010). Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 1177–1178, New York, NY, USA. Association for Computing Machinery.

She, Y. (2010). Sparse regression with exact clustering. Electron. J. Statist., 4:1055–1096.

Shi, P., Zhang, A., and Li, H. (2016). Regression analysis for microbiome compositional data. Ann. Appl. Stat., 10(2):1019–1040.

Tang, Y., Li, M., and Nicolae, D. L. (2016). Phylogenetic Dirichlet-multinomial model for microbiome data. ArXiv e-prints.

Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., and Kappas, A. (2010). Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol., 61(12):2544–2558.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288.

Tibshirani, R. J. and Taylor, J. (2011). The solution path of the generalized lasso. Ann. Statist., 39(3):1335–1371.

Tropp, J. A. (2015). An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230.

Wallace, M. (2007). Jawbone Java WordNet API.

Wang, H., Lu, Y., and Zhai, C. (2010). Latent aspect rating analysis on review text data: A rating regression approach. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, pages 783–792, New York, NY, USA. ACM.

Wang, J., Shen, X., Sun, Y., and Qu, A. (2016). Classification with unstructured predictors and an application to sentiment analysis. Journal of the American Statistical Association, 111(515):1242–1253.

Wang, T. and Zhao, H. (2017a). A Dirichlet-tree multinomial regression model for associating dietary nutrients with gut microorganisms. Biometrics, 73(3):792–801.
Wang, T. and Zhao, H. (2017b). Structured subcomposition selection in regression and its application to microbiome data analysis. Ann. Appl. Stat., 11(2):771–791.

Xia, F., Chen, J., Fung, W. K., and Li, H. (2013). A logistic normal multinomial regression model for microbiome compositional data analysis. Biometrics, 69(4):1053–1063.

Yu, G. and Liu, Y. (2016). Sparse regression incorporating graphical structure among predictors. Journal of the American Statistical Association, 111(514):707–720.

Zhai, J., Kim, J., Knox, K. S., Twigg, H. L., Zhou, H., and Zhou, J. J. (2018). Variance component selection with applications to microbiome taxonomic data. Frontiers in Microbiology, 9:509.

Zhang, T., Shao, M.-F., and Ye, L. (2012). 454 pyrosequencing reveals bacterial diversity of activated sludge from 14 sewage treatment plants. The ISME Journal, 6(6):1137–1147.