

Rare Feature Selection in High Dimensions

Xiaohan Yan∗ Jacob Bien†

July 10, 2020

Abstract

It is common in modern prediction problems for many predictor variables to be counts of rarely occurring events. This leads to design matrices in which many columns are highly sparse. The challenge posed by such “rare features” has received little attention despite its prevalence in diverse areas, ranging from natural language processing (e.g., rare words) to biology (e.g., rare species). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity.

We apply our method to data from TripAdvisor, in which we predict the numerical rating of a hotel based on the text of the associated review. Our method achieves high accuracy by making effective use of rare words; by contrast, the lasso is unable to identify highly predictive words if they are too rare. A companion R package, called rare, implements our new estimator, using the alternating direction method of multipliers.

1 Introduction

The assumption of parameter sparsity plays an important simplifying role in high-dimensional statistics. However, this paper is focused on sparsity in the data itself, which actually makes estimation more challenging. In many modern prediction problems, the design matrix has many columns that are highly sparse. This arises when the features record the frequency of events (or the number of times certain properties hold). While a small number of these events may be common, there is typically a very large number of rare events, which correspond to features that are zero for nearly all observations. We call these predictors rare features. Rare features are in fact extremely common in many modern data sets. For example, consider the task of predicting user behavior based on past website visits: Only a small number of sites are visited by a large proportion of the users; all other sites are visited by only a small proportion of users.

∗Data Scientist, Microsoft; email: [email protected]
†Associate Professor, Data Sciences and Operations, Marshall School of Business, University of Southern California; email: [email protected]


arXiv:1803.06675v2 [stat.ME] 8 Jul 2020


As another example, consider text mining, in which one makes predictions about documents based on the terms used. A typical approach is to create a document-term matrix in which each column encodes a term’s frequency across documents. In such domains, it is often the case that the majority of the terms appear very infrequently across the documents; hence the corresponding columns in the document-term matrix are very sparse (e.g., Forman 2003; Huang 2008; Liu et al. 2010; Wang et al. 2010). In Section 6, we study a text dataset with more than 200 thousand reviews crawled from https://www.tripadvisor.com. Our goal is to use the adjectives in a review to predict a user’s numerical rating of a hotel. As shown in the right panel of Figure 7, the distribution of adjective density, defined as the proportion of documents containing an adjective, is extremely right-skewed, with many adjectives occurring very infrequently in the corpus. In fact, we find that more than 95% of the 7,787 adjectives appear in less than 5% of the reviews. It is common practice to simply discard rare terms,1 which may mean removing most of the terms (e.g., Forman 2003; Huang 2008; Liu et al. 2010; Wang et al. 2010).
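To make the notion of term density concrete, the following Python sketch (using a tiny hypothetical corpus, not the TripAdvisor data; the paper’s own tooling is in R) computes each term’s density and applies a removeSparseTerms-style threshold, illustrating how a rare but informative word is dropped:

```python
from collections import Counter

# Toy corpus: each document is a list of adjective tokens (hypothetical data).
docs = [
    ["clean", "comfortable", "nice"],
    ["nice", "friendly"],
    ["clean", "nice", "ghastly"],
    ["comfortable", "nice"],
]

# Density of a term = proportion of documents containing it.
doc_freq = Counter(term for doc in docs for term in set(doc))
density = {term: count / len(docs) for term, count in doc_freq.items()}

# Discarding terms below a density threshold (what tm's removeSparseTerms
# effectively does) removes rare-but-informative words like "ghastly".
threshold = 0.5
kept = sorted(t for t, d in density.items() if d >= threshold)
dropped = sorted(t for t, d in density.items() if d < threshold)
print(kept)     # ['clean', 'comfortable', 'nice']
print(dropped)  # ['friendly', 'ghastly']
```

Here “ghastly” has density 0.25 and is discarded, even though (as discussed below) it may be highly predictive of the rating.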

Rare features also arise in various scientific fields. For example, microbiome data measure the abundances of a large number of microbial species in a given environment. Researchers use next-generation sequencing technologies to obtain reads of microbial DNA, clustering these reads into “operational taxonomic units” (OTUs), which are roughly thought of as different species of microbe (e.g., Schloss et al. 2009; Caporaso et al. 2010). In practice, many OTUs are rare, and researchers often aggregate the OTUs to genus or higher levels (e.g., Zhang et al. 2012; Chen et al. 2013; Xia et al. 2013; Lin et al. 2014; Randolph et al. 2015; Shi et al. 2016; Cao et al. 2017) or with unsupervised clustering techniques (e.g., McMurdie and Holmes 2013; Wang and Zhao 2017b) to create denser features. However, even after this step, a large portion of these aggregated OTUs are still found to be too sparse and thus are discarded (e.g., Zhang et al. 2012; Chen et al. 2013; Shi et al. 2016; Wang and Zhao 2017b). The rationale for this elimination of rare OTUs is that there needs to be enough variation among samples for an OTU to be successfully estimated in a statistical model (Ridenhour et al., 2017).

The practice of discarding rare features is wasteful: a rare feature should not be interpreted as an unimportant one, since it can be highly predictive of the response. For instance, using the word “ghastly” in a hotel review delivers an obvious negative sentiment, but this adjective appears very infrequently in TripAdvisor reviews. Discarding an informative word like “ghastly” simply because it is rare clearly seems inadvisable. To throw out over half of one’s features is to ignore what may be a huge amount of useful information.

Even if rare features are not explicitly discarded, many existing variable selection methods are unable to select them. The challenge is that with limited examples there is very little information to identify a rare feature as important. Theorem 1 shows that even a single rare feature can render ordinary least squares (OLS) inconsistent in the classical limit of infinite sample size and fixed dimension.
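To see the difficulty concretely, consider an illustrative special case (a simplification for intuition, not the exact setting of Theorem 1): a regression on an intercept and a single binary feature that is nonzero for only k of the n observations. OLS then compares two group means, and the variance of the rare feature’s coefficient estimate does not vanish as n grows with k held fixed:

```latex
% Illustrative two-group special case: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
% with x_i \in \{0,1\}, \mathrm{Var}(\varepsilon_i) = \sigma^2, and only k of the
% n observations having x_i = 1.
\hat\beta_1 = \bar y_{\{i:\,x_i=1\}} - \bar y_{\{i:\,x_i=0\}},
\qquad
\mathrm{Var}\bigl(\hat\beta_1\bigr)
  = \sigma^2\left(\frac{1}{k} + \frac{1}{n-k}\right)
  \;\longrightarrow\; \frac{\sigma^2}{k} > 0
  \quad \text{as } n \to \infty \text{ with } k \text{ fixed.}
```

With k bounded, additional observations carry no information about the rare feature’s coefficient, which is the intuition behind the inconsistency.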

To address the challenge posed by rare features, we propose in this work a method for forming new aggregated features that are less sparse than the original ones and may be more relevant to the prediction task. Consider the following features, which represent the frequency of certain adjectives used in hotel reviews:

1 For example, in the R text mining library tm (Feinerer and Hornik, 2017), removeSparseTerms is a commonly used function for removing any terms with sparsity level above a certain threshold.



• X_worrying, X_depressing, . . . , X_troubling,

• X_horrid, X_hideous, . . . , X_awful.

While both sets of adjectives express negative sentiments, the first set (which might be summarized as “worry”) seems more mild than the second set (which might be summarized as “horrification”). In predicting the rating of a hotel review, we might find the following two aggregated features more relevant:

X_worry = X_worrying + X_depressing + · · · + X_troubling

X_horrification = X_horrid + X_hideous + · · · + X_awful.

The distinction between “horrid” and “hideous” might not matter for predicting the hotel rating, whereas the distinction between a “worry”-related word and a “horrification”-related word may be quite relevant. Thus, not only are these aggregated features less rare than the original features, but they may also be more relevant to the prediction task. A method that selects the aggregated feature X_horrification can thereby incorporate the information conveyed by the use of “hideous” into the prediction task, even though the same method may be unable to determine the effect of “hideous” on its own since it is too rare.
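The aggregation step itself is just a column sum. The sketch below (in Python with hypothetical word lists and counts; the paper’s companion package rare is in R) forms X_worry and X_horrification from raw adjective counts:

```python
# Minimal sketch of feature aggregation: sum the count columns of all words
# assigned to the same group. The word-to-group assignment here is
# hypothetical; in the paper it comes from cutting a similarity tree.
groups = {
    "worry": ["worrying", "depressing", "troubling"],
    "horrification": ["horrid", "hideous", "awful"],
}

# Rows are reviews, entries are adjective counts (toy data).
counts = {
    "worrying":   [1, 0, 0],
    "depressing": [0, 0, 1],
    "troubling":  [0, 0, 0],
    "horrid":     [0, 1, 0],
    "hideous":    [0, 0, 0],
    "awful":      [0, 1, 1],
}

def aggregate(counts, groups):
    """X_group = sum of member columns, e.g. X_worry = X_worrying + ..."""
    n = len(next(iter(counts.values())))
    return {
        g: [sum(counts[w][i] for w in words) for i in range(n)]
        for g, words in groups.items()
    }

agg = aggregate(counts, groups)
print(agg["worry"])          # [1, 0, 1]
print(agg["horrification"])  # [0, 2, 1]
```

Note that the aggregated columns are denser than any of their members: a review counts toward X_horrification whether it says “horrid”, “hideous”, or “awful”.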

Indeed, appropriate aggregation of rare features in certain situations can be key to attaining consistent estimation and support recovery. In Theorem 2, we consider a setting where all features are rare and a natural aggregation rule exists among the features. In that setting, we show that the lasso (Tibshirani, 1996) fails to attain high-probability support recovery (for all values of its tuning parameter), whereas an oracle aggregator does attain this property. Theorem 2 demonstrates the value of proper aggregation for accurate feature selection when features are rare. This motivates the remainder of the paper, in which we devise a strategy for determining an effective feature aggregation based on data. Our aggregation procedure makes use of side information about the features, which we find is available in many domains. In particular, we assume that a tree is available that represents the closeness of features. For example, Figure 1 shows a tree for the previous word example that is generated via hierarchical clustering over GloVe (Pennington et al., 2014) embeddings learned from a different data source. The two contours enclose two subtrees resulting from a cut at their joint node. Aggregating the counts in these subtrees leads to the new features X_worry and X_horrification described above. We give more details of constructing such a tree in Section 3.1.
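A tree like the one in Figure 1 can be produced by hierarchical clustering of word embeddings. The following Python sketch uses toy 2-D vectors in place of GloVe embeddings and a hand-rolled average-linkage clustering; cutting the tree at a height threshold yields the groups to aggregate (in practice one would use pretrained embeddings and a library clustering routine):

```python
import math

# Toy 2-D embeddings standing in for GloVe vectors (hypothetical values):
# the "worry"-like words sit near the origin, the "horrification"-like
# words sit in a separate cluster.
emb = {
    "worrying":  (0.0, 0.0),
    "troubling": (0.1, 0.0),
    "horrid":    (5.0, 5.0),
    "hideous":   (5.1, 5.0),
    "awful":     (5.0, 5.2),
}

def avg_linkage(a, b):
    """Average pairwise distance between two clusters of words."""
    return sum(math.dist(emb[u], emb[v]) for u in a for v in b) / (len(a) * len(b))

def cut_clusters(words, height):
    """Agglomerative average-linkage clustering, stopping once the closest
    pair of clusters is farther apart than `height`. This is equivalent to
    cutting the dendrogram at that height."""
    clusters = [frozenset([w]) for w in words]
    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if avg_linkage(clusters[i], clusters[j]) > height:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]

print(sorted(cut_clusters(list(emb), height=1.0)))
# [['awful', 'hideous', 'horrid'], ['troubling', 'worrying']]
```

The two recovered groups are exactly the subtrees whose counts would be summed to form X_worry and X_horrification.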

In Section 2, we motivate our work by providing theoretical results demonstrating thedifficulty that OLS and the lasso have with rare features. We further show that correctaggregation of rare features leads to signed support recovery in a setting where the lassois unable to attain this property. In Section 3, we introduce a tree-based parametrizationstrategy that translates the feature aggregation problem to a sparse modeling problem. Ourmain proposal is an estimator formulated as a solution to a convex optimization problemfor which we derive an efficient algorithm. We draw connections between this method andrelated approaches and then in Section 4, provide a bound on the prediction error for ourmethod. Finally, we demonstrate the empirical merits of the proposed framework throughsimulation (Section 5) and through the TripAdvisor prediction task (Section 6) describedabove. In simulation, we examine our method’s robustness to misspecified side informa-

3

Page 4: Rare Feature Selection in High Dimensions › pdf › 1803.06675.pdf · ture selection when features are rare. This motivates the remainder of the paper, in which we devise a strategy

expectablepickledcornedstocked

stuffedpackaged

rumalaskansalmon

lengthwisecoarse

uncookedavocado

shreddedchopped

peeledrotten

fatbottom

topstoppedtopping

frozendried

steamingwhipping

leftoverstale

boiledsour

hungrystarving

exhausteddehydrated

caringtending

distressedstricken

blinddeaf

disabledhomeless

elderlyaged

eighteenfifty

ninetycontracted

treatedill

pregnantsick

dyingconjoined

dominicanreturningdeparted

strandedrescued

modultraeponymous

hippunk

secrethidden

dirtyfake

animalpet

wildmad

coloredorange

redhighlandcrescent

midwesternrural

unpopulatedundeveloped

untouchedinaccessible

woodedhilly

sprawlingpopulated

scrubshallow

rockysandy

nauticalloco

sportivetropical

torrentialcloudy

dampwet

humidrainy

windychilly

weathercold

warmcool

hourlyannualyearly

grosstotal

minusprojected

surpassingwhopping

averageadjusted

ilxx

prepaidmailedpostal

misrepresentedinflatedskewed

unfoundedmisleading

falsedubious

questionableincomplete

inconsistentincorrect

sketchyobscure

aforementionedpeculiar

seemingconfusing

ambiguousproblematic

unsolicitedunauthorised

forfeitoutright

heardloud

yelledshouted

screamingcrying

blaringbanging

atticupstairs

downstairsawakeasleep

wakingawakened

sleepingbreathing

otcci

weewantan

bumcapitalist

orientatedlumbering

imperialisticethnic

foreigndomestic

thaiindonesian

koreanvietnamese

chinesemasculine

gayabusive

abusedharassed

communistpolitical

liberalpenal

mandatorystrict

forbiddenprohibited

violatedrestricted

limitingslight

downwarddim

fadeddashed

disappointingdismal

lacklustersubdued

upbeattepid

stockdipped

rosefelldropping

fallingrisenfallen

loweredslashed

sterlingrecoveringrecovered

lostbeat

beatenranked

fourthconsecutive

missedshot

tiedlone

minutefoul

franksometime

midlateago

fireddismissed

sentordered

sufficientlacking

minimalexcessive

excesswaste

rawscarce

worstblame

blamedstruggling

experiencedfaced

hurtbad

worsepoor

weakaffected

impactednegativefavorable

dangerousisolated

unstableelevated

relativelow

reducedbush

federalinternal

dueprevious

followingsubsequent

hearingincident

superiorpotent

formidableprobable

potentialapparent

seriousremote

neighboringcamp

nearnearby

scrambledrushed

rushwaiting

stoppedcollect

triedtrying

decidedwanted

attemptedforcedunable

remainingleft

removedabandoned

spreadfound

coveredexposed

offensivedefensive

uniformedloyal

armedfighting

proanti

opposedbacked

explosiveunidentified

searchingspotted

automaticwarning

flashkilling

stabbingexecuted

infamousnotorious

chargedguilty

suspectcriminalalleged

pettydrunken

drunkoperative

cutaneousfacialbone

minorsustained

sufferingsevere

bodilymental

fossilwarming

magnetichydraulic

ambientfluid

liquidsyntheticadditive

contentanalog

runnyitchy

scratchyedible

inediblefragrant

pottedrust

rustypowdery

mushygreasy

flakysticky

wateryhairy

hollowfuzzyeyed

buffreddish

bluishdarkening

opaquemurky

okayalright

deardamn

remissnappingseasick

pickybrag

bragginginquiring

discerningtripping

lingualbioluminescent

equidistantinterlacedschematic

derisiveaudible

buzzingwhispering

upwindbuggyhuffy

toedhooked

wiredplugged

fabgabby

littlerbedded

looseleaftropic

piedtrojan

iberiansmoking

secondhandaddictivealcoholic

gastrointestinalallergic

asthmaticsensitive

touchynegotiable

obligatoryoverdue

undecidedshy

inclinedworried

concernedconfused

unhappyfrustrated

hesitantreluctantunwilling

eageranxious

skepticalwary

advertisedbilled

bookedunavailable

resteddisposed

compensatedgraded

categorizedchecked

sortedfarthest

distantapproaching

wanderingsneaking

touchingbouncing

crossedhalfway

stucksideways

draggingsliding

drivenrollingrolledfloatingdrifting

tippedstray

discriminatorydiscriminating

intrusiveexplicit

unsafeimpractical

unusablepoorly

ineffectiveinadequate

consumingcostly

repetitiveneedless

unnecessarydiscouraged

apprehensiveleery

mumadamantdisturbed

horrifieddisgusted

appalleddismayed

hystericalterrified

distraughtstung

embarrassedannoyedirritated

botheredcomplaining

thunderstruckgobsmacked

flabbergastedspeechless

apologeticincredulous

bewilderedbaffled

perplexedpuzzled

unimpresseddisheartened

perturbedoffendeddisliked

dazzledenamored

pesteredpatronized

lazykindlywise

jollysly

insaneincompetent

unresponsiveselfish

arrogantnaive

ignorantuncaring

aloofforgiving

squeamishfussy

indulgentobnoxious

overbearingcocky

pushyuptight

surlysullen

cluelessgrumpy

lamerude

downrightnastyugly

stupidsillyweirdcrazy

inexperienceddisorganizeddisorganised

unimaginativeunmotivated

indifferentuninterested

noisyrowdy

unrulydisgruntled

iratemeagresizeable

exorbitantextortionate

overpricedtemptingenticing

limitlesslofty

manageableunmanageable

bearableintolerable

unbearableaggravating

compoundedstressful

pleasurabledraining

taxingspoilt

spoileddreary

miserablelousy

mediocrefeeble

patheticpitiful

hurriedfumbling

clumsyhaphazard

perfunctoryhalfhearted

wimpycrappy

touristygrouchy

crotchetysnootysnotty

smallishgrungyscruffy

constipatedthieving

grubbydodgy

tattytipsy

skankyslothful

bothersomenauseating

yuckylumpyuntidy

insubstantialunappealingunattractive

raunchytrashy

tastelessshameless

ostentatiousobtrusiveimpersonal

unsupportiveunromantic

uninvitingrepulsiverevoltingstereotyped

soullessglorified

decadentunsavory

harmlessinsignificant

imaginableserviceable

characterlessflavorless

existentunmentionable

borderlineatrocious

deplorableappalling

disgustingshameful

disgracefulinexcusable

repugnantundesirable

disadvantageousunhelpful

unfriendlyunprofessional

dishonestinconsiderate

impolitedisrespectful

mistakenignored

contraryjustified

beforehandrealised

understandableunavoidableunfortunate

solvedcorrected

rectifiedflush

crackcracked

interferingobligedobliging

revertingbowing

systematiccleansing

irregulartransient

endlesscountless

occasionalsporadic

evilinvisible

mysterioussheer

utterunbelievable

nonsenseoutrageous

ludicrousabsurd

ridiculousfrightening

scarybizarre

oddstrange

eeriechilling

ominousfuturistic

unrealworrying

depressingalarming

disturbingtroubling

horridhideousghastly

dreadfulhorrendous

horrifichorrifying

terriblehorrible

awfulinconvenient

uselesspointless

boringtedious

dullbland

unpleasantuncomfortable

awkwardunnerving

disconcertingannoyingirritating

abruptunexpected

suddenimmediate

swiftstark

bluntsharp

extremeharsh

draconiandrastic

humiliatingembarrassing

shockinghostile

unacceptableinappropriate

disappointedsurprised

shockedstunned

angeredangry

furiousanswering

repeatedechoing

nervoustired

boredweary

impatientsad

sorryashamed

upsettingupset

trickychallengingdifficult

toughfigured

wonderingnoticedseeing

afraidscared

wantingpretend

desperatevain

unluckytiring

frustratingpacked

crowdedjammed

filledempty

surroundedlined

foldingwooden

drapedbearing

picturedframed

stackedstrung

barefootlegged

barenaked

staringseatedsitting

dressedwearing

worndress

beigeskimpy

wrinkledsoiled

rearoverhead

upperouter

blankflip

diagonalsquared

narrowinggaping

umbrellamat

tieredroofed

mustymoldy

smellingputrid

fishyhokey

pretentioustacky

cheesyspooky

sneakygroovy

smokybouncy

mellowmelodious

boggywaterlogged

soggyfaultless

passablemoonlitdrizzly

comfycushy

stuffycrampedcluttered

swankyseedy

dankclaustrophobicfilthy

smellythreadbare

shabbydingygrimy

spartandrab

ramshackledecrepit

uphillgruellingtortuous

sleeplesshecticfrantic

overpoweringsuffocating

unrelentingblazing

leisurelybrisk

uneventfuleventful

unimpressiveiffy

unsatisfyingunannounced

attendantnonstop

understaffedlax

outdatedcumbersome

inefficientinoperable

operableunattended

unlockedventilated

moveablemovable

motorizedstationary

uncleansterile

sterilizedwatered

scrubbedpatched

hydratedtonic

soundproofhydroponic

crustedproofed

treeddeadened

sloppedthudding

vertiginousextraordinaire

manqueslippy

piscinebarefooted

offishlaxative

swishredux

spankingmaniac

repellentrepellant

especialcopiousfrayed

souredmarred

chaotictense

heatedsplit

dominateddestroyeddamaged

ruinedwrecked

shutshuttered

interruptedendedbroke

swepthammered

strucksuspended

cancelleddelayed

threatenedcontinuedrenewed

closeclosed

blockedsealed

controlledregulated

reservedvested

temporaryvacant

communalunoccupied

occupiedraining

splashedsoakedwashed

litteredscattered

burningburnt

burnedapart

brokenblown

thrownparked

smashedsmashing

preservedneglected

overlookedforgotten

sunkenmangled

gravedead

buriedtorn

wrappedwoundshady

crookedpaved

windingbumpy

dirtdusty

muddyrough

slipperyicy

faultyoverloaded

leakyspotty

shoddysubstandard

uneventight

slacksteep

flatnarrow

densethick

largeheavy

undergroundpedestrian

ragingroaring

rumblingcongested

cloggedflooded

overflowingprize

earnedreceived

millionworth

valuedstreaming

electronicdigital

instantonline

interactivebroadband

wirelessmobile

portableautomated

managerialtechnical

financialcorporate

profitablecompetitive

overseascommercial

directstandby

beneficiarytoken

supplementarysupplemental

genericprescription

unlimitedminimum

maximumpremium

rentalaffordable

expensiveinexpensive

cheapconvenientaccessible

suitableidealattractivedesirable

regentmeridian

intercontinentalmediterranean

continentalpacific

atlanticjet

planelanded

flyingfly

intrepidchartered

guidedtrackedsighted

expressoutbound

naturalmineral

orientaloccidental

electricalmechanical

motorelectric

discovereddetected

artificialsurface

equippedfitted

mountedpowered

builtdesigned

modelintegrateddeveloped

operationaloperating

dualfixed

existingconventional

advanceadvanced

increasedexpandedimproved

enhancedmaintained

limitedextended

insuredprudential

neutralpolar

netposted

levelhigh

higherraised

expectedanticipated

lowestoverall

combinedrevised

processedimported

availabledistributed

containedmaterial

documentedcollected

radionightly

dailyweekly

mediumtiny

smalllarger

smallersize

sizedvisible

characteristicdistinctive

uniquedistinct

obvioussurprising

noticeablesubtle

strikingdramatic

lessermyriad

assortedranging

variedvaryingtremendous

considerablegreater

substantialsignificant

massivevast

hugeenormous

widedeep

innerrich

diversemissing

aliveclearedheaded

loadeddriving

walkingcatching

runningshortlong

rightfree

middlehome

halfpast

turnedawayback

cutcutting

easyquick

slowfast

passpassing

forwardaheadbehind

pressedpressing

meetprepared

readymust

willingnoted

describedacknowledged

statedpointed

stressedlearned

understoodrealized

tellingadvised

awareinformed

proveddetermined

neitherconvinced

clearimpossibleotherwise

decidinglikely

unlikelypossible

futurechance

enoughable

stillkept

alonedonesure

goinggo

cominggone

realbig

biggerpretty

littlekind

likehard

excellentquality

bestgood

betterfairfine

protectedprotecting

cleansafe

securecritical

keyimportant

vitalcrucial

mainmajor

leadingactive

involvedresponsible

becomingtransformed

assumedassuming

presentcurrent

merezero

comparableequivalent

changedchanging

similarunusual

subjectlatter

unlikenormal

typicalusual

everyentirewhole

supposedintended

requiredneeded

necessaryused

appliedconnected

separatemultiple

individualrelated

commondifferent

specificparticular

certainmedicalpatient

elementarytextbookbilingual

experimentalmodern

contemporaryalternative

choicepreferred

basicstandard

traditionalcustom

residenthired

retiredtrained

professionalamateur

eligiblequalified

prospectiveselected

selectassigned

ranktop

rankingjunior

eliterecognised

respectedforemost

outstandingexemplary

creativeartistic

worthyultimate

extraordinaryexceptional

enduringgenuine

famouslegendary

greatgreatest

honoreddedicated

absoluteuniversal

ordinaryeveryday

puresatisfying

simplestraightforward

selfconscious

personalphysical

hispanicamerican

nativelatin

asianafrican

sidecross

oppositeoutside

insidefront

standingpriorinitial

lastearlier

earlylater

completedset

fullcomplete

historicmarked

celebratedconverted

settledrelocated

incorporatedestablished

liveliving

newnext

beginningstarting

movedmoving

workingspent

lessfar

evenmuch

wellmade

justone

anothertravelledtraveled

attendedgathered

fewerleastnone

eightnineseven

fivesix

twofour

threetwenty

thirtyten

twelvefifteeneleven

dozenhundred

dividedtogether

arrangedjewish

affiliatedformedjoined

nationwidecoordinated

organisedorganized

saidaccording

officialconfirmed

reportednational

europeaninternational

unionunited

appointedelected

representativerepresented

seniorassociateassistant

executivegeneral

scheduledplanned

augustopened

heldsign

signedapproved

agreedannounced

jointinitiative

presidentialpremier

primeindian

canadianbritish

australianenglish

irishroyal

colonialvictorian

newbornteenage

teenagedadult

femalemale

youngolder

youngerold

bornmarried

mexicancuban

italianbrazilian

spanishportuguese

japanesegreek

turkishgiant

ownedsoldbased

firmboss

exformer

swissfrench

belgiandutch

danishgerman

polishii

romanottomanprussian

metropolitanindustrial

residentialurban

capitalnortheastsoutheast

northsouth

westeast

centralsouthernnorthern

westerneastern

terminalport

baybase

busybustling

downtownuptown

plainbordered

overlookingsurrounding

adjacentsquare

locatedsituated

peacefulmass

expressedvoiced

forthcomingupcoming

recentlatestongoing

continuingcivic

orientedminded

friendlycooperative

understandingsharedsharing

diplomaticstrategic

historicalcultural

manyvarious

severalnumerous

popularknown

consideredlocal

privatepublic

independentlegal

centercomplex

humansocial

environmentalloose

thinsoft

smoothtransparent

relaxingrelaxed

accommodatingconditioned

festiveinviting

welcomingwelcome

customaryroutine

casualintimate

correctwrong

acceptablereasonable

sensibleproper

appropriatepracticalobjective

whatsoevertrue

meaningregardless

meanwhateverknowing

proofdefinite

gearedmotivated

engagedengaging

accustomedalike

lookingthinking

interestedfocused

reliableaccurate

carefulthorough

timelypromptspeedy

intuitiveagileeconomical

suitedutilized

aptpreferable

knowledgeablereputable

responsivehonest

competentgrowing

grownfresh

mixedmild

moderatepositive

strongconsistent

solidcautious

modeststeady

overnightquietcalm

signalalert

elaboratedetailed

extensivebrief

lengthyformal

informalfrequentconstant

plusextra

additionalregularspecial

accompaniedaccompanying

placedattached

summarysupreme

unanimoushandled

addressedresolved

fulfilledrelieved

assuredconfident

pleasedsatisfied

empoweredsupervised

assistedpromisingappealing

demandingacceptedaccepting

reversereversed

drawnhanded

takingtaken

givengiving

exclusivesole

grantedrequestedauthorized

permanentpartial

datedlisted

unknownexact

actualverified

confirmingdisclosed

topnotchmoderne

metaliron

concretestone

votiveilluminated

litlighted

carvedsculpted

painteddecorated

adornedhandmade

antiquestainedpastel

archarched

tiledglazed

paneledornatedomed

immaculatecathedral

outwardinlandfarther

roundedsheltered

twintall

laidlay

nestledtucked

shapedetched

sacredmythical

heavenlydivine

marianeldernoble

hollywoodstarswank

rubyshining

stellarglowing

faintbygone

bustgargantuan

giganticpreciouspriceless

rareexotic

rusticquaint

picturesquesecluded

idyllicposh

upscaleupmarket

parisianchic

fashionabletrendy

plushluxe

palatialluxurious

opulentsumptuous

scenicpanoramic

sunsetsunrise

pleasantbreezysunny

gothicbaroque

architecturaldecorative

interiorexterior

downhillquadruple

doubletriple

swimmingindoor

outdoorworld

olympicgolden

goldsilver

wonwinning

roundfinal

finishedstraight

opengrand

instrumentalsolopianobass

soundfamiliar

soundingsinging

popmaster

magicmagical

twentiethinspired

reminiscentmirrored

reflectedevident

romanticromance

adaptednoveldirecteddirecting

entitledwritten

originalincludedfeatured

strawoval

identicalmatchingmatched

uniformfit

fittingcolor

lightdark

brightblackwhite

browngray

greyyellow

pinkblue

greencurt

pitchedplayedgame

singlefirst

secondthird

eleventhninth

fifthsixth

seventhcapped

timedopeningclosing

stretchmidway

amicablecordial

utmostsincere

lovingrighteous

virtuousconscientious

compassionatehearty

lukewarmgracious

hospitablerespectful

courteouspolite

diligentattentive

solicitousconsiderate

sanitaryblanket

protectivebotanical

landscapedleafy

shadedaerobic

drinkableusable

plentifulabundant

adequateample

satisfactoryworkable

painlesstolerable

worthwhiledoable


Figure 1: A tree that relates adjectives on its leaves

tion. Quantitative and qualitative comparisons in the TripAdvisor example highlight the advantages of aggregating rare features.

Notation: Given a design matrix X ∈ R^{n×p}, let x_i ∈ R^p denote the feature vector of observation i and X_j ∈ R^n denote the jth feature, where i = 1, . . . , n and j = 1, . . . , p. For a vector β ∈ R^p, let supp(β) ⊆ {1, . . . , p} denote its support (i.e., the set of indices of nonzero elements). Let S±(β) := (sign(β_ℓ))_{ℓ=1,...,p} encode the signed support of the vector β. Let T be a p-leafed tree with root r, set of leaves L(T) = {1, . . . , p}, and set of nodes V(T) of size |T|. Let T_u be the subtree of T rooted at u for u ∈ V(T). We follow the commonly used notions of child, parent, sibling, descendant, and ancestor to describe the relationships between nodes of a tree. For a matrix A ∈ R^{m×n} and a subset B of {1, . . . , n}, let A_B ∈ R^{m×|B|} be the submatrix formed by removing the columns of A not in B. Let S(β_ℓ, λ) = sign(β_ℓ) · max{|β_ℓ| − λ, 0} be the soft-thresholding operator applied to β_ℓ ∈ R. We let e_j denote the vector having a one in the jth entry and zeros elsewhere. For sequences a_n and b_n, we write a_n ≲ b_n to mean that a_n/b_n is eventually bounded, and we write a_n = o(b_n) to mean that a_n/b_n converges to zero.
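As a quick concreteness check on the notation, the soft-thresholding operator S(β_ℓ, λ) defined above is a one-liner; a minimal sketch (the function name is ours, not from the companion R package):

```python
import math

def soft_threshold(b, lam):
    # S(b, lam) = sign(b) * max(|b| - lam, 0): shrink b toward zero by lam,
    # snapping to exactly zero once |b| <= lam.
    return math.copysign(max(abs(b) - lam, 0.0), b)

print(soft_threshold(3.0, 1.0))   # 2.0
print(soft_threshold(-0.4, 1.0))  # -0.0 (i.e., zero)
```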

2 Rare Features and the Promise of Aggregation

2.1 The Difficulty Posed by Rare Features

Consider the linear model,

y = Xβ∗ + ε,  ε ∼ N(0, σ²I_n),  (1)

where y = (y_1, . . . , y_n)^T ∈ R^n is a response vector, X ∈ R^{n×p} is a design matrix, β∗ is a p-vector of parameters, and ε ∈ R^n is a vector of independent Gaussian errors having variance σ². In this paper, we focus on counts data, i.e., X_{ij} records the frequency of an event j in observation i. In particular, we will assume throughout that X has non-negative elements.

The lasso (Tibshirani, 1996) is an estimator that performs variable selection, making it well-suited to the p ≫ n setting:

β̂^{lasso}_λ ∈ arg min_{β∈R^p} (1/(2n))‖y − Xβ‖²_2 + λ‖β‖_1.  (2)


When λ = 0, this coincides with the OLS estimator, which is uniquely defined when n > p and X is full rank:

β̂^{OLS}(n) = (X^T X)^{−1} X^T y.

To better understand the challenge posed by rare features, we begin by considering the effect of a single rare feature on OLS in the classical p-fixed, n → ∞ regime. We take the jth feature to be a binary vector having k nonzeros, where k is a fixed value not depending on n. As n increases, the proportion of nonzero elements, k/n, goes to 0. We show in Theorem 1 that β̂^{OLS}_j(n) does not converge in probability to β∗_j with increasing sample size. This establishes that OLS is not a consistent estimator of β∗ even in a p-fixed asymptotic regime.

Theorem 1. Consider the linear model (1) with X ∈ R^{n×p} having full column rank. Further suppose that X_j is a binary vector having (a constant) k nonzeros. It follows that there exists η > 0 for which

lim inf_{n→∞} P( |β̂^{OLS}_j(n) − β∗_j| > η ) > 0.

Proof. The result follows from taking lim inf_{n→∞} of both sides of (7) in Appendix A and observing that 2Φ(−η k^{1/2}/σ) does not depend on n.

The above result highlights the difficulty of estimating the coefficient of a rare feature. This suggests that even when rare features are not explicitly discarded, variable selection methods may fail to ever select them, regardless of their strength of association with the response. Other researchers have also acknowledged the difficulty posed by rare features in different scenarios. For example, in the context of hypothesis testing for high-dimensional sparse binary regression, Mukherjee et al. (2015) show that when the design matrix is too sparse, any test has no power asymptotically, and signals cannot be detected regardless of their strength. Since the failure is caused by the sparsity of the features, it is therefore natural to ask whether "densifying the features" in an appropriate way would fix the problem. As discussed above, aggregating the counts of related events may be a reasonable way to allow a method to make use of the information in rare features.
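A small Monte Carlo sketch (ours, not from the paper) makes Theorem 1 concrete in the simplest p = 1 case: with a binary feature having a fixed number k of ones, the OLS coefficient is the average of y over those k observations, so its variance is σ²/k no matter how large n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, k, beta_star = 1.0, 5, 2.0

def var_of_ols(n, reps=4000):
    # OLS on a single binary feature with k ones reduces to averaging y over
    # the k support observations, so Var(beta_hat) = sigma^2 / k, free of n.
    ests = []
    for _ in range(reps):
        x = np.zeros(n)
        x[:k] = 1.0
        y = beta_star * x + rng.normal(0.0, sigma, n)
        ests.append((x @ y) / (x @ x))  # OLS estimate for p = 1
    return float(np.var(ests))

for n in (100, 10_000):
    print(n, var_of_ols(n))  # both close to sigma**2 / k = 0.2
```

Increasing n a hundredfold leaves the estimator's variance essentially unchanged, which is exactly the non-vanishing error probability in Theorem 1.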

2.2 Aggregating Rare Features Can Help

Given m subsets of {1, . . . , p}, we can form m aggregated features by summing within each subset. We can encode these subsets in a binary matrix A ∈ {0, 1}^{p×m} and form a new design matrix of aggregated features as X̃ = XA. The columns of X̃ are also counts, but represent the frequencies of m different unions of the p original events. For example, if the first subset is {1, 6, 8}, the first column of A would be e_1 + e_6 + e_8 and the first aggregated feature would be X̃_1 = X_1 + X_6 + X_8, recording the number of times any of the first, sixth, or eighth events occurs. A linear model, X̃β̃, based on the aggregated features can be equivalently expressed as a linear model, Xβ, in terms of the original features as long as β satisfies a set of linear constraints (ensuring that it is in the column space of A):

X̃β̃ = (XA)β̃ = X(Aβ̃) = Xβ.
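The aggregated design XA can be built in a few lines; a toy sketch (0-indexed columns, so the subset {1, 6, 8} above becomes {0, 5, 7}):

```python
import numpy as np

p, m = 10, 2
subsets = [[0, 5, 7], [1, 2]]   # 0-indexed versions of {1,6,8} and {2,3}

A = np.zeros((p, m))
for col, idx in enumerate(subsets):
    A[idx, col] = 1.0           # column of A is, e.g., e_0 + e_5 + e_7

rng = np.random.default_rng(0)
X = rng.poisson(0.5, size=(4, p)).astype(float)  # toy counts matrix
X_agg = X @ A                   # aggregated features

# The first aggregated feature counts occurrences of events 0, 5, or 7.
print(np.allclose(X_agg[:, 0], X[:, 0] + X[:, 5] + X[:, 7]))  # True
```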


The vector β lies in the column space of A precisely when it is constant within each of the m subsets. For example,

enforcing β_1 = β_6 = β_8  ⇔  aggregating features: X_1β_1 + X_6β_6 + X_8β_8 = (X_1 + X_6 + X_8)β_1 = X̃_1β̃_1.  (3)

In practice, determining how to aggregate features is a challenging problem, and our proposed strategy in Section 3 will use side information to guide this aggregation.

For now, to understand the potential gains achievable by aggregation, we consider an idealized case in which the correct aggregation of features is given to us by an oracle. In the next theorem, we construct a situation in which (a) the lasso on the original rare features is unable to correctly recover the support of β∗ for any value of the tuning parameter λ, and (b) an oracle aggregation of features makes it possible for the lasso to recover the support of β∗. For simplicity, we take X to be a binary matrix, which corresponds to the case in which every feature has n/p nonzero observations. We take β∗ to have k blocks of size p/k, with entries that are constant within each block. The last block is all zeros, and the minimal nonzero |β∗_j| is restricted to lie within a range that expands with n, p, and k. The oracle approach delivers to the lasso the k aggregated features that match the structure in β∗. These aggregated features have n/k nonzeros and thus are not rare features. Having performed the lasso on these aggregated features, we then duplicate the k elements, p/k times per group, to get β̂^{oracle}_λ ∈ R^p. The lasso with the oracle aggregator is shown to achieve high-probability signed support recovery, whereas the lasso on the original features fails to achieve this property for all values of the tuning parameter λ.

Theorem 2. Consider the linear model (1) with binary matrix X ∈ {0, 1}^{n×p}, p ≤ n, and X^T X = (n/p)I_p: every column of X has n/p ones and X1_p = 1_n. Suppose β∗ = β̄∗ ⊗ 1_{p/k} for a k-vector β̄∗ = (β̄∗_1, . . . , β̄∗_{k−1}, 0). Suppose k < p/(36 log n). Then, the interval

I = ( σ√{(4k/n) log(k²n)}, σ√{(p/(3n)) log(2cp/k)} ]  with  c = (1/3) e^{(π/2+2)^{−1}} √{1/4 + 1/π}

is nonempty, and for min_{i=1,...,k−1} |β̄∗_i| ∈ I the following two statements hold:

(a) The lasso fails to achieve high-probability signed support recovery:

lim sup_{p→∞} sup_{λ≥0} P( S±(β̂^{lasso}_λ) = S±(β∗) ) ≤ 1/e.

(b) The lasso with an oracle aggregation of features succeeds in recovering the correct signed support for some λ > 0:

lim_{n→∞} P( S±(β̂^{oracle}_λ) = S±(β∗) ) = 1.

Proof. See Appendix B.

Even when the true model does not have a small number of aggregated features (i.e., β∗ does not have k ≪ p distinct values), it may still be beneficial to aggregate. The next result exhibits a bias-variance tradeoff for feature aggregation.


Theorem 3 (Bias-Variance Tradeoff for Feature-Aggregated Least Squares). Consider the linear model (1) with X having full column rank (n > p) and a general vector β∗ ∈ R^p. Let C = {C_1, . . . , C_{|C|}} be an arbitrary partition of {1, . . . , p}. Let β̂_C ∈ R^p be the estimator formed by performing least squares subject to the constraint that parameters are constant within the groups defined by C. Then, the following mean squared estimation bound holds:

(1/n) E‖Xβ̂_C − Xβ∗‖² ≤ (1/n)‖X‖²_{op} ∑_{ℓ=1}^{|C|} ∑_{j∈C_ℓ} ( β∗_j − |C_ℓ|^{−1} ∑_{j′∈C_ℓ} β∗_{j′} )² + σ²|C|/n.

Proof. See Appendix C.

The bias term is small when there is a small amount of variability in coefficient values within groups of the partition C, i.e., when β∗ is approximately constant within each group. Even when β∗ in truth has a large k (even k = p), there may still exist a partition C with |C| ≪ k for which the bias term is small and thus E‖Xβ̂_C − Xβ∗‖² ≪ E‖Xβ̂^{OLS} − Xβ∗‖² = σ²p.
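The constrained estimator in Theorem 3 is just least squares on the aggregated design with the fit expanded back to p coordinates. A minimal numpy sketch (ours, with hypothetical helper names), including a zero-noise sanity check that the constrained fit is exact when β∗ really is constant within groups:

```python
import numpy as np

def aggregated_ls(X, y, partition):
    # Least squares with coefficients constrained to be constant within each
    # group of `partition`: regress y on X @ A, then expand back via A.
    p = X.shape[1]
    A = np.zeros((p, len(partition)))
    for col, group in enumerate(partition):
        A[group, col] = 1.0
    theta, *_ = np.linalg.lstsq(X @ A, y, rcond=None)
    return A @ theta

rng = np.random.default_rng(1)
X = rng.poisson(1.0, size=(50, 6)).astype(float)
beta_star = np.array([2.0, 2.0, 2.0, -1.0, -1.0, 0.0])
y = X @ beta_star               # sigma = 0: no noise

beta_hat = aggregated_ls(X, y, [[0, 1, 2], [3, 4], [5]])
print(np.allclose(beta_hat, beta_star))  # True: zero bias and zero variance here
```

With noise, fitting |C| = 3 aggregated coefficients instead of p = 6 free ones is what produces the σ²|C|/n variance term in Theorem 3.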

3 Main Proposal: Tree-Guided Aggregation

In the previous section, we have seen the potential gains achievable through aggregating rare features. In this section, we propose a tree-guided method for aggregating and selecting rare features. We discuss this tree in Section 3.1, introduce a tree-based parametrization strategy in Section 3.2, and propose a new estimator in Section 3.3.

3.1 A Tree to Guide Aggregation

To form aggregated variables, it is infeasible to consider all possible partitions of the features {1, . . . , p}. Rather, we will consider a tree T with leaves 1, . . . , p and restrict ourselves to partitions that can be expressed as a collection of branches of T (see, e.g., Figure 1). We sum features within a branch to form our new aggregated features.

We would like to aggregate features that are related, and thus we would like to have T encode feature similarity information. Such information about the features comes from prior knowledge and/or data sources external to the current regression problem (i.e., not from y and X). For example, for microbiome data, T could be the phylogenetic tree encoding evolutionary relationships among the OTUs (e.g., Matsen et al. 2010; Tang et al. 2016; Wang and Zhao 2017a) or the co-occurrence of OTUs from past data sets. When features correspond to words, closeness in meaning can be used to form T (e.g., in Section 6, we perform hierarchical clustering on word embeddings that were learned from an enormous corpus).

In (3), we demonstrated how aggregating a set of features is equivalent to setting these features' coefficients to be equal. To perform tree-guided aggregation, we therefore associate a coefficient β_j with each leaf of T and "fuse" (i.e., set equal to each other) any coefficients within a branch that we wish to aggregate.



Figure 2: (Left) An example of β ∈ R^5 and T that relates the corresponding five features. By (4), we have β_i = γ_i + γ_6 + γ_8 for i = 1, 2, 3 and β_j = γ_j + γ_7 + γ_8 for j = 4, 5. (Right) By zeroing out the γ_i's in the gray nodes, we aggregate β into two groups indicated by the dashed contours: β_1 = β_2 = β_3 = γ_6 + γ_8 and β_4 = β_5 = γ_8. Counts data are aggregated for features sharing the same coefficient: Xβ = (X_1 + X_2 + X_3)β_1 + (X_4 + X_5)β_4.

3.2 A Tree-Based Parametrization

In order to fuse β_j's within a branch, we adopt a tree-based parametrization by assigning a parameter γ_u to each node u in T (this includes both leaves and interior nodes). The left panel of Figure 2 gives an example. Let ancestor(j) ∪ {j} be the set of nodes in the path from the root of T to the jth feature, which is associated with the jth leaf. We express β_j as the sum of all the γ_u's on the path:

β_j = ∑_{u ∈ ancestor(j) ∪ {j}} γ_u.  (4)

This can be written more compactly as β = Aγ, where A ∈ {0, 1}^{p×|T|} is a binary matrix with A_{jk} := 1{u_k ∈ ancestor(j) ∪ {j}} = 1{j ∈ descendant(u_k) ∪ {u_k}}. The descendants of each node u define a branch T_u, and zeroing out γ_v for all v ∈ descendant(u) fuses the coefficients in this branch, i.e., {β_j : j ∈ L(T_u)}. Thus, γ_{descendant(u)} = 0 is equivalent to aggregating the features X_j with j ∈ L(T_u) (see the right panel of Figure 2).

Another way of viewing this parametrization's merging of branches is by expressing Xβ = XAγ, where (XA)_{ik} = ∑_{j=1}^{p} X_{ij}A_{jk} = ∑_{j ∈ descendant(u_k) ∪ {u_k}} X_{ij} aggregates counts over all the descendant features of node u_k. By aggregating nearby features, we allow rare features to borrow strength from their neighbors, allowing us to estimate shared coefficient values that would otherwise be too difficult to estimate. In the next section, we describe an optimization problem that uses the γ parametrization to simultaneously perform feature aggregation and selection.
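The map β = Aγ for the five-leaf tree of Figure 2 can be spelled out explicitly; a toy sketch (ours, not the package's internals), with node numbering following the figure and 8 as the root:

```python
import numpy as np

# Figure 2's tree: nodes 1-5 are leaves, 6 and 7 are interior, 8 is the root.
parent = {1: 6, 2: 6, 3: 6, 4: 7, 5: 7, 6: 8, 7: 8, 8: None}

def path_to_root(node):
    out = []
    while node is not None:
        out.append(node)
        node = parent[node]
    return out                      # {j} together with ancestor(j)

p, T = 5, 8
A = np.zeros((p, T))
for leaf in range(1, p + 1):
    for u in path_to_root(leaf):
        A[leaf - 1, u - 1] = 1.0    # A[j,k] = 1 iff node k lies on leaf j's path

# Nonzero gamma only at nodes 6 and 8 reproduces the right panel of Figure 2:
# beta_1 = beta_2 = beta_3 = gamma_6 + gamma_8 and beta_4 = beta_5 = gamma_8.
gamma = np.zeros(T)
gamma[8 - 1] = 0.5
gamma[6 - 1] = 1.0
print(A @ gamma)  # [1.5 1.5 1.5 0.5 0.5]
```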

3.3 The Optimization Problem

Our proposed estimator β̂ is the solution to the following convex optimization problem:

min_{β∈R^p, γ∈R^{|T|}}  { (1/(2n))‖y − Xβ‖²_2 + λ[ α ∑_{ℓ=1}^{|T|} w_ℓ|γ_ℓ| + (1−α) ∑_{j=1}^{p} w_j|β_j| ] }  s.t. β = Aγ.  (5)


We apply a weighted ℓ1 penalty to induce sparsity in γ, which in turn induces fusion of the coefficients in β. In the high-dimensional setting, sparsity in feature coefficients is also desirable, so we also apply a weighted ℓ1 penalty on β. The tuning parameter λ controls the overall penalization level, while α determines the trade-off between the two types of regularization: fusion and sparsity. In practice, both λ and α are determined via cross-validation. The choice of weights is left to the user. Our theoretical results (Section 4) suggest a particular choice of weights, although in our empirical studies (Sections 5 and 6) we simply take all weights to be 1 except for the root, which we leave unpenalized (w_root = 0). Choosing w_root = 0 allows one to apply strong shrinkage of all coefficients towards a common value other than zero.

When α = 0, (5) reduces to a lasso problem in β; when α = 1, (5) reduces to a lasso problem in γ. Both extreme cases can be efficiently solved with a lasso solver such as glmnet (Friedman et al., 2010). For α ∈ (0, 1), (5) is a generalized lasso problem (Tibshirani and Taylor, 2011) in γ, and can be solved in principle using preexisting solvers (e.g., Arnold and Tibshirani 2014). However, better computational performance, in particular in high-dimensional settings, can be attained using an algorithm specially tailored to our problem. With weights w_r = 0 and w_ℓ = w_j = 1 for ℓ ≠ r and 1 ≤ j ≤ p, we write (5) as a global consensus problem and solve it using the alternating direction method of multipliers (ADMM; Boyd et al. 2011). The consensus problem introduces additional copies of β and γ, which decouples the various parts of the problem, leading to efficient ADMM updates:

min_{β^{(1)},β^{(2)},β^{(3)},β ∈ R^p and γ^{(1)},γ^{(2)},γ ∈ R^{|T|}}  (1/(2n))‖y − Xβ^{(1)}‖²_2 + λ[ α ∑_{ℓ=1}^{|T|} w_ℓ|γ^{(1)}_ℓ| + (1−α) ∑_{j=1}^{p} w_j|β^{(2)}_j| ]  (6)

s.t. β^{(3)} = Aγ^{(2)},  β = β^{(1)} = β^{(2)} = β^{(3)},  and  γ = γ^{(1)} = γ^{(2)}.

In particular, our ADMM approach requires performing a singular value decomposition (SVD) on X, an SVD on (I_p : −A) (these are reused for all λ and α), and then applying matrix multiplies and soft-thresholdings until convergence. See Algorithm 1 in Appendix D for details. Appendix D.1 provides a derivation of Algorithm 1, and Appendix D.2 discusses a slight modification for including an intercept, which is desirable in practice.
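For intuition about the α = 1 extreme, where (5) reduces to a lasso in γ with design XA, here is a generic proximal-gradient (ISTA) sketch of such a lasso subproblem. This is our illustration only, not the ADMM algorithm of Appendix D, and the solver and step-size choices are assumptions:

```python
import numpy as np

def lasso_ista(Z, y, lam, n_iter=5000):
    # Minimize (1/(2n)) ||y - Z g||_2^2 + lam * ||g||_1 by alternating a
    # gradient step with coordinatewise soft-thresholding.
    n, q = Z.shape
    step = n / np.linalg.norm(Z, 2) ** 2   # 1 / Lipschitz constant of the gradient
    g = np.zeros(q)
    for _ in range(n_iter):
        u = g - step * (Z.T @ (Z @ g - y)) / n
        g = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)
    return g

# With lam = 0 this recovers ordinary least squares on Z (here Z plays the
# role of X A with a tiny toy design).
Z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(lasso_ista(Z, y, lam=0.0))   # close to the least-squares solution [1. 2.]
```

In practice one would use the paper's ADMM (or glmnet for the two extremes); the sketch only shows why the α = 1 case is a standard lasso computation.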

3.4 Connections to Other Work

Before proceeding, we draw some connections to other work. Wang and Zhao (2017b) introduce a penalized regression method for high-dimensional and compositional covariates that uses a phylogenetic tree. They apply an ℓ1 penalty to the sum of coefficients within a subtree, for every possible subtree in the phylogenetic tree. Their penalty is designed to encourage the sum of coefficients within a subtree to be zero, which naturally detects a sub-composition of microbiome features. By contrast, our method applies an ℓ1 penalty to the latent variables γ_u, which induces the regression coefficients within subtrees to take equal values. Thus, their penalty encourages the sum of coefficients within a subtree to be zero, whereas ours induces equality. For a simple example, suppose β_1 and β_2 form a subtree of the phylogenetic tree. Their penalty would promote β_1 = −β_2, whereas ours would promote β_1 = β_2. The basic assumption of their method is that contrasts of species abundances within


a subtree may be predictive, whereas the basic assumption of our method is that average species abundance within a subtree may be predictive. These different structural assumptions will be appropriate in different situations. In this paper's context of rare features, our assumption is the relevant one: consider a subtree containing a large number of species. By asking for equality of coefficients, our method promotes the aggregation of species counts across the subtree, leading to denser features; by contrast, asking for coefficients to sum to zero does not address the problem of feature rarity.

Existing work has also considered the setting in which regression coefficients are thought to be clustered into a small number of groups of equal coefficient value. For example, She (2010) and Ke et al. (2015) do this by penalizing coefficient differences. Neither method, however, is focused specifically on rare features, and thus they do not rely on side information to perform this clustering of coefficients. In our setting, the side information provided by the tree plays an important role in compensating for the extremely small amount of information available about rare features.

Several other methods assume a relevant undirected graph over the predictors and use a graph-Laplacian or graph-total-variation penalty to promote equality of coefficients that are nearby on the graph (Li and Li, 2010; Li et al., 2018). Depending on the setting, this graph may either be pure side information (Li and Li, 2010) or be a covariance graph estimated based on X itself (Li et al., 2018). While the above methods use graph information "edge-by-edge", Yu and Liu (2016) incorporate graphical information "node-by-node" to promote joint selection of predictors that are connected on the graph.

Guinot et al. (2017) consider a similar idea of aggregating genomic features with the help of a hierarchical clustering tree; however, their tree is learned from the design matrix, and the prediction task is only used to determine the level at which the tree is cut, whereas our method in effect uses the response to flexibly choose differing aggregation levels across the tree. We consider a strategy similar to theirs, which we call L1-ag-h in the empirical comparisons. Kim et al. (2012) propose a tree-guided group lasso approach in the context of multi-response regression. In their context, the tree relates the different responses and is used to borrow strength across related prediction tasks. Zhai et al. (2018) propose a variance component selection scheme that aggregates OTUs into clusters at a higher phylogenetic level and treats the aggregated taxonomic clusters as multiple random effects in a variance component model. Finally, Khabbazian et al. (2016) propose a phylogenetic lasso method to study trait evolution from comparative data and detect past changes in the expected mean trait values.

4 Statistical Theory

In this section, we study the prediction consistency of our method. Since T encodes feature similarity information, throughout the section we require T to be a "full" tree such that each node is either a leaf or possesses at least two child nodes. We begin with some definitions.

Definition 1. We say that B ⊆ V(T) is an aggregating set with respect to T if {L(T_u) : u ∈ B} forms a partition of L(T).

The black circles in Figure 3 form an aggregating set since their branches' leaves are a partition of {1, . . . , 8}. We would like to refer to "the true aggregating set B∗ with respect



Figure 3: In the above tree, B∗ = {u_1, u_2, u_3, u_4, u_5} has its nodes labeled with black circles.

to T" and, to do so, we must first establish that there exists a unique coarsest aggregating set corresponding to a vector β∗.

Lemma 1. For any β∗ ∈ R^p, there exists a unique coarsest aggregating set B∗ := B(β∗, T) ⊆ V(T) (hereafter "the aggregating set") with respect to the tree T such that (a) β∗_j = β∗_k for j, k ∈ L(T_u), ∀u ∈ B∗, and (b) |β∗_j − β∗_k| > 0 for j ∈ L(T_u) and k ∈ L(T_v) for siblings u, v ∈ B∗.

The lemma (proved in Appendix E) defines B∗ as the aggregating set such that further merging of siblings would mean that β∗ is not constant within each subset of the partition.

Definition 2. Given the triplet (T, β∗, X), we define (a) X̃ = XA_{B∗} ∈ R^{n×|B∗|} to be the design matrix of aggregated features, which uses B∗ = B(β∗, T) as the aggregating set, and (b) β̃∗ ∈ R^{|B∗|} to be the coefficient vector using these aggregated features: β∗ = A_{B∗}β̃∗.

We are now ready to provide a bound on the prediction error of our estimator, which is proved in Appendix F.

Theorem 4 (Prediction Error Bound). If we take λ ≥ 4σ√{log(2p)/n}, w_j = ‖X_j‖_2/√n for 1 ≤ j ≤ p, and w_ℓ = ‖XA_ℓ‖_2/√n for 1 ≤ ℓ ≤ |T|, then

(1/n)‖Xβ̂ − Xβ∗‖²_2 ≤ 3λ( (1−α) ∑_{j∈A∗} w_j|β∗_j| + α ∑_{ℓ∈B∗} w_ℓ|β̃∗_ℓ| )

holds with probability at least 1 − 2/p for any α ∈ [0, 1].

This is a slow rate bound for our method. The standard slow rate bound for the lasso is σ√{log p/n}‖β∗‖_1. The next corollary establishes that our method, for any choice of α, achieves this rate.

Corollary 1. Suppose ‖X_j‖_2 ≤ √n for 1 ≤ j ≤ p. Then, taking λ = 4σ√{log(2p)/n} and using the weights in Theorem 4,

(1/n)‖X(β̂ − β∗)‖²_2 ≲ σ√{log p/n}‖β∗‖_1

holds with probability at least 1 − 2/p for any α ∈ [0, 1].


Proof. See Appendix G.

The previous corollary establishes that one does not worsen the rate by using our method over the lasso in a generic setting. But is there an advantage to using our method? Our method is designed to do well in circumstances in which |B∗| is small, that is, β∗ is mostly constant, with grouping given by the provided tree. The next corollary considers the extreme case in which β∗ is constant (and nonzero).

Corollary 2. Under the conditions of Corollary 1, suppose β∗_1 = · · · = β∗_p ≠ 0. Taking

α ≥ [1 + ‖X1_p‖_2/(p√n)]^{−1},

(1/n)‖Xβ̂ − Xβ∗‖²_2 ≲ σ√{(log p)/n}‖β∗‖_1 · ‖X1_p‖_2/(p√n)

holds with probability at least 1 − 2/p. Thus, this improves the lasso rate when ‖X1_p‖_2 = o(p√n).

Proof. See Appendix H.

For certain sparse designs, the above condition holds. For example, when n = p and X = √n I_n, we have ‖X1_p‖_2 = √n‖1_n‖_2 = n, which is o(p√n) = o(n^{3/2}). The next proposition considers a more general sparse-X scenario.

Proposition 1. Suppose each column of X has exactly r nonzero entries chosen at random and independently of all other columns. Suppose all nonzero entries equal √{n/r}, so that ‖X_j‖_2 = √n for every column 1 ≤ j ≤ p. If 8 log(n+1)/(3p) < r/n → 0, then ‖X1_p‖_2/(p√n) → 0 in probability.

Proof. See Appendix I.

Proposition 1 combined with Corollary 2 demonstrates that our method can improve the prediction error rate over the lasso. While this corollary focuses on the rather extreme case in which all coefficients are equal, we expect the result to be generalizable to a wider class of settings. Indeed, the empirical results of the next section suggest that our method can outperform the lasso in many settings.
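The quantity ‖X1_p‖_2/(p√n) in Proposition 1 is easy to check numerically; a small simulation sketch (ours) for the sparse design described there, with p = n and r fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

def ratio(n, p, r):
    # Sparse design of Proposition 1: each column has r entries equal to
    # sqrt(n/r) at random rows, so ||X_j||_2 = sqrt(n) for every column.
    X = np.zeros((n, p))
    for j in range(p):
        rows = rng.choice(n, size=r, replace=False)
        X[rows, j] = np.sqrt(n / r)
    return np.linalg.norm(X @ np.ones(p)) / (p * np.sqrt(n))

for n in (100, 400, 1600):
    print(n, ratio(n, n, r=5))   # the ratio shrinks as n grows with r fixed
```

The printed ratios decrease toward zero, consistent with the conclusion ‖X1_p‖_2/(p√n) → 0 in probability.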

5 Simulation Study

We start by forming a tree T with p leaves. To do so, we generate p latent points in R and then apply hierarchical clustering (using hclust in R Core Team (2016) with complete linkage). We would like the tree to partition the leaves into k clusters of varying sizes and at differing heights in the tree (which will correspond to the true aggregation of the features). To do so, we first generate k cluster means µ_1, . . . , µ_k ∈ R with µ_i = 1/i. The first k/2 means have 3p/(2k) associated latent points each, and the remaining k/2 means have p/(2k) associated latent points each. The latent points associated with µ_i are drawn independently from N(µ_i, τ² min_{j≠i}(µ_i − µ_j)²), where τ = 0.05.


[Figure 4 schematic: Are features aggregated? No → L1 (lasso); L1-dense (lasso on dense features). Yes → Is the aggregation supervised? No → L1-ag-h (lasso, aggregated by height); L1-ag-d (lasso, aggregated by cluster density). Yes → our method.]

Figure 4: A comparison between our method and four other methods

By design, there are k interior nodes in T corresponding to these k groups, which we index by B∗. We form A corresponding to this tree and generate β∗ = A_{B∗}β̃∗. We zero out k·s elements of β̃∗ ∈ R^k and draw the magnitudes of the remaining elements independently from a Uniform(1.5, 2.5) distribution. We alternate signs of the nonzero coefficients of β̃∗. The design matrix X ∈ R^{n×p} is simulated from a Poisson(0.02) distribution, and the response y ∈ R^n is simulated from (1) with σ = ‖Xβ∗‖_2/√(5n). For every method under consideration, we average its performance over 100 repetitions in all the following simulations.

We consider both low-dimensional (n = 500, p = 100, s = 0) and high-dimensional (n = 100, p = 200, s ∈ {0.2, 0.6}) scenarios, in each case taking a sequence of k values up to p/2. We apply our method with T and vary the tuning parameters (α, λ) along an 8-by-50 grid of values. We take all weights equal to 1 except that of the root node, which we take as zero (leaving it unpenalized). In the low-dimensional case, we compare our method to oracle least squares, in which we perform least squares on XA_{B∗}. Oracle least squares represents the best possible performance of any method that attempts to aggregate features. We also include least squares on the original design matrix X. In the high-dimensional case, we compare our method to the oracle lasso, in which the true aggregation XA_{B∗} (but not the sparsity in β̃∗) is known, and to the lasso and ridge regression, which are each computed across a grid of 50 values of the tuning parameter.

In addition to the above methods, we compare our method to three other approaches, meant to represent variations of how the lasso is typically applied when rare features are present (see Figure 4 for a schematic). The first approach, which we refer to as L1-dense, applies the lasso after first discarding any features that are in fewer than 1% of observations. The second and third approaches apply the lasso with features aggregated according to T in an unsupervised manner. The second approach, L1-ag-h, aggregates features that are in the same cluster after cutting the tree at a certain height. In addition to the lasso tuning parameter, the height at which we cut the tree is a second tuning parameter (chosen along an equally spaced grid of eight values). The third approach, L1-ag-d, performs merges in a bottom-up fashion along the tree until all aggregated features have density above some threshold. This threshold is an additional tuning parameter (chosen along an equally spaced grid of eight values between 0.01 and 1). The lasso tuning parameter in these methods is always chosen along a grid of 50 values.

We measure the best mean-squared estimation error, i.e., min_Λ ‖β̂(Λ) − β∗‖²_2/p, where "best" is with respect to each method's tuning parameter(s) Λ. The top two panels of Figure 5 show the performance of the methods in the low-dimensional and high-dimensional scenarios, respectively. Given that our method includes least squares and the lasso as special cases, it is no surprise that our method has better attainable performance than those



Figure 5: Estimation error of all methods under (Top Left) (n, p, s) = (500, 100, 0) versus varying k ∈ {10, 20, 30, 40, 50}; (Top Right) (n, p, s) = (100, 200, 0.2) versus varying k ∈ {20, 40, 60, 80, 100}; (Bottom Left) (n, p, s, k) = (100, 200, 0.2, 40) versus varying τ ∈ {0.05, 0.15, 0.25, 0.35, 0.45}; (Bottom Right) (n, p, s) = (100, 200, 0.6) versus varying k ∈ {20, 40, 60, 80, 100}.



Figure 6: Mean Hamming distance versus mean Rand index for various methods' best estimates under (Left) (n, p, s, k) = (100, 200, 0.2, 40) and (Right) (n, p, s, k) = (100, 200, 0.6, 40). For our method, the eight circles correspond to eight different α values, varying from 0 to 1. Decreasing circle size corresponds to increasing α value.

methods. These results indicate that our method performs nearly as well as the oracle when the true number of aggregated features, k, is small and degrades (along with the oracle) as this quantity increases. The two other methods that use the tree, L1-ag-h and L1-ag-d, do less well than our method, but still do better than L1-dense, which simply discards rare features. In the (n, p, s) = (100, 200, 0.2) case, L1-dense performs almost identically to the lasso, while L1-ag-h and L1-ag-d degrade to the lasso as k increases. By comparing the right two panels of Figure 5, we notice that our method outperforms ridge at large k when s increases from 0.2 to 0.6, which can be explained by the increased sparsity in β∗.

We also evaluate model performance with respect to group recovery and support recovery. Recall that our method is computed over eight α values, between 0 and 1, and fifty λ values. At each α, we find the minimizer β(λ) of ‖β(λ) − β∗‖₂² over all λ values. We measure group recovery by computing the Rand index (comparing the grouping of β to that of β∗), and measure support recovery by computing the Hamming distance (between the support of β and that of β∗). The closer the Rand index is to one, the better our method recovers the correct groups. The smaller the Hamming distance is, the better our method recovers the correct support. For the high-dimensional scenario with k = 40, we plot eight pairs (one for each α value) of Rand index and Hamming distance values for s = 0.2 and s = 0.6 in Figure 6. We also compute the two metrics for the lasso, ridge, L1-dense, L1-ag-h, L1-ag-d, and oracle lasso. In the left panel of Figure 6, which corresponds to the low-sparsity case in β∗, our method achieves its best performance at the largest α value. As the sparsity level increases, we see from the right panel of Figure 6 that the best α shifts towards zero. In both cases, our method outperforms the lasso, ridge, L1-dense, L1-ag-h, and L1-ag-d.
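The two recovery metrics are simple to compute. A minimal sketch (the function names are ours, not from the paper or its companion R package): the Rand index is the fraction of feature pairs on which two groupings agree, and the Hamming distance counts coordinates where the two supports differ.

```python
import numpy as np
from itertools import combinations

def rand_index(groups_a, groups_b):
    """Rand index between two groupings of the same p features:
    the fraction of feature pairs on which the groupings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(groups_a)), 2))
    agree = sum(
        (groups_a[i] == groups_a[j]) == (groups_b[i] == groups_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def support_hamming(beta_hat, beta_star, tol=1e-10):
    """Hamming distance between the supports (nonzero patterns)
    of two coefficient vectors."""
    return int(np.sum((np.abs(beta_hat) > tol) != (np.abs(beta_star) > tol)))
```

Identical groupings give a Rand index of exactly 1, and identical supports give a Hamming distance of 0, matching the interpretation above.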

Clearly, the performance of our method, L1-ag-h, and L1-ag-d will depend on the quality of the tree being used. In the previous simulations we provided our method with a tree that is perfectly compatible with the true aggregating set. In practice, the tree used may be only an approximate representation of how features should be aggregated. We therefore study the sensitivity of our method to misspecification of the tree. We return to the high-dimensional setting above with k = 40, and we generate a sequence of trees that are increasingly distorted representations of how the data should in fact be aggregated.

We begin with a true aggregation of the features into k groups as described before. In each repetition of the simulation, we generate a (random) tree T by performing hierarchical clustering on p random variables generated similarly as before except having an increasing τ value, as a way to control the degradation level of the tree. When τ is small, the latent variables will be well-separated by group so that the tree will have an aggregating set that matches the true aggregation structure (with high probability). As τ ∈ {0.05, 0.15, 0.25, 0.35, 0.45} increases, the between-group variability becomes relatively smaller compared to within-group variability, and thus the information provided by the tree becomes increasingly weak. The bottom left panel of Figure 5 shows the degradation of our method as τ increases when (n, p, s, k) = (100, 200, 0.2, 40). Our method, L1-ag-h, and L1-ag-d all suffer from a poor-quality tree; the latter two degrade more quickly than ours.
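One way to implement this tree-degradation scheme is sketched below. The specific noise model (Gaussian group profiles plus feature-level noise scaled by τ) and the name `distorted_tree` are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def distorted_tree(p, k, tau, n_draws=50, seed=0):
    """Assign p features to k groups, draw one latent profile per
    group, add feature-level noise scaled by tau, and return the
    Ward hierarchical clustering (linkage matrix) of the noisy
    profiles.  Larger tau blurs the group structure, so the
    resulting tree is an increasingly distorted representation of
    the true aggregation."""
    rng = np.random.default_rng(seed)
    groups = rng.integers(0, k, size=p)            # true group memberships
    centers = rng.normal(size=(k, n_draws))        # latent group profiles
    profiles = centers[groups] + tau * rng.normal(size=(p, n_draws))
    return groups, linkage(profiles, method="ward")
```

With a small τ, cutting the returned dendrogram into k clusters recovers the true groups with high probability; as τ grows, the recovered clusters increasingly mix the groups.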

6 Application to Hotel Reviews

Wang et al. (2010) crawled TripAdvisor.com to form a dataset² of 235,793 reviews and ratings of 1,850 hotels by users between February 14, 2009 and March 15, 2009. While there are several kinds of ratings, we focus on a user's overall rating of the hotel (on a 1 to 5 scale), which we take as our response. We form a document-term matrix X in which Xij is the number of times the ith review uses the jth adjective.

We begin by converting words to lower case and keeping only adjectives (as determined by WordNet; Fellbaum 1998; Wallace 2007; Feinerer and Hornik 2016). After removing reviews with missing ratings, we are left with 209,987 reviews and 7,787 distinct adjectives. The left panel of Figure 7 shows the distribution of ratings in the data: nearly three quarters of all ratings are above 3 stars. The extremely right-skewed distribution in the right panel of Figure 7 shows that all but a small number of adjectives are highly rare (e.g., over 90% of adjectives are used in fewer than 0.5% of reviews).
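Building such a document-term matrix amounts to counting vocabulary words per review. A stdlib-only sketch with toy reviews follows; the real pipeline additionally uses WordNet part-of-speech filtering to restrict the vocabulary to adjectives, which we omit here.

```python
from collections import Counter

def document_term_matrix(reviews, adjectives):
    """X[i][j] = number of times the j-th adjective appears in the
    i-th review, after lowercasing (a crude whitespace tokenizer)."""
    X = []
    for text in reviews:
        counts = Counter(text.lower().split())
        X.append([counts[w] for w in adjectives])
    return X

X = document_term_matrix(
    ["Great location and great clean rooms", "dirty noisy room"],
    ["great", "clean", "dirty", "noisy"],
)
# X is [[2, 1, 0, 0], [0, 0, 1, 1]]
```

At the scale of the TripAdvisor data one would store X as a sparse matrix, since most entries are zero, which is precisely the rare-feature phenomenon the method addresses.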

Rather than discard this large number of rare adjectives, our method aims to make productive use of these by leveraging side information about the relationship between adjectives. We construct a tree capturing adjective similarity as follows. We start with word embeddings³ in a 100-dimensional space that were pre-trained by GloVe (Pennington et al., 2014) on the Gigaword5 and Wikipedia2014 corpora. We also obtain a list of adjectives, which the NRC Emotion Lexicon labels as having either positive or negative sentiments (Mohammad and Turney, 2013). We use five nearest neighbors classification within the 100-dimensional space of word embeddings to assign labels to the 5,795 adjectives that have not been labeled in the NRC Emotion Lexicon. This sentiment separation determines the two main branches of the tree T. Within each branch, we perform hierarchical clustering of the word embedding vectors. Figure 8 depicts such a tree with 2,397 adjectives (as leaves).
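The five-nearest-neighbors labeling step can be sketched as follows in pure NumPy. The embeddings below are toy one-dimensional stand-ins for the 100-dimensional GloVe vectors, and `knn_sentiment` is our illustrative name.

```python
import numpy as np

def knn_sentiment(emb_labeled, labels, emb_unlabeled, k=5):
    """Label each unlabeled word with the majority sentiment (0/1)
    among its k nearest labeled neighbors in embedding space
    (Euclidean distance); an odd k avoids ties."""
    out = []
    for v in emb_unlabeled:
        dist = np.linalg.norm(emb_labeled - v, axis=1)
        nearest = labels[np.argsort(dist)[:k]]
        out.append(int(nearest.mean() > 0.5))  # majority vote
    return np.array(out)
```

Within each sentiment branch one would then run hierarchical clustering of the embedding vectors (e.g., with scipy.cluster.hierarchy.linkage) and join the two dendrograms under a common root to obtain T.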

We compare our method (with weights all equal to 1 except for the root node, which is left unpenalized) to four other approaches described in Figure 4. For L1-dense, we first

²Data source: http://times.cs.uiuc.edu/~wang296/Data/
³Data source: http://nlp.stanford.edu/data/glove.6B.zip


Rating  Proportion
1       0.066
2       0.085
3       0.105
4       0.308
5       0.436


Figure 7: (Left) distribution of TripAdvisor ratings. (Right) only 414 adjectives appear in more than 1% of reviews; the histogram gives the distribution of usage-percentages for those adjectives appearing in fewer than 1% of reviews.


brag

bragging

inquiring

discerning

tripping

lingual

biolum

inescent

equidistant

interlaced

schematic

derisive

audible

buzzing

whispering

upwind

buggy

huffy

toed

hooked

wired

plugged

fab

gabby

littler

bedded

looseleaf

tropic

pied

trojan

iberian

smoking

secondhand

addictive

alcoholic

gastrointestinal

allergic

asthmatic

sensitive

touchy

negotiable

obligatory

overdue

undecided

shy

inclined

worried

concerned

confused

unhappy

frustrated

hesitant

reluctant

unwilling

eager

anxious

skeptical

wary

advertised

billed

booked

unavailable

rested

disposed

compensated

graded

categorized

checked

sorted

farthest

distant

approaching

wandering

sneaking

touching

bouncing

crossed

halfway

stuck

sideways

dragging

sliding

driven

rolling

rolled

floating

drifting

tipped

stray

discrim

inatory

discrim

inating

intrusive

explicit

unsafe

impractical

unusable

poorly

ineffective

inadequate

consum

ing

costly

repetitive

needless

unnecessary

discouraged

apprehensive

leery

mum

adam

ant

disturbed

horrified

disgusted

appalled

dism

ayed

hysterical

terrified

distraught

stung

embarrassed

annoyed

irritated

bothered

complaining

thunderstruck

gobsmacked

flabbergasted

speechless

apologetic

incredulous

bewildered

baffled

perplexed

puzzled

unimpressed

disheartened

perturbed

offended

disliked

dazzled

enam

ored

pestered

patronized

lazy

kindly

wise

jolly sly

insane

incompetent

unresponsive

selfish

arrogant

naive

ignorant

uncaring

aloof

forgiving

squeam

ish

fussy

indulgent

obnoxious

overbearing

cocky

pushy

uptight

surly

sullen

clueless

grum

pylame

rude

downright

nasty

ugly

stupid

silly

weird

crazy

inexperienced

disorganized

disorganised

unimaginative

unmotivated

indifferent

uninterested

noisy

rowdy

unruly

disgruntled

irate

meagre

sizeable

exorbitant

extortionate

overpriced

tempting

enticing

limitless

lofty

manageable

unmanageable

bearable

intolerable

unbearable

aggravating

compounded

stressful

pleasurable

draining

taxing

spoilt

spoiled

dreary

miserable

lousy

mediocre

feeble

pathetic

pitiful

hurried

fumbling

clum

syhaphazard

perfunctory

halfhearted

wimpy

crappy

touristy

grouchy

crotchety

snooty

snotty

smallish

grungy

scruffy

constipated

thieving

grubby

dodgy

tatty

tipsy

skanky

slothful

bothersome

nauseating

yucky

lumpy

untidy

insubstantial

unappealing

unattractive

raunchy

trashy

tasteless

sham

eless

ostentatious

obtrusive

impersonal

unsupportive

unromantic

uninviting

repulsive

revolting

stereotyped

soulless

glorified

decadent

unsavory

harmless

insignificant

imaginable

serviceable

characterless

flavorless

existent

unmentionable

borderline

atrocious

deplorable

appalling

disgusting

sham

eful

disgraceful

inexcusable

repugnant

undesirable

disadvantageous

unhelpful

unfriendly

unprofessional

dishonest

inconsiderate

impolite

disrespectful

mistaken

ignored

contrary

justified

beforehand

realised

understandable

unavoidable

unfortunate

solved

corrected

rectified

flush

crack

cracked

interfering

obliged

obliging

reverting

bowing

system

atic

cleansing

irregular

transient

endless

countless

occasional

sporadic

evil

invisible

mysterious

sheer

utter

unbelievable

nonsense

outrageous

ludicrous

absurd

ridiculous

frightening

scary

bizarre odd

strange

eerie

chilling

ominous

futuristic

unreal

worrying

depressing

alarming

disturbing

troubling

horrid

hideous

ghastly

dreadful

horrendous

horrific

horrifying

terrible

horrible

awful

inconvenient

useless

pointless

boring

tedious

dull

bland

unpleasant

uncomfortable

awkward

unnerving

disconcerting

annoying

irritating

abrupt

unexpected

sudden

immediate

swift

stark

blunt

sharp

extreme

harsh

draconian

drastic

humiliating

embarrassing

shocking

hostile

unacceptable

inappropriate

disappointed

surprised

shocked

stunned

angered

angry

furious

answering

repeated

echoing

nervous

tired

bored

weary

impatient

sad

sorry

ashamed

upsetting

upset

tricky

challenging

difficult

tough

figured

wondering

noticed

seeing

afraid

scared

wanting

pretend

desperate

vain

unlucky

tiring

frustrating

packed

crowded

jammed

filled

empty

surrounded

lined

folding

wooden

draped

bearing

pictured

framed

stacked

strung

barefoot

legged

bare

naked

staring

seated

sitting

dressed

wearing

worn

dress

beige

skimpy

wrinkled

soiled

rear

overhead

upper

outer

blank

flip

diagonal

squared

narrowing

gaping

umbrella

mat

tiered

roofed

musty

moldy

smelling

putrid

fishy

hokey

pretentious

tacky

cheesy

spooky

sneaky

groovy

smoky

bouncy

mellow

melodious

boggy

waterlogged

soggy

faultless

passable

moonlit

drizzly

comfy

cushy

stuffy

cram

ped

cluttered

swanky

seedy

dank

claustrophobic

filthy

smelly

threadbare

shabby

dingy

grimy

spartan

drab

ramshackle

decrepit

uphill

gruelling

tortuous

sleepless

hectic

frantic

overpowering

suffocating

unrelenting

blazing

leisurely

brisk

uneventful

eventful

unimpressive iffy

unsatisfying

unannounced

attendant

nonstop

understaffed

lax

outdated

cumbersom

einefficient

inoperable

operable

unattended

unlocked

ventilated

moveable

movable

motorized

stationary

unclean

sterile

sterilized

watered

scrubbed

patched

hydrated

tonic

soundproof

hydroponic

crusted

proofed

treed

deadened

slopped

thudding

vertiginous

extraordinaire

manque

slippy

piscine

barefooted

offish

laxative

swish

redux

spanking

maniac

repellent

repellant

especial

copious

frayed

soured

marred

chaotic

tense

heated split

dominated

destroyed

damaged

ruined

wrecked

shut

shuttered

interrupted

ended

broke

swept

hammered

struck

suspended

cancelled

delayed

threatened

continued

renewed

close

closed

blocked

sealed

controlled

regulated

reserved

vested

temporary

vacant

communal

unoccupied

occupied

raining

splashed

soaked

washed

littered

scattered

burning

burnt

burned

apart

broken

blown

thrown

parked

smashed

smashing

preserved

neglected

overlooked

forgotten

sunken

mangled

grave

dead

buried

torn

wrapped

wound

shady

crooked

paved

winding

bumpy dirt

dusty

muddy

rough

slippery icy

faulty

overloaded

leaky

spotty

shoddy

substandard

uneven

tight

slack

steep

flat

narrow

dense

thick

large

heavy

underground

pedestrian

raging

roaring

rumbling

congested

clogged

flooded

overflowing

prize

earned

received

million

worth

valued

stream

ing

electronic

digital

instant

online

interactive

broadband

wireless

mobile

portable

automated

managerial

technical

financial

corporate

profitable

competitive

overseas

commercial

direct

standby

beneficiary

token

supplementary

supplemental

generic

prescription

unlim

ited

minimum

maximum

prem

ium

rental

affordable

expensive

inexpensive

cheap

convenient

accessible

suitable

ideal

attractive

desirable

regent

meridian

intercontinental

mediterranean

continental

pacific

atlantic jet

plane

landed

flying fly

intrepid

chartered

guided

tracked

sighted

express

outbound

natural

mineral

oriental

occidental

electrical

mechanical

motor

electric

discovered

detected

artificial

surface

equipped

fitted

mounted

powered

built

designed

model

integrated

developed

operational

operating

dual

fixed

existing

conventional

advance

advanced

increased

expanded

improved

enhanced

maintained

limited

extended

insured

prudential

neutral

polar

net

posted

level

high

higher

raised

expected

anticipated

lowest

overall

combined

revised

processed

imported

available

distributed

contained

material

documented

collected

radio

nightly

daily

weekly

medium tiny

small

larger

smaller

size

sized

visible

characteristic

distinctive

unique

distinct

obvious

surprising

noticeable

subtle

striking

dram

atic

lesser

myriad

assorted

ranging

varied

varying

tremendous

considerable

greater

substantial

significant

massive

vast

huge

enormous

wide

deep

inner

rich

diverse

missing

alive

cleared

headed

loaded

driving

walking

catching

running

short

long

right

free

middle

home

half

past

turned

away

back cut

cutting

easy

quick

slow fast

pass

passing

forward

ahead

behind

pressed

pressing

meet

prepared

ready

must

willing

noted

described

acknowledged

stated

pointed

stressed

learned

understood

realized

telling

advised

aware

informed

proved

determined

neither

convinced

clear

impossible

otherwise

deciding

likely

unlikely

possible

future

chance

enough

able still

kept

alone

done

sure

going go

coming

gone real big

bigger

pretty

little

kind like

hard

excellent

quality

best

good

better

fair

fine

protected

protecting

clean

safe

secure

critical

key

important

vital

crucial

main

major

leading

active

involved

responsible

becoming

transformed

assumed

assuming

present

current

mere

zero

comparable

equivalent

changed

changing

similar

unusual

subject

latter

unlike

normal

typical

usual

every

entire

whole

supposed

intended

required

needed

necessary

used

applied

connected

separate

multiple

individual

related

common

different

specific

particular

certain

medical

patient

elem

entary

textbook

bilingual

experim

ental

modern

contem

porary

alternative

choice

preferred

basic

standard

traditional

custom

resident

hired

retired

trained

professional

amateur

eligible

qualified

prospective

selected

select

assigned

rank top

ranking

junior

elite

recognised

respected

foremost

outstanding

exem

plary

creative

artistic

worthy

ultim

ate

extraordinary

exceptional

enduring

genuine

famous

legendary

great

greatest

honored

dedicated

absolute

universal

ordinary

everyday

pure

satisfying

simple

straightforward

self

conscious

personal

physical

hispanic

american

native

latin

asian

african

side

cross

opposite

outside

inside

front

standing

prior

initial

last

earlier

early

later

completed set

full

complete

historic

marked

celebrated

converted

settled

relocated

incorporated

established

live

living

new

next

beginning

starting

moved

moving

working

spent

less far

even

much

well

made

just

one

another

travelled

traveled

attended

gathered

fewer

least

none

eight

nine

seven

five six

two

four

three

twenty

thirty

ten

twelve

fifteen

eleven

dozen

hundred

divided

together

arranged

jewish

affiliated

formed

joined

nationw

ide

coordinated

organised

organized

said

according

official

confirm

edreported

national

european

international

union

united

appointed

elected

representative

represented

senior

associate

assistant

executive

general

scheduled

planned

august

opened

held

sign

signed

approved

agreed

announced

joint

initiative

presidential

prem

ier

prime

indian

canadian

british

australian

english

irish

royal

colonial

victorian

newborn

teenage

teenaged

adult

female

male

young

older

younger

old

born

married

mexican

cuban

italian

brazilian

spanish

portuguese

japanese

greek

turkish

giant

owned

sold

based

firm

boss ex

former

swiss

french

belgian

dutch

danish

german

polish ii

roman

ottoman

prussian

metropolitan

industrial

residential

urban

capital

northeast

southeast

north

south

west

east

central

southern

northern

western

eastern

terminal

port

bay

base

busy

bustling

downtown

uptown

plain

bordered

overlooking

surrounding

adjacent

square

located

situated

peaceful

mass

expressed

voiced

forthcoming

upcoming

recent

latest

ongoing

continuing

civic

oriented

minded

friendly

cooperative

understanding

shared

sharing

diplom

atic

strategic

historical

cultural

many

various

several

numerous

popular

known

considered

local

private

public

independent

legal

center

complex

human

social

environm

ental

loose

thin

soft

smooth

transparent

relaxing

relaxed

accommodating

conditioned

festive

inviting

welcom

ing

welcom

ecustom

ary

routine

casual

intim

ate

correct

wrong

acceptable

reasonable

sensible

proper

appropriate

practical

objective

whatsoever

true

meaning

regardless

mean

whatever

knowing

proof

definite

geared

motivated

engaged

engaging

accustom

edalike

looking

thinking

interested

focused

reliable

accurate

careful

thorough

timely

prom

ptspeedy

intuitive

agile

econom

ical

suited

utilized apt

preferable

knowledgeable

reputable

responsive

honest

competent

growing

grown

fresh

mixed

mild

moderate

positive

strong

consistent

solid

cautious

modest

steady

overnight

quiet

calm

signal

alert

elaborate

detailed

extensive brief

lengthy

formal

informal

frequent

constant

plus

extra

additional

regular

special

accompanied

accompanying

placed

attached

summary

suprem

eunanimous

handled

addressed

resolved

fulfilled

relieved

assured

confident

pleased

satisfied

empowered

supervised

assisted

prom

ising

appealing

demanding

accepted

accepting

reverse

reversed

drawn

handed

taking

taken

given

giving

exclusive sole

granted

requested

authorized

permanent

partial

dated

listed

unknown

exact

actual

verified

confirm

ing

disclosed

topnotch

moderne

metal

iron

concrete

stone

votive

illuminated lit

lighted

carved

sculpted

painted

decorated

adorned

handmade

antique

stained

pastel

arch

arched

tiled

glazed

paneled

ornate

domed

immaculate

cathedral

outward

inland

farther

rounded

sheltered

twin tall

laid lay

nestled

tucked

shaped

etched

sacred

mythical

heavenly

divine

marian

elder

noble

hollywood star

swank

ruby

shining

stellar

glowing

faint

bygone

bust

gargantuan

gigantic

precious

priceless

rare

exotic

rustic

quaint

picturesque

secluded

idyllic

posh

upscale

upmarket

parisian

chic

fashionable

trendy

plush

luxe

palatial

luxurious

opulent

sumptuous

scenic

panoramic

sunset

sunrise

pleasant

breezy

sunny

gothic

baroque

architectural

decorative

interior

exterior

downhill

quadruple

double

triple

swimming

indoor

outdoor

world

olym

pic

golden

gold

silver

won

winning

round

final

finished

straight

open

grand

instrumental

solo

piano

bass

sound

familiar

sounding

singing

pop

master

magic

magical

twentieth

inspired

reminiscent

mirrored

reflected

evident

romantic

romance

adapted

novel

directed

directing

entitled

written

original

included

featured

straw

oval

identical

matching

matched

uniform fit

fitting

color

light

dark

bright

black

white

brown

gray

grey

yellow

pink

blue

green

curt

pitched

played

game

single

first

second

third

eleventh

ninth

fifth

sixth

seventh

capped

timed

opening

closing

stretch

midway

amicable

cordial

utmost

sincere

loving

righteous

virtuous

conscientious

compassionate

hearty

lukewarm

gracious

hospitable

respectful

courteous

polite

diligent

attentive

solicitous

considerate

sanitary

blanket

protective

botanical

landscaped

leafy

shaded

aerobic

drinkable

usable

plentiful

abundant

adequate

ample

satisfactory

workable

painless

tolerable

worthwhile

doable

inclusive

conducive

complem

entary

beneficial

flexible

effective

efficient cc

stereo

deluxe

compact cd

adjustable

trim

optional

groomed wed

educated

wealthy

healthy

healthier

vibrant

cosm

opolitan

tolerant

civilized

windward

virgin

coral

canary

tuscan

venetian

neapolitan

florentine

jamaican

hawaiian

creole

nigh

galore

presto

spellbound

pricy

homely

scrumptious

appetizing

alfresco

pampering

uncrowded

uncluttered

undisturbed

sparse

restful

sedate

unobtrusive

discreet

serene

tranquil

joyful

cheerful

cheery

encyclopedic

boundless

unparalleled

unsurpassed

unmatched

unbeatable

flawless

understated

effortless

impeccable

superlative

fluent

speaking

spoken

capable

powerful

aggressive

counter

smart

clever

sophisticated

innovative

slim

tame

marginal

meager

hefty

sizable

valuable

invaluable

handy

helpful

useful

spare

wasted

saved

saving

generous

paid

paying

brave

bold

fabulous

fantastic

wonderful

terrific

phenom

enal

astonishing

amazing

incredible

awesom

emarvelous

marvellous

glorious

majestic

magnificent

splendid

spectacular

stunning

breathtaking

dazzling

superb

brilliant

impressive

remarkable

deserved

decent

respectable

happy ok

comfortable

nice

perfect

seasoned

accomplished

talented

appreciative

enthusiastic

impressed

appreciated

fancied

entertained

adored

keen avid

curious

excited

amazed

delighted

thrilled

loved

liked

fortunate

lucky

grateful

glad

thankful hot

dry

salt

baking

cereal

crackers

peanut

fluffy

vanilla

caramel

blended

powdered

nuts

toasted

steamed

roast

chicken

fried

baked

grilled

cooked

roasted

olive mint

cherry

honey

ripe

juicy

tender

sweet

salty

savory

tasty

delicious

darling pat

ace

favourite

favorite

outback

super

mini

classic

homem

ade

takeout

complimentary

personalized

inflatable

plastic

rubber

disposable

cushioned

waterproof

removable

adagio

yummy

velvet

indigo

frosted

quilted

topless

nude

laughing

smiling

petite

polished

exquisite

elegant

graceful

crisp

sparkling

bubbly

lush

pristine

colorful

beautiful

gorgeous

lovely

tanned

refreshed

spotless

squeaky

room

ysnug

cozy

cosy

glossy

stylish

sleek

flashy

glam

orous

pricey

fancy

skinny

oversized

shiny

sparkly

finer

flowery

legible

airy

expansive

enclosed

furnished

spacious

interconnected

augm

ented

configured

elliptical

circular

rectangular

discrete

functional

corresponding

defined

preset

desired

optim

alapproximate

inferior

feminine

funny

humorous

ironic

amusing

hilarious

thoughtful

passionate

energetic

lively

spirited

mem

orable

unforgettable

exciting

fascinating

interesting

intriguing

informative

entertaining

enjoyable

authentic

timeless

fashioned

quintessential

funky

retro

eclectic

minimalist

pleasing

inspiring

alluring

enchanting

mesmerizing

charming

delightful

quirky

endearing

earthy

soothing

invigorating

refreshing

wholesome

homey

tasteful

classy

creepy

cute

adorable

chatty

personable

approachable

mannered

talkative

gritty

neat

tidy

humble

unassuming

unpretentious

Figure 8: Tree T over 2,397 adjectives: the left subtree is for adjectives with negativesentiment and the right subtree is for adjectives with positive sentiment.

17

Page 18: Rare Feature Selection in High Dimensions › pdf › 1803.06675.pdf · ture selection when features are rare. This motivates the remainder of the paper, in which we devise a strategy

discard any adjectives that are in fewer than 0.5% of reviews before applying the lasso. BothL1-ag-h and L1-ag-d have an additional tuning parameter to take care of: for L1-ag-h

we vary the height at which we cut the tree along an equally-spaced grid of ten values; forL1-ag-d we choose the threshold for aggregations’ density along an equally spaced grid often values between 0.001 and 0.1.

Mean Squared Prediction Errorprop. n p n/p our method L1 L1-dense L1-ag-h L1-ag-d

1% 1,700 2,397 0.71 0.870 0.894 0.895 0.882 0.9715% 8,499 3,962 2.15 0.783 0.790 0.805 0.785 0.899

10% 16,999 4,786 3.55 0.758 0.764 0.788 0.764 0.90220% 33,997 5,621 6.05 0.742 0.749 0.773 0.747 1.17340% 67,995 6,472 10.51 0.739 0.740 0.768 0.742 1.10860% 101,992 6,962 14.65 0.733 0.736 0.769 0.734 1.15580% 135,990 7,294 18.64 0.733 0.733 0.765 0.734 0.886

100% 169,987 7,573 22.45 0.729 0.731 0.765 0.731 0.956

Table 1: Performance of five methods on the held-out test set: L1 is the lasso; L1-denseis the lasso on only dense features; L1-ag-h is the lasso with features aggregated based onheight; and L1-ag-d is the lasso with features aggregated based on density level.

We hold out 40,000 ratings and reviews as a test set. To observe the performance of thesemethods over a range of training set sizes, we consider a nested sequence of training sets,ranging from 1% to 100% of the reviews not included in the test set. For all methods, weuse five-fold cross validation to select tuning parameters and threshold all predicted ratingsto be within the interval [1, 5]. Table 1 displays the mean squared prediction error (MSPE)on the test set for each method and training set size.

As the size of the training set increases, all methods except for the lasso with aggregationbased on density (L1-ag-d) achieve lower MSPE. Among the four lasso-related methods, L1and L1-ag-h outperform the other two. As the training set size n increases, the numberof features p also increase but at a relatively slower rate. We notice that when n/p is lessthan 10.51, our method outperforms the other four lasso-related methods. As n/p increasesbeyond 10.51, i.e., in the statistically easier regimes, L1 and L1-ag-h attain performancecomparable to our method. We conduct paired t-tests between squared prediction errorsfrom our method and L1-ag-h at every (n, p) pair (i.e., every row of Table 1). Five out ofthe eight tests are significant at the 0.005 significance level (See Appendix J Table 4).

To better understand the difference between our method, the lasso, and L1-ag-h, we colorthe branches of the tree generated in the n = 1, 700 and p = 2, 397 case (i.e., proportionis 1%) according to the sign and magnitude of β for the three methods. The bottom treein Figure 9 corresponds to our method and has many nearby branches sharing the samered/blue color, indicating that the corresponding adjective counts have been merged. Bycontrast, the top tree in Figure 9, which corresponds to the lasso, shows that the solutionis sparser and does not have branches of similar color. The middle tree in Figure 9 showsthat L1-ag-h produces something between the two, merging some adjectives with strongsignals while keeping the rest as singletons. The strength and pattern of aggregations vary

18

Page 19: Rare Feature Selection in High Dimensions › pdf › 1803.06675.pdf · ture selection when features are rare. This motivates the remainder of the paper, in which we devise a strategy

02

46

810

1214

expectable

pickled

corned

stocked

stuffed

packaged rum

alaskan

salmon

lengthwise

coarse

uncooked

avocado

shredded

chopped

peeled

rotten fat

bottom

tops

topped

topping

frozen

dried

steaming

whipping

leftover

stale

boiled

sour

hungry

starving

exhausted

dehydrated

caring

tending

distressed

stricken

blind

deaf

disabled

homeless

elderly

aged

eighteen fifty

ninety

contracted

treated ill

pregnant

sick

dying

conjoined

dominican

returning

departed

stranded

rescued

mod

ultra

eponym

ous

hip

punk

secret

hidden

dirty fake

animal pet

wild

mad

colored

orange red

highland

crescent

midwe

stern

rural

unpopulated

undeveloped

untouched

inaccessible

wooded hilly

sprawling

populated

scrub

shallow

rocky

sandy

nautical

loco

sportive

tropical

torrential

cloudy

damp

wet

humid

rainy

windy

chilly

weather

cold

warm cool

hourly

annual

yearly

gross

total

minus

projected

surpassing

whopping

average

adjusted il xx

prepaid

mailed

postal

misrepresented

inflated

skewed

unfounded

misleading

false

dubious

questionable

incomplete

inconsistent

incorrect

sketchy

obscure

aforem

entioned

peculiar

seem

ing

confusing

ambiguous

problematic

unsolicited

unauthorised

forfeit

outright

heard

loud

yelled

shouted

scream

ing

crying

blaring

banging

attic

upstairs

downstairs

awake

asleep

waking

awakened

sleeping

breathing

otc ci

wee

wan

tan

bum

capitalist

orientated

lumbering

imperialistic

ethnic

foreign

domestic thai

indonesian

korean

vietnamese

chinese

masculine

gay

abusive

abused

harassed

communist

political

liberal

penal

mandatory

strict

forbidden

prohibited

violated

restricted

limiting

slight

downward

dim

faded

dashed

disappointing

dism

allackluster

subdued

upbeat

tepid

stock

dipped

rose fell

dropping

falling

risen

fallen

lowe

red

slashed

sterling

recovering

recovered

lost

beat

beaten

ranked

fourth

consecutive

missed

shot

tied

lone

minute

foul

frank

sometime

mid

late

ago

fired

dism

issed

sent

ordered

sufficient

lacking

minimal

excessive

excess

waste

raw

scarce

worst

blam

eblam

edstruggling

experienced

faced

hurt

bad

worse

poor

weak

affected

impacted

negative

favorable

dangerous

isolated

unstable

elevated

relative

low

reduced

bush

federal

internal

due

previous

following

subsequent

hearing

incident

superior

potent

formidable

probable

potential

apparent

serious

remote

neighboring

camp

near

nearby

scrambled

rushed

rush

waiting

stopped

collect

tried

trying

decided

wanted

attempted

forced

unable

remaining left

removed

abandoned

spread

found

covered

exposed

offensive

defensive

uniformed

loyal

armed

fighting

pro

anti

opposed

backed

explosive

unidentified

searching

spotted

automatic

warning

flash

killing

stabbing

executed

infamous

notorious

charged

guilty

suspect

criminal

alleged

petty

drunken

drunk

operative

cutaneous

facial

bone

minor

sustained

suffering

severe

bodily

mental

fossil

warming

magnetic

hydraulic

ambient

fluid

liquid

synthetic

additive

content

analog

runny

itchy

scratchy

edible

inedible

fragrant

potted

rust

rusty

powdery

mushy

greasy

flaky

sticky

watery

hairy

hollow

fuzzy

eyed buff

reddish

bluish

darkening

opaque

murky

okay

alright

dear

damn

remiss

napping

seasick

picky

brag

bragging

inquiring

discerning

tripping

lingual

biolum

inescent

equidistant

interlaced

schematic

derisive

audible

buzzing

whispering

upwind

buggy

huffy

toed

hooked

wired

plugged

fab

gabby

littler

bedded

looseleaf

tropic

pied

trojan

iberian

smoking

secondhand

addictive

alcoholic

gastrointestinal

allergic

asthmatic

sensitive

touchy

negotiable

obligatory

overdue

undecided

shy

inclined

worried

concerned

confused

unhappy

frustrated

hesitant

reluctant

unwilling

eager

anxious

skeptical

wary

advertised

billed

booked

unavailable

rested

disposed

compensated

graded

categorized

checked

sorted

farthest

distant

approaching

wandering

sneaking

touching

bouncing

crossed

halfway

stuck

sideways

dragging

sliding

driven

rolling

rolled

floating

drifting

tipped

stray

discrim

inatory

discrim

inating

intrusive

explicit

unsafe

impractical

unusable

poorly

ineffective

inadequate

consum

ing

costly

repetitive

needless

unnecessary

discouraged

apprehensive

leery

mum

adam

ant

disturbed

horrified

disgusted

appalled

dism

ayed

hysterical

terrified

distraught

stung

embarrassed

annoyed

irritated

bothered

complaining

thunderstruck

gobsmacked

flabbergasted

speechless

apologetic

incredulous

bewildered

baffled

perplexed

puzzled

unimpressed

disheartened

perturbed

offended

disliked

dazzled

enam

ored

pestered

patronized

lazy

kindly

wise

jolly sly

insane

incompetent

unresponsive

selfish

arrogant

naive

ignorant

uncaring

aloof

forgiving

squeam

ish

fussy

indulgent

obnoxious

overbearing

cocky

pushy

uptight

surly

sullen

clueless

grum

pylame

rude

downright

nasty

ugly

stupid

silly

weird

crazy

inexperienced

disorganized

disorganised

unimaginative

unmotivated

indifferent

uninterested

noisy

rowdy

unruly

disgruntled

irate

meagre

sizeable

exorbitant

extortionate

overpriced

tempting

enticing

limitless

lofty

manageable

unmanageable

bearable

intolerable

unbearable

aggravating

compounded

stressful

pleasurable

draining

taxing

spoilt

spoiled

dreary

miserable

lousy

mediocre

feeble

pathetic

pitiful

hurried

fumbling

clum

syhaphazard

perfunctory

halfhearted

wimpy

crappy

touristy

grouchy

crotchety

snooty

snotty

smallish

grungy

scruffy

constipated

thieving

grubby

dodgy

tatty

tipsy

skanky

slothful

bothersome

nauseating

yucky

lumpy

untidy

insubstantial

unappealing

unattractive

raunchy

trashy

tasteless

sham

eless

ostentatious

obtrusive

impersonal

unsupportive

unromantic

uninviting

repulsive

revolting

stereotyped

soulless

glorified

decadent

unsavory

harmless

insignificant

imaginable

serviceable

characterless

flavorless

existent

unmentionable

borderline

atrocious

deplorable

appalling

disgusting

sham

eful

disgraceful

inexcusable

repugnant

undesirable

disadvantageous

unhelpful

unfriendly

unprofessional

dishonest

inconsiderate

impolite

disrespectful

mistaken

ignored

contrary

justified

beforehand

realised

understandable

unavoidable

unfortunate

solved

corrected

rectified

flush

crack

cracked

interfering

obliged

obliging

reverting

bowing

system

atic

cleansing

irregular

transient

endless

countless

occasional

sporadic

evil

invisible

mysterious

sheer

utter

unbelievable

nonsense

outrageous

ludicrous

absurd

ridiculous

frightening

scary

bizarre odd

strange

eerie

chilling

ominous

futuristic

unreal

worrying

depressing

alarming

disturbing

troubling

horrid

hideous

ghastly

dreadful

horrendous

horrific

horrifying

terrible

horrible

awful

inconvenient

useless

pointless

boring

tedious

dull

bland

unpleasant

uncomfortable

awkward

unnerving

disconcerting

annoying

irritating

abrupt

unexpected

sudden

immediate

swift

stark

blunt

sharp

extreme

harsh

draconian

drastic

humiliating

embarrassing

shocking

hostile

unacceptable

inappropriate

disappointed

surprised

shocked

stunned

angered

angry

furious

answering

repeated

echoing

nervous

tired

bored

weary

impatient

sad

sorry

ashamed

upsetting

upset

tricky

challenging

difficult

tough

figured

wondering

noticed

seeing

afraid

scared

wanting

pretend

desperate

vain

unlucky

tiring

frustrating

packed

crowded

jammed

filled

empty

surrounded

lined

folding

wooden

draped

bearing

pictured

framed

stacked

strung

barefoot

legged

bare

naked

staring

seated

sitting

dressed

wearing

worn

dress

beige

skimpy

wrinkled

soiled

rear

overhead

upper

outer

blank

flip

diagonal

squared

narrowing

gaping

umbrella

mat

tiered

roofed

musty

moldy

smelling

putrid

fishy

hokey

pretentious

tacky

cheesy

spooky

sneaky

groovy

smoky

bouncy

mellow

melodious

boggy

waterlogged

soggy

faultless

passable

moonlit

drizzly

comfy

cushy

stuffy

cram

ped

cluttered

swanky

seedy

dank

claustrophobic

filthy

smelly

threadbare

shabby

dingy

grimy

spartan

drab

ramshackle

decrepit

uphill

gruelling

tortuous

sleepless

hectic

frantic

overpowe

ring

suffocating

unrelenting

blazing

leisurely

brisk

uneventful

eventful

unimpressive iffy

unsatisfying

unannounced

attendant

nonstop

understaffed

lax

outdated

cumbersom

einefficient

inoperable

operable

unattended

unlocked

ventilated

moveable

movable

motorized

stationary

unclean

sterile

sterilized

watered

scrubbed

patched

hydrated

tonic

soundproof

hydroponic

crusted

proofed

treed

deadened

slopped

thudding

vertiginous

extraordinaire

manque

slippy

piscine

barefooted

offish

laxative

swish

redux

spanking

maniac

repellent

repellant

especial

copious

frayed

soured

marred

chaotic

tense

heated split

dominated

destroyed

damaged

ruined

wrecked

shut

shuttered

interrupted

ended

broke

swept

hammered

struck

suspended

cancelled

delayed

threatened

continued

renewe

dclose

closed

blocked

sealed

controlled

regulated

reserved

vested

temporary

vacant

communal

unoccupied

occupied

raining

splashed

soaked

washed

littered

scattered

burning

burnt

burned

apart

broken

blown

thrown

parked

smashed

smashing

preserved

neglected

overlooked

forgotten

sunken

mangled

grave

dead

buried

torn

wrapped

wound

shady

crooked

paved

winding

bumpy dirt

dusty

muddy

rough

slippery icy

faulty

overloaded

leaky

spotty

shoddy

substandard

uneven

tight

slack

steep

flat

narrow

dense

thick

large

heavy

underground

pedestrian

raging

roaring

rumbling

congested

clogged

flooded

overflowing

prize

earned

received

million

worth

valued

stream

ing

electronic

digital

instant

online

interactive

broadband

wireless

mobile

portable

automated

managerial

technical

financial

corporate

profitable

competitive

overseas

commercial

direct

standby

beneficiary

token

supplementary

supplemental

generic

prescription

unlim

ited

minimum

maximum

prem

ium

rental

affordable

expensive

inexpensive

cheap

convenient

accessible

suitable

ideal

attractive

desirable

regent

meridian

intercontinental

mediterranean

continental

pacific

atlantic jet

plane

landed

flying fly

intrepid

chartered

guided

tracked

sighted

express

outbound

natural

mineral

oriental

occidental

electrical

mechanical

motor

electric

discovered

detected

artificial

surface

equipped

fitted

mounted

powe

red

built

designed

model

integrated

developed

operational

operating

dual

fixed

existing

conventional

advance

advanced

increased

expanded

improved

enhanced

maintained

limited

extended

insured

prudential

neutral

polar

net

posted

level

high

higher

raised

expected

anticipated

lowe

stoverall

combined

revised

processed

imported

available

distributed

contained

material

documented

collected

radio

nightly

daily

weekly

medium tiny

small

larger

smaller

size

sized

visible

characteristic

distinctive

unique

distinct

obvious

surprising

noticeable

subtle

striking

dram

atic

lesser

myriad

assorted

ranging

varied

varying

tremendous

considerable

greater

substantial

significant

massive

vast

huge

enormous

wide

deep

inner

rich

diverse

missing

alive

cleared

headed

loaded

driving

walking

catching

running

short

long

right

free

middle

home

half

past

turned

away

back cut

cutting

easy

quick

slow fast

pass

passing

forward

ahead

behind

pressed

pressing

meet

prepared

ready

must

willing

noted

described

acknowledged

stated

pointed

stressed

learned

understood

realized

telling

advised

aware

informed

proved

determined

neither

convinced

clear

impossible

otherwise

deciding

likely

unlikely

possible

future

chance

enough

able still

kept

alone

done

sure

going go

coming

gone real big

bigger

pretty

little

kind like

hard

excellent

quality

best

good

better

fair

fine

protected

protecting

clean

safe

secure

critical

key

important

vital

crucial

main

major

leading

active

involved

responsible

becoming

transformed

assumed

assuming

present

current

mere

zero

comparable

equivalent

changed

changing

similar

unusual

subject

latter

unlike

normal

typical

usual

every

entire

whole

supposed

intended

required

needed

necessary

used

applied

connected

separate

multiple

individual

related

common

different

specific

particular

certain

medical

patient

elem

entary

textbook

bilingual

experim

ental

modern

contem

porary

alternative

choice

preferred

basic

standard

traditional

custom

resident

hired

retired

trained

professional

amateur

eligible

qualified

prospective

selected

select

assigned

rank top

ranking

junior

elite

recognised

respected

foremost

outstanding

exem

plary

creative

artistic

worthy

ultim

ate

extraordinary

exceptional

enduring

genuine

famous

legendary

great

greatest

honored

dedicated

absolute

universal

ordinary

everyday

pure

satisfying

simple

straightforward

self

conscious

personal

physical

hispanic

american

native

latin

asian

african

side

cross

opposite

outside

inside

front

standing

prior

initial

last

earlier

early

later

completed set

full

complete

historic

marked

celebrated

converted

settled

relocated

incorporated

established

live

living

new

next

beginning

starting

moved

moving

working

spent

less far

even

much

well

made

just

one

another

travelled

traveled

attended

gathered

fewe

rleast

none

eight

nine

seven

five six

two

four

three

twenty

thirty

ten

twelve

fifteen

eleven

dozen

hundred

divided

together

arranged

jewish

affiliated

formed

joined

nationw

ide

coordinated

organised

organized

said

according

official

confirm

edreported

national

european

international

union

united

appointed

elected

representative

represented

senior

associate

assistant

executive

general

scheduled

planned

august

opened

held

sign

signed

approved

agreed

announced

joint

initiative

presidential

prem

ier

prime

indian

canadian

british

australian

english

irish

royal

colonial

victorian

newborn

teenage

teenaged

adult

female

male

young

older

younger

old

born

married

mexican

cuban

italian

brazilian

spanish

portuguese

japanese

greek

turkish

giant

owned

sold

based

firm

boss ex

former

swiss

french

belgian

dutch

danish

german

polish ii

roman

ottoman

prussian

metropolitan

industrial

residential

urban

capital

northeast

southeast

north

south

west

east

central

southern

northern

western

eastern

terminal

port

bay

base

busy

bustling

downtown

uptown

plain

bordered

overlooking

surrounding

adjacent

square

located

situated

peaceful

mass

expressed

voiced

forthcoming

upcoming

recent

latest

ongoing

continuing

civic

oriented

minded

friendly

cooperative

understanding

shared

sharing

diplom

atic

strategic

historical

cultural

many

various

several

numerous

popular

known

considered

local

private

public

independent

legal

center

complex

human

social

environm

ental

loose

thin

soft

smooth

transparent

relaxing

relaxed

accommodating

conditioned

festive

inviting

welcom

ing

welcom

ecustom

ary

routine

casual

intim

ate

correct

wrong

acceptable

reasonable

sensible

proper

appropriate

practical

objective

whatsoever

true

meaning

regardless

mean

whatever

knowing

proof

definite

geared

motivated

engaged

engaging

accustom

edalike

looking

thinking

interested

focused

reliable

accurate

careful

thorough

timely

prom

ptspeedy

intuitive

agile

econom

ical

suited

utilized apt

preferable

knowledgeable

reputable

responsive

honest

competent

growing

grown

fresh

mixed

mild

moderate

positive

strong

consistent

solid

cautious

modest

steady

overnight

quiet

calm

signal

alert

elaborate

detailed

extensive brief

lengthy

formal

informal

frequent

constant

plus

extra

additional

regular

special

accompanied

accompanying

placed

attached

summary

suprem

eunanimous

handled

addressed

resolved

fulfilled

relieved

assured

confident

pleased

satisfied

empowe

red

supervised

assisted

prom

ising

appealing

demanding

accepted

accepting

reverse

reversed

drawn

handed

taking

taken

given

giving

exclusive sole

granted

requested

authorized

permanent

partial

dated

listed

unknown

exact

actual

verified

confirm

ing

disclosed

topnotch

moderne

metal

iron

concrete

stone

votive

illuminated lit

lighted

carved

sculpted

painted

decorated

adorned

handmade

antique

stained

pastel

arch

arched

tiled

glazed

paneled

ornate

domed

immaculate

cathedral

outward

inland

farther

rounded

sheltered

twin tall

laid lay

nestled

tucked

shaped

etched

sacred

mythical

heavenly

divine

marian

elder

noble

hollywo

od star

swank

ruby

shining

stellar

glowing

faint

bygone

bust

gargantuan

gigantic

precious

priceless

rare

exotic

rustic

quaint

picturesque

secluded

idyllic

posh

upscale

upmarket

parisian

chic

fashionable

trendy

plush

luxe

palatial

luxurious

opulent

sumptuous

scenic

panoramic

sunset

sunrise

pleasant

breezy

sunny

gothic

baroque

architectural

decorative

interior

exterior

downhill

quadruple

double

triple

swimming

indoor

outdoor

world

olym

pic

golden

gold

silver

won

winning

round

final

finished

straight

open

grand

instrumental

solo

piano

bass

sound

familiar

sounding

singing

pop

master

magic

magical

twentieth

inspired

reminiscent

mirrored

reflected

evident

romantic

romance

adapted

novel

directed

directing

entitled

written

original

included

featured

straw

oval

identical

matching

matched

uniform fit

fitting

color

light

dark

bright

black

white

brown

gray

grey

yellow

pink

blue

green

curt

pitched

played

game

single

first

second

third

eleventh

ninth

fifth

sixth

seventh

capped

timed

opening

closing

stretch

midway

amicable

cordial

utmost

sincere

loving

righteous

virtuous

conscientious

compassionate

hearty

lukewarm

gracious

hospitable

respectful

courteous

polite

diligent

attentive

solicitous

considerate

sanitary

blanket

protective

botanical

landscaped

leafy

shaded

aerobic

drinkable

usable

plentiful

abundant

adequate

ample

satisfactory

workable

painless

tolerable

worthwhile

doable

inclusive

conducive

complem

entary

beneficial

flexible

effective

efficient cc

stereo

deluxe

compact cd

adjustable

trim

optional

groomed wed

educated

wealthy

healthy

healthier

vibrant

cosm

opolitan

tolerant

civilized

windward

virgin

coral

canary

tuscan

venetian

neapolitan

florentine

jamaican

hawaiian

creole

nigh

galore

presto

spellbound

pricy

homely

scrumptious

appetizing

alfresco

pampering

uncrowded

uncluttered

undisturbed

sparse

restful

sedate

unobtrusive

discreet

serene

tranquil

joyful

cheerful

cheery

encyclopedic

boundless

unparalleled

unsurpassed

unmatched

unbeatable

flawless

understated

effortless

impeccable

superlative

fluent

speaking

spoken

capable

powe

rful

aggressive

counter

smart

clever

sophisticated

innovative

slim

tame

marginal

meager

hefty

sizable

valuable

invaluable

handy

helpful

useful

spare

wasted

saved

saving

generous

paid

paying

brave

bold

fabulous

fantastic

wonderful

terrific

phenom

enal

astonishing

amazing

incredible

awesom

emarvelous

marvellous

glorious

majestic

magnificent

splendid

spectacular

stunning

breathtaking

dazzling

superb

brilliant

impressive

remarkable

deserved

decent

respectable

happy ok

comfortable

nice

perfect

seasoned

accomplished

talented

appreciative

enthusiastic

impressed

appreciated

fancied

entertained

adored

keen avid

curious

excited

amazed

delighted

thrilled

loved

liked

fortunate

lucky

grateful

glad

thankful hot

dry

salt

baking

cereal

crackers

peanut

fluffy

vanilla

caramel

blended

powdered

nuts

toasted

steamed

roast

chicken

fried

baked

grilled

cooked

roasted

olive mint

cherry

honey

ripe

juicy

tender

sweet

salty

savory

tasty

delicious

darling pat

ace

favourite

favorite

outback

super

mini

classic

homem

ade

takeout

complimentary

personalized

inflatable

plastic

rubber

disposable

cushioned

waterproof

removable

adagio

yummy

velvet

indigo

frosted

quilted

topless

nude

laughing

smiling

petite

polished

exquisite

elegant

graceful

crisp

sparkling

bubbly

lush

pristine

colorful

beautiful

gorgeous

lovely

tanned

refreshed

spotless

squeaky

room

ysnug

cozy

cosy

glossy

stylish

sleek

flashy

glam

orous

pricey

fancy

skinny

oversized

shiny

sparkly

finer

flowe

rylegible

airy

expansive

enclosed

furnished

spacious

interconnected

augm

ented

configured

elliptical

circular

rectangular

discrete

functional

corresponding

defined

preset

desired

optim

alapproximate

inferior

feminine

funny

humorous

ironic

amusing

hilarious

thoughtful

passionate

energetic

lively

spirited

mem

orable

unforgettable

exciting

fascinating

interesting

intriguing

informative

entertaining

enjoyable

authentic

timeless

fashioned

quintessential

funky

retro

eclectic

minimalist

pleasing

inspiring

alluring

enchanting

mesmerizing

charming

delightful

quirky

endearing

earthy

soothing

invigorating

refreshing

wholesome

homey

tasteful

classy

creepy

cute

adorable

chatty

personable

approachable

mannered

talkative

gritty

neat

tidy

humble

unassuming

unpretentious

02

46

810

1214

expe

ctab

lepi

ckle

dco

rned

stoc

ked

stuf

fed

pack

aged

rum

alas

kan

salm

onle

ngth

wis

eco

arse

unco

oked

avoc

ado

shre

dded

chop

ped

peel

edro

tten fat

botto

mto

psto

pped

topp

ing

froz

endr

ied

stea

min

gw

hipp

ing

lefto

ver

stal

ebo

iled

sour

hung

ryst

arvi

ngex

haus

ted

dehy

drat

edca

ring

tend

ing

dist

ress

edst

ricke

nbl

ind

deaf

disa

bled

hom

eles

sel

derly

aged

eigh

teen fifty

nine

tyco

ntra

cted

trea

ted ill

preg

nant

sick

dyin

gco

njoi

ned

dom

inic

anre

turn

ing

depa

rted

stra

nded

resc

ued

mod

ultr

aep

onym

ous

hip

punk

secr

ethi

dden

dirt

yfa

kean

imal

pet

wild

mad

colo

red

oran

ge red

high

land

cres

cent

mid

wes

tern

rura

lun

popu

late

dun

deve

lope

dun

touc

hed

inac

cess

ible

woo

ded

hilly

spra

wlin

gpo

pula

ted

scru

bsh

allo

wro

cky

sand

yna

utic

allo

cosp

ortiv

etr

opic

alto

rren

tial

clou

dyda

mp

wet

hum

idra

iny

win

dych

illy

wea

ther

cold

war

mco

olho

urly

annu

alye

arly

gros

sto

tal

min

uspr

ojec

ted

surp

assi

ngw

hopp

ing

aver

age

adju

sted

il xxpr

epai

dm

aile

dpo

stal

mis

repr

esen

ted

infla

ted

skew

edun

foun

ded

mis

lead

ing

fals

edu

biou

squ

estio

nabl

ein

com

plet

ein

cons

iste

ntin

corr

ect

sket

chy

obsc

ure

afor

emen

tione

dpe

culia

rse

emin

gco

nfus

ing

ambi

guou

spr

oble

mat

icun

solic

ited

unau

thor

ised

forf

eit

outr

ight

hear

dlo

udye

lled

shou

ted

scre

amin

gcr

ying

blar

ing

bang

ing

attic

upst

airs

dow

nsta

irsaw

ake

asle

epw

akin

gaw

aken

edsl

eepi

ngbr

eath

ing

otc ci

wee

wan ta

nbu

mca

pita

list

orie

ntat

edlu

mbe

ring

impe

rialis

ticet

hnic

fore

ign

dom

estic

thai

indo

nesi

anko

rean

viet

nam

ese

chin

ese

mas

culin

ega

yab

usiv

eab

used

hara

ssed

com

mun

ist

polit

ical

liber

alpe

nal

man

dato

ryst

rict

forb

idde

npr

ohib

ited

viol

ated

rest

ricte

dlim

iting

slig

htdo

wnw

ard

dim

fade

dda

shed

disa

ppoi

ntin

gdi

smal

lack

lust

ersu

bdue

dup

beat

tepi

dst

ock

dipp

edro

se fell

drop

ping

falli

ngris

enfa

llen

low

ered

slas

hed

ster

ling

reco

verin

gre

cove

red

lost

beat

beat

enra

nked

four

thco

nsec

utiv

em

isse

dsh

ottie

dlo

nem

inut

efo

ulfr

ank

som

etim

em

idla

teag

ofir

eddi

smis

sed

sent

orde

red

suffi

cien

tla

ckin

gm

inim

alex

cess

ive

exce

ssw

aste

raw

scar

cew

orst

blam

ebl

amed

stru

gglin

gex

perie

nced

face

dhu

rtba

dw

orse

poor

wea

kaf

fect

edim

pact

edne

gativ

efa

vora

ble

dang

erou

sis

olat

edun

stab

leel

evat

edre

lativ

elo

wre

duce

dbu

shfe

dera

lin

tern

aldu

epr

evio

usfo

llow

ing

subs

eque

nthe

arin

gin

cide

ntsu

perio

rpo

tent

form

idab

lepr

obab

lepo

tent

ial

appa

rent

serio

usre

mot

ene

ighb

orin

gca

mp

near

near

bysc

ram

bled

rush

edru

shw

aitin

gst

oppe

dco

llect

trie

dtr

ying

deci

ded

wan

ted

atte

mpt

edfo

rced

unab

lere

mai

ning le

ftre

mov

edab

ando

ned

spre

adfo

und

cove

red

expo

sed

offe

nsiv

ede

fens

ive

unifo

rmed

loya

lar

med

fight

ing

pro

anti

oppo

sed

back

edex

plos

ive

unid

entif

ied

sear

chin

gsp

otte

dau

tom

atic

war

ning

flash

killi

ngst

abbi

ngex

ecut

edin

fam

ous

noto

rious

char

ged

guilt

ysu

spec

tcr

imin

alal

lege

dpe

ttydr

unke

ndr

unk

oper

ativ

ecu

tane

ous

faci

albo

nem

inor

sust

aine

dsu

fferin

gse

vere

bodi

lym

enta

lfo

ssil

war

min

gm

agne

tichy

drau

licam

bien

tflu

idliq

uid

synt

hetic

addi

tive

cont

ent

anal

ogru

nny

itchy

scra

tchy

edib

lein

edib

lefr

agra

ntpo

tted

rust

rust

ypo

wde

rym

ushy

grea

syfla

kyst

icky

wat

ery

hair

yho

llow

fuzz

yey

ed buff

redd

ish

blui

shda

rken

ing

opaq

uem

urky

okay

alrig

htde

arda

mn

rem

iss

napp

ing

seas

ick

pick

ybr

agbr

aggi

ngin

quiri

ngdi

scer

ning

trip

ping

lingu

albi

olum

ines

cent

equi

dist

ant

inte

rlace

dsc

hem

atic

deris

ive

audi

ble

buzz

ing

whi

sper

ing

upw

ind

bugg

yhu

ffyto

edho

oked

wire

dpl

ugge

dfa

bga

bby

little

rbe

dded

loos

elea

ftr

opic

pied

troj

anib

eria

nsm

okin

gse

cond

hand

addi

ctiv

eal

coho

licga

stro

inte

stin

alal

lerg

icas

thm

atic

sens

itive

touc

hyne

gotia

ble

oblig

ator

yov

erdu

eun

deci

ded

shy

incl

ined

wor

ried

conc

erne

dco

nfus

edun

happ

yfr

ustr

ated

hesi

tant

relu

ctan

tun

will

ing

eage

ran

xiou

ssk

eptic

alw

ary

adve

rtis

edbi

lled

book

edun

avai

labl

ere

sted

disp

osed

com

pens

ated

grad

edca

tego

rized

chec

ked

sort

edfa

rthe

stdi

stan

tap

proa

chin

gw

ande

ring

snea

king

touc

hing

boun

cing

cros

sed

halfw

ayst

uck

side

way

sdr

aggi

ngsl

idin

gdr

iven

rolli

ngro

lled

float

ing

drift

ing

tippe

dst

ray

disc

rimin

ator

ydi

scrim

inat

ing

intr

usiv

eex

plic

itun

safe

impr

actic

alun

usab

lepo

orly

inef

fect

ive

inad

equa

teco

nsum

ing

cost

lyre

petit

ive

need

less

unne

cess

ary

disc

oura

ged

appr

ehen

sive

leer

ym

umad

aman

tdi

stur

bed

horr

ified

disg

uste

dap

palle

ddi

smay

edhy

ster

ical

terr

ified

dist

raug

htst

ung

emba

rras

sed

anno

yed

irrita

ted

both

ered

com

plai

ning

thun

ders

truc

kgo

bsm

acke

dfla

bber

gast

edsp

eech

less

apol

oget

icin

cred

ulou

sbe

wild

ered

baffl

edpe

rple

xed

puzz

led

unim

pres

sed

dish

eart

ened

pert

urbe

dof

fend

eddi

slik

edda

zzle

den

amor

edpe

ster

edpa

tron

ized

lazy

kind

lyw

ise

jolly sly

insa

nein

com

pete

ntun

resp

onsi

vese

lfish

arro

gant

naiv

eig

nora

ntun

carin

gal

oof

forg

ivin

gsq

ueam

ish

fuss

yin

dulg

ent

obno

xiou

sov

erbe

arin

gco

cky

push

yup

tight

surly

sulle

ncl

uele

ssgr

umpy

lam

eru

dedo

wnr

ight

nast

yug

lyst

upid

silly

wei

rdcr

azy

inex

perie

nced

diso

rgan

ized

diso

rgan

ised

unim

agin

ativ

eun

mot

ivat

edin

diffe

rent

unin

tere

sted

nois

yro

wdy

unru

lydi

sgru

ntle

dira

tem

eagr

esi

zeab

leex

orbi

tant

exto

rtio

nate

over

pric

edte

mpt

ing

entic

ing

limitl

ess

lofty

man

agea

ble

unm

anag

eabl

ebe

arab

lein

tole

rabl

eun

bear

able

aggr

avat

ing

com

poun

ded

stre

ssfu

lpl

easu

rabl

edr

aini

ngta

xing

spoi

ltsp

oile

ddr

eary

mis

erab

lelo

usy

med

iocr

efe

eble

path

etic

pitif

ulhu

rrie

dfu

mbl

ing

clum

syha

phaz

ard

perf

unct

ory

halfh

eart

edw

impy

crap

pyto

uris

tygr

ouch

ycr

otch

ety

snoo

tysn

otty

smal

lish

grun

gysc

ruffy

cons

tipat

edth

ievi

nggr

ubby

dodg

yta

ttytip

sysk

anky

slot

hful

both

erso

me

naus

eatin

gyu

cky

lum

pyun

tidy

insu

bsta

ntia

lun

appe

alin

gun

attr

activ

era

unch

ytr

ashy

tast

eles

ssh

amel

ess

oste

ntat

ious

obtr

usiv

eim

pers

onal

unsu

ppor

tive

unro

man

ticun

invi

ting

repu

lsiv

ere

volti

ngst

ereo

type

dso

ulle

ssgl

orifi

edde

cade

ntun

savo

ryha

rmle

ssin

sign

ifica

ntim

agin

able

serv

icea

ble

char

acte

rless

flavo

rless

exis

tent

unm

entio

nabl

ebo

rder

line

atro

ciou

sde

plor

able

appa

lling

disg

ustin

gsh

amef

uldi

sgra

cefu

lin

excu

sabl

ere

pugn

ant

unde

sira

ble

disa

dvan

tage

ous

unhe

lpfu

lun

frie

ndly

unpr

ofes

sion

aldi

shon

est

inco

nsid

erat

eim

polit

edi

sres

[Figure residue: leaf labels of the adjective similarity tree used in the TripAdvisor analysis (words such as "horrific", "ridiculous", "overpriced", "cozy", "claustrophobic", "magnificent"), together with a count axis ticked 0 through 14. The label list, which appeared twice in the extraction, is omitted here; see the corresponding figure in the paper for the full tree.]

sculpted

painted

decorated

adorned

handmade

antique

stained

pastel

arch

arched

tiled

glazed

paneled

ornate

domed

immaculate

cathedral

outward

inland

farther

rounded

sheltered

twin tall

laid lay

nestled

tucked

shaped

etched

sacred

mythical

heavenly

divine

marian

elder

noble

hollywo

od star

swank

ruby

shining

stellar

glowing

faint

bygone

bust

gargantuan

gigantic

precious

priceless

rare

exotic

rustic

quaint

picturesque

secluded

idyllic

posh

upscale

upmarket

parisian

chic

fashionable

trendy

plush

luxe

palatial

luxurious

opulent

sumptuous

scenic

panoramic

sunset

sunrise

pleasant

breezy

sunny

gothic

baroque

architectural

decorative

interior

exterior

downhill

quadruple

double

triple

swimming

indoor

outdoor

world

olym

pic

golden

gold

silver

won

winning

round

final

finished

straight

open

grand

instrumental

solo

piano

bass

sound

familiar

sounding

singing

pop

master

magic

magical

twentieth

inspired

reminiscent

mirrored

reflected

evident

romantic

romance

adapted

novel

directed

directing

entitled

written

original

included

featured

straw

oval

identical

matching

matched

uniform fit

fitting

color

light

dark

bright

black

white

brown

gray

grey

yellow

pink

blue

green

curt

pitched

played

game

single

first

second

third

eleventh

ninth

fifth

sixth

seventh

capped

timed

opening

closing

stretch

midway

amicable

cordial

utmost

sincere

loving

righteous

virtuous

conscientious

compassionate

hearty

lukewarm

gracious

hospitable

respectful

courteous

polite

diligent

attentive

solicitous

considerate

sanitary

blanket

protective

botanical

landscaped

leafy

shaded

aerobic

drinkable

usable

plentiful

abundant

adequate

ample

satisfactory

workable

painless

tolerable

worthwhile

doable

inclusive

conducive

complem

entary

beneficial

flexible

effective

efficient cc

stereo

deluxe

compact cd

adjustable

trim

optional

groomed wed

educated

wealthy

healthy

healthier

vibrant

cosm

opolitan

tolerant

civilized

windward

virgin

coral

canary

tuscan

venetian

neapolitan

florentine

jamaican

hawaiian

creole

nigh

galore

presto

spellbound

pricy

homely

scrumptious

appetizing

alfresco

pampering

uncrowded

uncluttered

undisturbed

sparse

restful

sedate

unobtrusive

discreet

serene

tranquil

joyful

cheerful

cheery

encyclopedic

boundless

unparalleled

unsurpassed

unmatched

unbeatable

flawless

understated

effortless

impeccable

superlative

fluent

speaking

spoken

capable

powe

rful

aggressive

counter

smart

clever

sophisticated

innovative

slim

tame

marginal

meager

hefty

sizable

valuable

invaluable

handy

helpful

useful

spare

wasted

saved

saving

generous

paid

paying

brave

bold

fabulous

fantastic

wonderful

terrific

phenom

enal

astonishing

amazing

incredible

awesom

emarvelous

marvellous

glorious

majestic

magnificent

splendid

spectacular

stunning

breathtaking

dazzling

superb

brilliant

impressive

remarkable

deserved

decent

respectable

happy ok

comfortable

nice

perfect

seasoned

accomplished

talented

appreciative

enthusiastic

impressed

appreciated

fancied

entertained

adored

keen avid

curious

excited

amazed

delighted

thrilled

loved

liked

fortunate

lucky

grateful

glad

thankful hot

dry

salt

baking

cereal

crackers

peanut

fluffy

vanilla

caramel

blended

powdered

nuts

toasted

steamed

roast

chicken

fried

baked

grilled

cooked

roasted

olive mint

cherry

honey

ripe

juicy

tender

sweet

salty

savory

tasty

delicious

darling pat

ace

favourite

favorite

outback

super

mini

classic

homem

ade

takeout

complimentary

personalized

inflatable

plastic

rubber

disposable

cushioned

waterproof

removable

adagio

yummy

velvet

indigo

frosted

quilted

topless

nude

laughing

smiling

petite

polished

exquisite

elegant

graceful

crisp

sparkling

bubbly

lush

pristine

colorful

beautiful

gorgeous

lovely

tanned

refreshed

spotless

squeaky

room

ysnug

cozy

cosy

glossy

stylish

sleek

flashy

glam

orous

pricey

fancy

skinny

oversized

shiny

sparkly

finer

flowe

rylegible

airy

expansive

enclosed

furnished

spacious

interconnected

augm

ented

configured

elliptical

circular

rectangular

discrete

functional

corresponding

defined

preset

desired

optim

alapproximate

inferior

feminine

funny

humorous

ironic

amusing

hilarious

thoughtful

passionate

energetic

lively

spirited

mem

orable

unforgettable

exciting

fascinating

interesting

intriguing

informative

entertaining

enjoyable

authentic

timeless

fashioned

quintessential

funky

retro

eclectic

minimalist

pleasing

inspiring

alluring

enchanting

mesmerizing

charming

delightful

quirky

endearing

earthy

soothing

invigorating

refreshing

wholesome

homey

tasteful

classy

creepy

cute

adorable

chatty

personable

approachable

mannered

talkative

gritty

neat

tidy

humble

unassuming

unpretentious

Figure 9: Trees for 2,397 adjectives on the leaves, with branches colored based on β̂ estimated with the lasso (Top), L1-ag-h (Middle) and our method (Bottom), respectively. Red, blue and gray branches correspond to negative, positive and zero β̂j, respectively. Darker color indicates larger magnitude of β̂j and lighter color indicates smaller magnitude of β̂j. The horizontal line shown in the middle plot corresponds to the height, chosen from CV, at which L1-ag-h cuts the tree and merges the resulting branches.


Figure 10: Magnitudes of coefficient estimates |β̂j| versus feature density (on log scale) for our method (black circles), the lasso (red triangles) and L1-ag-h (green diamonds).


between our method and L1-ag-h. Inspection of the merged branches from our method reveals words of similar meaning and sentiment being aggregated. In Figure 10, we plot |β̂j| against the percentage of reviews containing an adjective. We find that our method selects rare words more than the other two methods. The rarest word selected by the lasso is “filthy”, which appears in 0.47% of reviews. By contrast, our method selects many words that are far more rare: at the extreme of rarest words, our method selects 797 words that appear in only 0.059% of reviews; L1-ag-h selects 31 out of these 797 rarest words. Our method is able to select more rare words through aggregation: it aggregates 2,244 words into 224 clusters, leaving the remaining 153 words as singletons. Over 70% of these singletons are dense words (where, for this discussion, we call a word “dense” if it appears in at least 1% of reviews and “rare” otherwise). This is four times higher than the percentage of dense words in the original training data. Of the 224 aggregated clusters, 42% are made up entirely of rare words. After aggregation, over half of these clusters become dense features. As a comparison, L1-ag-h aggregates 1,339 words into 506 clusters while keeping 1,058 words as singletons. Only 10% of these singletons from L1-ag-h are dense words. Of the 506 aggregated clusters from L1-ag-h, 68% are made up of rare words only, among which only 15% become dense clusters after aggregation.

Table 2 shows the density and estimated coefficient values for eight words falling in a particular subtree of T. The words “heard” and “loud” occur far more commonly than the other six words. We see that the lasso only selects these two words, whereas our method selects all eight words (assigning them negative coefficient values). In contrast, L1-ag-h, while also aggregating the two densest words in this branch into a group, does not select the six rare words. Examining the six rare words, it seems quite reasonable that they should receive negative coefficient values. This suggests to us that the lasso’s exclusion of these words has to do with their rareness in the dataset rather than their irrelevance to predicting the hotel rating.

adjective                  heard    loud     yelled   shouted  screaming  crying   blaring  banging
density^4                  0.0300   0.0235   0.0006   0.0006   0.0029     0.0006   0.0006   0.0041
β̂^lasso_λ                 -0.057   -0.147   0        0        0          0        0        0
β̂^{L1-ag-h}_{λ,height}    -0.174   -0.174   0        0        0          0        0        0
β̂^ours_λ                  -0.128   -0.128   -0.039   -0.039   -0.039     -0.039   -0.039   -0.039

Table 2: Term density and estimated coefficient for adjectives in the selected group

Existing work in sentiment analysis uses side information in other ways to improve predictive performance (Thelwall et al., 2010). For example, Wang et al. (2016), working in an SVM framework, forms a directed graph over words that expresses their “sentiment strength” and then requires that the coefficients corresponding to these words honor the ordering that is implied by the directed graph. For example, if word A is stronger (in expressing a certain sentiment) than word B, which in turn is stronger than word C, then their method enforces βA ≥ βB ≥ βC. Such constraints can in some situations have a regularizing effect that may be of use in rare feature settings: for example, if βA ≈ βC and B is a rare word, this

4 The term density is computed over the training set.


constraint would help pin down βB’s value. However, if βA and βC are very different from each other, the constraint may offer little help in reducing the variance of the estimate of βB. Another difference is that the method uses hard constraints that are not controlled by a tuning parameter, so that even when there is strong evidence in the data that a constraint should be violated, this will not be allowed. By contrast, our method shrinks toward the constant-on-subtrees structure without forcing this to be the case.

                        Mean Squared Prediction Error
tree setting    our method    L1       L1-dense    L1-ag-h    L1-ag-d
GloVe-50d       0.748         0.749    0.773       0.752      0.775
GloVe-100d      0.742         0.749    0.773       0.747      1.173
GloVe-200d      0.741         0.749    0.773       0.747      1.140
ELMo            0.669         0.676    0.732       0.685      0.783

Table 3: Performance of five methods under various tree settings on the held-out test set. Among the tree settings, GloVe-50d, GloVe-100d and GloVe-200d correspond to hierarchical clustering trees generated with GloVe embeddings of differing dimensions. We collapse over two million ELMo embedding vectors of adjectives into 6,001 clusters using mini-batch K-Means clustering (Sculley, 2010), then generate a hierarchical tree upon the 6,001 cluster centroids. The performance is based on 33,997 training reviews (i.e., corresponding to the 20% row of Table 1).

Section 5 investigated the effect of using a distorted tree (see the bottom left panel of Figure 5). In the context of words, there are multiple choices of trees one could use. Table 3 shows the effect of applying our method to different types of trees. Here we focus on one situation with 33,997 training reviews. We first consider the effect of changing the dimension of the GloVe embedding. One might suppose that as the embedding dimension increases the tree becomes more informative. Indeed, one finds that our method and L1-ag-h achieve improved performance as this dimension increases. The last method, L1-ag-d, has more variability in its performance as the GloVe tree varies, making it the poorest performing method. Both L1 and L1-dense do not make use of the tree and thus they are unaffected by changes to the embedding dimension. But a limitation of the methods is that they all are based on the bag-of-words model, and therefore do not take word context into consideration. For example, the models do not differentiate between the use of “bad” in “the hotel is bad” versus “the hotel is not that bad”. To bring context into consideration, we leverage deep contextualized word representations from ELMo (Peters et al., 2018) to generate an auxiliary tree (see Appendix K for details). Unlike traditional word embeddings such as GloVe that associate one embedding per word, deep word embeddings can capture the meaning of each word based on the surrounding context. From Table 3 we find test errors improve substantially with the ELMo tree for our method, L1 and L1-ag-h. Among all methods and all tree settings, our method with the ELMo tree performs the best. This suggests our method can better leverage the power of contextualized word embeddings than competing methods.


7 Conclusion

In this paper, we focus on the challenge posed by highly sparse data matrices, which have become increasingly common in many areas, including biology and text mining. While much work has focused on addressing the challenges of high-dimensional data, relatively little attention has been given to the challenges of sparsity in the data. We show, both theoretically and empirically, that not explicitly accounting for the sparsity in the data hurts one’s prediction errors and one’s ability to perform feature selection. Our proposed method is able to make productive use of highly sparse features by creating new aggregated features based on side information about the original features. In contrast to simpler tree-based aggregation strategies that are occasionally used as a pre-processing step in biological applications, our method adaptively learns the feature aggregation in a supervised manner. In doing so, our methodology not only overcomes the challenges of data sparsity but also produces features that may be of greater relevance to the particular prediction task of interest.

Acknowledgments

The authors thank Andy Clark for calling our attention to the challenge of rare features. This work was supported by NSF CAREER grant DMS-1653017.

Appendices

A Failure of OLS in the Presence of a Rare Feature

Theorem 5. Consider the linear model (1) with $X \in \mathbb{R}^{n\times p}$ having full column rank. Further suppose that $X_j$ is a binary vector having $k$ nonzeros. It follows that
\[
\mathbb{P}\left(\bigl|\hat\beta^{OLS}_j(n) - \beta^*_j\bigr| > \eta\right) \ge 2\Phi\bigl(-\eta k^{1/2}/\sigma\bigr) \quad \text{for any } \eta > 0, \tag{7}
\]
where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal variable.

Proof. The distribution of the OLS estimator is $\hat\beta^{OLS}_j(n) \sim N(\beta^*_j,\, \sigma^2[(X^TX)^{-1}]_{jj})$. By applying blockwise inversion (see, e.g., Bernstein 2009), with the $j$th row/column of $X^TX$ in its own “block”, we get
\[
\begin{aligned}
[(X^TX)^{-1}]_{jj} &= [X_j^TX_j - X_j^TX_{-j}(X_{-j}^TX_{-j})^{-1}X_{-j}^TX_j]^{-1}\\
&= [\|X_j\|^2 - \|(X_{-j}^TX_{-j})^{-1/2}X_{-j}^TX_j\|^2]^{-1}\\
&\ge \|X_j\|^{-2} = k^{-1}.
\end{aligned}
\]
Thus,
\[
\mathbb{P}\left(\bigl|\hat\beta^{OLS}_j(n) - \beta^*_j\bigr| > \eta\right) = 2\Phi\left(-\frac{\eta}{\sigma\sqrt{[(X^TX)^{-1}]_{jj}}}\right) \ge 2\Phi\bigl(-\eta k^{1/2}/\sigma\bigr),
\]
where $\Phi(\cdot)$ is the distribution function of a standard normal variable.
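To see the practical force of the bound in (7), one can evaluate $2\Phi(-\eta k^{1/2}/\sigma)$ directly; a minimal stdlib-Python sketch (the values of $\eta$, $k$, $\sigma$ below are illustrative, not from the paper):

```python
import math

def std_normal_cdf(x):
    # Phi(x), computed via the complementary error function
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def ols_error_lower_bound(eta, k, sigma=1.0):
    # Lower bound (7): P(|beta_hat_j - beta*_j| > eta) >= 2 * Phi(-eta * sqrt(k) / sigma)
    return 2.0 * std_normal_cdf(-eta * math.sqrt(k) / sigma)

# A feature with only k = 10 nonzero entries: no matter how large n is,
# the OLS estimate misses the truth by more than 0.5 with probability >= ~11%.
print(round(ols_error_lower_bound(0.5, 10), 3))
```

The bound depends only on $k$, the number of times the rare feature occurs, which is the point of the theorem: collecting more observations does not help if the feature itself stays rare.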


B Proof of Theorem 2

In the setting of Theorem 2, we have $X\beta^* = \bar X\bar\beta^*$, where $\bar X = X(I_k \otimes 1_{p/k}) \in \mathbb{R}^{n\times k}$. The two estimators, the oracle lasso on the aggregated data ($\bar X$) and the lasso on the original data ($X$), are defined below.

• Oracle lasso estimator $\hat\beta^{oracle}_\lambda = \bar\beta^{oracle}_\lambda \otimes 1_{p/k}$, where $\bar\beta^{oracle}_\lambda$ is the unique solution to
\[
\min_{\bar\beta\in\mathbb{R}^k}\; \frac{1}{2n}\|y - \bar X\bar\beta\|_2^2 + \lambda\|\bar\beta\|_1.
\]

• Lasso estimator $\hat\beta^{lasso}_\lambda$ is defined in (2).

We begin by establishing that the interval $I$ is nonempty, which is ensured by the constraint $k < p/(36\log n)$. In particular, the lower bound of the interval is below the upper bound if
\[
12k\log(k^2n) < p\log(2cp/k).
\]
Now, $\log(k^2n) \le 3\log n$, and $\log(2cp/k) \ge 1$ as long as $2cp/k > e$. Since $2c > 0.6$, if $p/k \ge 5$ then it would suffice to show that
\[
36k\log n < p.
\]
And this constraint does imply that $p/k \ge 5$. The two parts of the theorem follow from the following two propositions.
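Before turning to the propositions, a quick numeric sanity check (illustrative values of $n$, $k$, $p$ only, using the constant $c$ as defined in Lemma 2 below) confirms that the constraint $36k\log n < p$ places the lower end of the interval below the upper end:

```python
import math

# The constant c from Lemma 2 (restated here for the check; an assumption of this sketch)
c = (1.0 / 3.0) * math.exp(1.0 / (math.pi / 2.0 + 2.0)) * math.sqrt(0.25 + 1.0 / math.pi)

n, k, p = 1000, 5, 1300
assert 36 * k * math.log(n) < p      # the theorem's constraint k < p/(36 log n)

lhs = 12 * k * math.log(k ** 2 * n)  # (squared, scaled) lower end of the interval
rhs = p * math.log(2 * c * p / k)    # (squared, scaled) upper end
print(lhs < rhs)                     # the interval is nonempty
```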

Proposition 2 (Support recovery of oracle lasso). Suppose $\min_{i=1,\dots,k-1}|\bar\beta^*_i| > \sigma\sqrt{\frac{4k}{n}\log(k^2n)}$. With $\lambda = \sigma\sqrt{\frac{\log(k^2n)}{kn}}$, the oracle lasso recovers the correct signed support successfully:
\[
\lim_{n\to\infty}\mathbb{P}\left(S_\pm(\hat\beta^{oracle}_\lambda) = S_\pm(\beta^*)\right) = 1.
\]

Proof. The scaled matrix $\sqrt{\frac{k}{n}}\bar X$ is orthogonal since
\[
\bar X^T\bar X = (I_k \otimes 1_{p/k})^T X^TX (I_k \otimes 1_{p/k}) = \frac{n}{p}\cdot\frac{p}{k}\, I_k = \frac{n}{k}\, I_k.
\]

Orthogonality implies that
\[
\bar\beta^{oracle}_\lambda = S\left(\left(\sqrt{\tfrac{k}{n}}\bar X^T\right)\left(\sqrt{\tfrac{k}{n}}y\right),\; \lambda k\right) = S\left(\tfrac{k}{n}\bar X^Ty,\; \lambda k\right) \tag{8}
\]
where $\tfrac{k}{n}\bar X^Ty = \tfrac{k}{n}\bar X^T\bar X\bar\beta^* + \tfrac{k}{n}\bar X^T\varepsilon = \bar\beta^* + \tfrac{k}{n}\bar X^T\varepsilon \sim N_k\bigl(\bar\beta^*,\, \tfrac{k\sigma^2}{n}I_k\bigr)$. By the Chernoff bound for normal variables, for any $t > 0$,
\[
\mathbb{P}\left(\left|\tfrac{k}{n}(\bar X_j)^Ty - \bar\beta^*_j\right| > t\right) \le 2\exp\left(-\frac{t^2}{2k\sigma^2/n}\right) \quad \text{for } j = 1,\dots,k.
\]


Choosing $t = \sigma\sqrt{\frac{k\log(k^2n)}{n}}$ and applying a union bound yields
\[
\mathbb{P}\left(\left\|\tfrac{k}{n}\bar X^Ty - \bar\beta^*\right\|_\infty > \sigma\sqrt{\frac{k\log(k^2n)}{n}}\right) \le 2k\exp\left(-\frac{\sigma^2k\log(k^2n)/n}{2k\sigma^2/n}\right) = \frac{2}{\sqrt{n}}.
\]
Hence, with probability at least $1 - \frac{2}{\sqrt{n}}$, we have $\left\|\tfrac{k}{n}\bar X^Ty - \bar\beta^*\right\|_\infty \le \sigma\sqrt{\frac{k\log(k^2n)}{n}} = \lambda k$, due to our choice of $\lambda = \sigma\sqrt{\frac{\log(k^2n)}{kn}}$. Under $\left\|\tfrac{k}{n}\bar X^Ty - \bar\beta^*\right\|_\infty \le \lambda k$, the following results hold.

• By $\bar\beta^*_k = 0$ and
\[
\left|\tfrac{k}{n}(\bar X_k)^Ty\right| = \left|\tfrac{k}{n}(\bar X_k)^Ty - \bar\beta^*_k\right| \le \left\|\tfrac{k}{n}\bar X^Ty - \bar\beta^*\right\|_\infty \le \lambda k,
\]
we have $\bar\beta^{oracle}_{\lambda,k} = S\bigl(\tfrac{k}{n}(\bar X_k)^Ty,\, \lambda k\bigr) = 0$ and $\hat\beta^{oracle}_{\lambda,\ell} = \bar\beta^{oracle}_{\lambda,k} = \bar\beta^*_k = \beta^*_\ell$ for all $\ell > \frac{k-1}{k}p$.

• For $j = 1,\dots,k-1$, since $\left|\tfrac{k}{n}(\bar X_j)^Ty - \bar\beta^*_j\right| \le \lambda k$ and $|\bar\beta^*_j| \ge \min_{i=1,\dots,k-1}|\bar\beta^*_i| > 2\lambda k$, we must have that $\tfrac{k}{n}(\bar X_j)^Ty$ and $\bar\beta^*_j$ share the same sign. Moreover, we either have
\[
\left|\tfrac{k}{n}(\bar X_j)^Ty\right| \ge |\bar\beta^*_j| > \lambda k,
\]
or $\left|\tfrac{k}{n}(\bar X_j)^Ty\right| < |\bar\beta^*_j|$, in which case $\left|\tfrac{k}{n}(\bar X_j)^Ty - \bar\beta^*_j\right| = \Bigl|\left|\tfrac{k}{n}(\bar X_j)^Ty\right| - |\bar\beta^*_j|\Bigr| = |\bar\beta^*_j| - \left|\tfrac{k}{n}(\bar X_j)^Ty\right| \le \lambda k$ and therefore
\[
\left|\tfrac{k}{n}(\bar X_j)^Ty\right| \ge |\bar\beta^*_j| - \lambda k \ge 2\lambda k - \lambda k = \lambda k.
\]
Thus, $\left|\tfrac{k}{n}(\bar X_j)^Ty\right| > \lambda k$ for $j = 1,\dots,k-1$. By definition of $\bar\beta^{oracle}_\lambda$ and (8), for $\frac{j-1}{k}p < \ell \le \frac{j}{k}p$,
\[
\hat\beta^{oracle}_{\lambda,\ell} = \bar\beta^{oracle}_{\lambda,j} = S\left(\tfrac{k}{n}(\bar X_j)^Ty,\; \lambda k\right) = \tfrac{k}{n}(\bar X_j)^Ty\left(1 - \frac{\lambda k}{\left|\tfrac{k}{n}(\bar X_j)^Ty\right|}\right),
\]
which is of the same sign as $\bar\beta^*_j$ (and the same sign as $\beta^*_\ell$).

In the above two bullet points, we have shown that $S_\pm(\hat\beta^{oracle}_\lambda) = S_\pm(\beta^*)$ holds with probability at least $1 - \frac{2}{\sqrt{n}}$. Hence,
\[
\liminf_{n\to\infty}\mathbb{P}\left(S_\pm(\hat\beta^{oracle}_\lambda) = S_\pm(\beta^*)\right) \ge \lim_{n\to\infty}\left(1 - \frac{2}{\sqrt{n}}\right) = 1.
\]
Since $\limsup_{n\to\infty}\mathbb{P}\left(S_\pm(\hat\beta^{oracle}_\lambda) = S_\pm(\beta^*)\right) \le 1$, the limit of $\mathbb{P}\left(S_\pm(\hat\beta^{oracle}_\lambda) = S_\pm(\beta^*)\right)$ is 1.
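For concreteness, the soft-thresholding operator $S(x, t)$ appearing in (8) can be written in a few lines, and the shrink-but-keep-the-sign behavior used in the second bullet point is immediate (a stdlib-Python sketch; the numeric values are illustrative):

```python
def soft_threshold(x, t):
    # S(x, t) = sign(x) * max(|x| - t, 0)
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

# Below the threshold: the coordinate is set exactly to zero
assert soft_threshold(0.8, 1.0) == 0.0
# Above the threshold: shrunk toward zero but the sign is preserved,
# mirroring the argument that |(k/n)(X_j)^T y| > lambda*k implies correct signs
assert soft_threshold(3.0, 1.0) == 2.0
assert soft_threshold(-3.0, 1.0) == -2.0
```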


Lemma 2. Suppose $\varepsilon \sim N_n(0, \sigma^2I_n)$ and $c = \frac{1}{3}e^{(\pi/2+2)^{-1}}\sqrt{\frac{1}{4} + \frac{1}{\pi}}$. Then
\[
\mathbb{P}\left(\max_{\ell=1,\dots,p}|X_\ell^T\varepsilon| \le 2\sigma\sqrt{\frac{n}{3p}\log(2cp)}\right) \le \left(1 - \frac{1}{p}\right)^p.
\]

Proof. Let $Z$ be a standard Gaussian variable. Theorem 2.1 of Cote et al. (2012) provides a lower bound for the Gaussian Q-function (i.e., $\mathbb{P}(Z > z)$). Choosing $\kappa = \frac{3}{2}$ in their Theorem 2.1 yields
\[
\mathbb{P}(Z > z) \ge \underbrace{\frac{1}{3}e^{(\pi/2+2)^{-1}}\sqrt{\frac{1}{4} + \frac{1}{\pi}}}_{c}\; e^{-\frac{3z^2}{4}},
\]
where $c = \frac{1}{3}e^{(\pi/2+2)^{-1}}\sqrt{\frac{1}{4} + \frac{1}{\pi}}$ is independent of $z$. Given that $X_\ell$ has $n/p$ one’s and $X1_p = 1_n$, we have $X_\ell^T\varepsilon \overset{iid}{\sim} N(0, \frac{n}{p}\sigma^2)$ for $\ell = 1,\dots,p$. By expressing $X_1^T\varepsilon = \sqrt{\frac{n}{p}}\sigma Z$, we have for any $\eta > 0$,
\[
\mathbb{P}(X_1^T\varepsilon > \eta) \ge c\,e^{-\frac{3\eta^2p}{4\sigma^2n}} \;\Rightarrow\; \mathbb{P}(|X_1^T\varepsilon| > \eta) \ge 2c\,e^{-\frac{3\eta^2p}{4\sigma^2n}}.
\]

Moreover,
\[
\mathbb{P}\left(\max_{\ell=1,\dots,p}|X_\ell^T\varepsilon| \le \eta\right) = \bigl(\mathbb{P}(|X_1^T\varepsilon| \le \eta)\bigr)^p = \bigl(1 - \mathbb{P}(|X_1^T\varepsilon| > \eta)\bigr)^p \le \left(1 - 2c\,e^{-\frac{3\eta^2p}{4\sigma^2n}}\right)^p.
\]
Plugging $\eta = 2\sigma\sqrt{\frac{n}{3p}\log(2cp)}$ into the above inequality yields
\[
\mathbb{P}\left(\max_{\ell=1,\dots,p}|X_\ell^T\varepsilon| \le 2\sigma\sqrt{\frac{n}{3p}\log(2cp)}\right) \le \left(1 - \frac{1}{p}\right)^p.
\]
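The Gaussian lower bound $\mathbb{P}(Z > z) \ge c\,e^{-3z^2/4}$ that drives this proof can be checked numerically against the exact tail probability (a stdlib-Python sketch; the grid of $z$ values is illustrative):

```python
import math

# The constant c from Lemma 2 (as reconstructed above)
c = (1.0 / 3.0) * math.exp(1.0 / (math.pi / 2.0 + 2.0)) * math.sqrt(0.25 + 1.0 / math.pi)

def gaussian_q(z):
    # Exact tail P(Z > z) for a standard normal Z
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lower_bound(z):
    return c * math.exp(-3.0 * z * z / 4.0)

# The bound holds on a range of z values, and is fairly tight near z = 1
for z in [0.0, 0.5, 1.0, 2.0, 3.0]:
    assert gaussian_q(z) >= lower_bound(z)
print(round(gaussian_q(1.0), 4), round(lower_bound(1.0), 4))
```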

Proposition 3 (Failure of support recovery of lasso). Suppose $\min_{i=1,\dots,k-1}|\bar\beta^*_i| \le \sigma\sqrt{\frac{p}{3n}\log(2cp/k)}$, where $c = \frac{1}{3}e^{(\pi/2+2)^{-1}}\sqrt{\frac{1}{4} + \frac{1}{\pi}}$. The lasso fails to get high-probability signed support recovery:
\[
\limsup_{p\to\infty}\;\sup_{\lambda\ge 0}\;\mathbb{P}\left(S_\pm(\hat\beta^{lasso}_\lambda) = S_\pm(\beta^*)\right) \le \frac{1}{e}.
\]

Proof. The lasso solution can be simplified to $\hat\beta^{lasso}_\lambda = S\bigl(\frac{p}{n}X^Ty,\, \lambda p\bigr)$. Since $\beta^*_\ell \ne 0$ for $\ell \le \frac{k-1}{k}p$ and $\beta^*_\ell = 0$ for $\ell > \frac{k-1}{k}p$, the following is a necessary condition for $\hat\beta^{lasso}_\lambda$ to recover the correct signed support:
\[
\exists\,\lambda \;\text{ s.t. }\; \left|\tfrac{p}{n}X_\ell^Ty\right| > \lambda p \;\text{ for }\; \ell \le \tfrac{k-1}{k}p \;\text{ and }\; \left|\tfrac{p}{n}X_\ell^Ty\right| \le \lambda p \;\text{ for }\; \ell > \tfrac{k-1}{k}p,
\]
which in turn requires
\[
\min_{\ell \le \frac{k-1}{k}p}|X_\ell^Ty| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^Ty|.
\]


Furthermore, we have
\[
X_\ell^Ty = X_\ell^T\left(\sum_{j=1}^p X_j\beta^*_j + \varepsilon\right) = \frac{n}{p}\beta^*_\ell + X_\ell^T\varepsilon.
\]

Define $\bar i := \arg\min_{i=1,\dots,k-1}|\bar\beta^*_i|$ and $A := \left\{\max_{\ell > \frac{k-1}{k}p}|X_\ell^T\varepsilon| \le 2\sigma\sqrt{\frac{n}{3p}\log(2cp/k)}\right\}$. Then
\[
\begin{aligned}
\mathbb{P}\left(S_\pm(\hat\beta^{lasso}_\lambda) = S_\pm(\beta^*)\right)
&\le \mathbb{P}\left(\min_{\ell \le \frac{k-1}{k}p}|X_\ell^Ty| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^Ty|\right)\\
&\le \mathbb{P}\left(\min_{\frac{\bar i-1}{k}p < \ell \le \frac{\bar i}{k}p}|X_\ell^Ty| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^Ty|\right)\\
&\le \mathbb{P}\left(\frac{n}{p}|\bar\beta^*_{\bar i}| + \min_{\frac{\bar i-1}{k}p < \ell \le \frac{\bar i}{k}p}|X_\ell^T\varepsilon| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^T\varepsilon|\right)\\
&= \mathbb{P}\left(\frac{n}{p}|\bar\beta^*_{\bar i}| + \min_{\frac{\bar i-1}{k}p < \ell \le \frac{\bar i}{k}p}|X_\ell^T\varepsilon| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^T\varepsilon| \text{ and } A^c\right)\\
&\qquad + \mathbb{P}\left(\frac{n}{p}|\bar\beta^*_{\bar i}| + \min_{\frac{\bar i-1}{k}p < \ell \le \frac{\bar i}{k}p}|X_\ell^T\varepsilon| > \max_{\ell > \frac{k-1}{k}p}|X_\ell^T\varepsilon| \;\Big|\; A\right)\cdot\mathbb{P}(A)\\
&\le \mathbb{P}\left(\frac{n}{p}|\bar\beta^*_{\bar i}| + \min_{\frac{\bar i-1}{k}p < \ell \le \frac{\bar i}{k}p}|X_\ell^T\varepsilon| > 2\sigma\sqrt{\frac{n}{3p}\log(2cp/k)}\right) + \mathbb{P}(A)\\
&= \left[\mathbb{P}\left(|X_1^T\varepsilon| > 2\sigma\sqrt{\frac{n}{3p}\log(2cp/k)} - \frac{n}{p}|\bar\beta^*_{\bar i}|\right)\right]^{p/k} + \mathbb{P}(A) \quad (\text{by } X_\ell^T\varepsilon \text{ being i.i.d.})\\
&\le \left[\mathbb{P}\left(|X_1^T\varepsilon| > \sigma\sqrt{\frac{n}{3p}\log(2cp/k)}\right)\right]^{p/k} + \mathbb{P}(A) \quad \left(\text{by } |\bar\beta^*_{\bar i}| \le \sigma\sqrt{\frac{p}{3n}\log(2cp/k)}\right)\\
&\le \left[2\exp\left(-\frac{p}{2\sigma^2n}\cdot\frac{\sigma^2n}{3p}\log(2cp/k)\right)\right]^{p/k} + \left(1 - \frac{k}{p}\right)^{p/k} \quad (\text{by Chernoff ineq.\ and Lemma 2})\\
&= 2^{p/k}\exp\left(-\frac{p}{6k}\log(2cp/k)\right) + \left(1 - \frac{k}{p}\right)^{p/k}\\
&= 2^{p/k}\left(\frac{2cp}{k}\right)^{-\frac{p}{6k}} + \left(1 - \frac{k}{p}\right)^{p/k}
= \left(\frac{cp}{32k}\right)^{-\frac{p}{6k}} + \left(1 - \frac{k}{p}\right)^{p/k},
\end{aligned}
\]
which holds for all $\lambda \ge 0$. In particular,
\[
\sup_{\lambda\ge 0}\mathbb{P}\left(S_\pm(\hat\beta^{lasso}_\lambda) = S_\pm(\beta^*)\right) \le \left(\frac{cp}{32k}\right)^{-\frac{p}{6k}} + \left(1 - \frac{k}{p}\right)^{p/k}.
\]

.


Taking $\limsup$ on both sides yields
\[
\limsup_{p\to\infty}\;\sup_{\lambda\ge 0}\;\mathbb{P}\left(S_\pm(\hat\beta^{lasso}_\lambda) = S_\pm(\beta^*)\right) \le \lim_{p\to\infty}\left(\frac{cp}{32k}\right)^{-\frac{p}{6k}} + \lim_{p\to\infty}\left(1 - \frac{k}{p}\right)^{p/k} = 0 + \frac{1}{e} = \frac{1}{e}.
\]
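The limit $1/e$ of the second term can be observed numerically (a stdlib-Python sketch with $k$ fixed and $p$ growing; the particular values are illustrative):

```python
import math

k = 5
for p in [10**3, 10**5, 10**7]:
    term = (1.0 - k / p) ** (p / k)
    print(p, round(term, 6))
# The printed values approach 1/e ~= 0.367879; the first term
# (cp/(32k))^(-p/(6k)) vanishes much faster as p grows.
```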

C Proof of Theorem 3

Let $A \in \{0,1\}^{p\times|C|}$ denote the aggregation matrix corresponding to the partition $C$. Aggregated least squares uses the design matrix $\bar X = XA$, where each column is the sum of features in a group of $C$. Left-multiplication by $A$ maps this back to $p$-dimensional space:
\[
\hat\beta^C = A\bar X^+y.
\]
Its fitted values are
\[
X\hat\beta^C = \bar X\bar X^+y = \bar X\bar X^+(X\beta^* + \varepsilon).
\]
Now, $\bar X$ is full-rank and
\[
X\hat\beta^C \sim N_n\bigl(\bar X\bar X^+X\beta^*,\; \sigma^2\bar X[\bar X^T\bar X]^{-1}\bar X^T\bigr).
\]
Now,
\[
\begin{aligned}
\mathbb{E}[X\hat\beta^C] &= \bar X\bar X^+X\beta^*\\
&= \bar X\bar X^+XAA^+\beta^* + \bar X\bar X^+X(I_p - AA^+)\beta^*\\
&= XAA^+\beta^* + \bar X\bar X^+X(I_p - AA^+)\beta^*
\end{aligned}
\]
since $XAA^+\beta^*$ is in the column space of $\bar X$, and
\[
\begin{aligned}
\mathbb{E}[X\hat\beta^C] - X\beta^* &= X(AA^+ - I_p)\beta^* + \bar X\bar X^+X(I_p - AA^+)\beta^*\\
&= (\bar X\bar X^+ - I_n)X(I_p - AA^+)\beta^*,
\end{aligned}
\]
so that
\[
\begin{aligned}
\mathbb{E}\|X\hat\beta^C - X\beta^*\|^2 &= \|(\bar X\bar X^+ - I_n)X(I_p - AA^+)\beta^*\|^2 + \sigma^2\,\mathrm{tr}(\bar X[\bar X^T\bar X]^{-1}\bar X^T)\\
&= \|(\bar X\bar X^+ - I_n)X(I_p - AA^+)\beta^*\|^2 + \sigma^2|C|\\
&\le \|X\|_{op}^2\,\|(I_p - AA^+)\beta^*\|^2 + \sigma^2|C|
\end{aligned}
\]
since $\|\bar X\bar X^+ - I_n\|_{op} = 1$ and the trace of a projection matrix is the rank of the target space. Finally, observe that

\[
\|(I_p - AA^+)\beta^*\|^2 = \sum_{\ell=1}^{|C|}\sum_{j\in C_\ell}\left(\beta^*_j - |C_\ell|^{-1}\sum_{j'\in C_\ell}\beta^*_{j'}\right)^2.
\]
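In words, $\|(I_p - AA^+)\beta^*\|^2$ is the within-group sum of squared deviations of $\beta^*$ from its group means, since $AA^+$ replaces each coordinate by the mean over its group in $C$. A stdlib-Python sketch with a toy partition (the arrays below are illustrative):

```python
# Toy partition C of p = 5 features into |C| = 2 groups
groups = [[0, 1, 2], [3, 4]]
beta_star = [1.0, 2.0, 3.0, 10.0, 10.0]

def group_average(beta, groups):
    # AA+ applied to beta: each coordinate replaced by its group mean
    out = [0.0] * len(beta)
    for g in groups:
        m = sum(beta[j] for j in g) / len(g)
        for j in g:
            out[j] = m
    return out

avg = group_average(beta_star, groups)
residual_sq = sum((b - a) ** 2 for b, a in zip(beta_star, avg))
print(residual_sq)  # within-group sum of squared deviations
```

The second group is constant, so it contributes nothing: aggregation is lossless exactly when $\beta^*$ is constant within each group, which is the bias term the theorem controls.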


D Consensus ADMM for Solving Problem (5)

The ADMM algorithm is given in Algorithm 1. Let $X = \mathrm{SVD}_{compact}(\tilde U, \tilde D, \tilde V)$ be the compact singular value decomposition of $X$, where $\tilde D \in \mathbb{R}^{\min(n,p)\times\min(n,p)}$ is a diagonal matrix with non-zero singular values on the diagonal, and $\tilde U \in \mathbb{R}^{n\times\min(n,p)}$ and $\tilde V \in \mathbb{R}^{p\times\min(n,p)}$ contain in their columns the left and right singular vectors corresponding to non-zero singular values, respectively. Similarly, we have $(I_p : -A) = \mathrm{SVD}_{compact}(\cdot, \cdot, \tilde Q)$, where $\tilde Q \in \mathbb{R}^{(p+|T|)\times p}$ contains the $p$ right singular vectors corresponding to non-zero singular values.

Algorithm 1 Consensus ADMM for Solving Problem (5)

Input: $y, X, A, n, p, |T|, \lambda, \alpha, \rho, \varepsilon^{abs}, \varepsilon^{rel}$, maxite.
1: $X = \mathrm{SVD}_{compact}(\cdot, \tilde D, \tilde V)$
2: $(I_p : -A) = \mathrm{SVD}_{compact}(\cdot, \cdot, \tilde Q)$
3: $\beta^0 \leftarrow \beta^{(i)0} \leftarrow v^{(i)0} \leftarrow 0 \in \mathbb{R}^p$ for $i = 1, 2, 3$
4: $\gamma^0 \leftarrow \gamma^{(j)0} \leftarrow u^{(j)0} \leftarrow 0 \in \mathbb{R}^{|T|}$ for $j = 1, 2$
5: continue $\leftarrow$ true
6: $k \leftarrow 0$
7: while $k <$ maxite and continue do
8:   $k \leftarrow k + 1$
9:   $\beta^{(1)k} \leftarrow \left[\tilde V\,\mathrm{diag}\left(\frac{1}{[\tilde D^T\tilde D]_{ii} + n\rho}\right)\tilde V^T + \frac{1}{n\rho}(I_p - \tilde V\tilde V^T)\right]\left(X^Ty + n\rho\beta^{k-1} - nv^{(1)k-1}\right)$
10:  $\beta^{(2)k}_j \leftarrow S\left(\beta^{k-1}_j - \frac{1}{\rho}v^{(2)k-1}_j,\; \frac{\lambda(1-\alpha)w_j}{\rho}\right)$ for $j = 1,\dots,p$
11:  $\gamma^{(1)k}_\ell \leftarrow \begin{cases} S\left(\gamma^{k-1}_\ell - \frac{1}{\rho}u^{(1)k-1}_\ell,\; \frac{\lambda\alpha w_\ell}{\rho}\right) & \text{if } w_\ell > 0\\ \gamma^{k-1}_\ell - \frac{1}{\rho}u^{(1)k-1}_\ell & \text{if } w_\ell = 0\end{cases}$
12:  $\begin{pmatrix}\beta^{(3)k}\\ \gamma^{(2)k}\end{pmatrix} \leftarrow \left(I_{p+|T|} - \tilde Q\tilde Q^T\right)\left[\begin{pmatrix}\beta^{k-1}\\ \gamma^{k-1}\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k-1}\\ u^{(2)k-1}\end{pmatrix}\right]$
13:  $\beta^k \leftarrow (\beta^{(1)k} + \beta^{(2)k} + \beta^{(3)k})/3$
14:  $\gamma^k \leftarrow (\gamma^{(1)k} + \gamma^{(2)k})/2$
15:  $v^{(i)k} \leftarrow v^{(i)k-1} + \rho(\beta^{(i)k} - \beta^k)$ for $i = 1, 2, 3$
16:  $u^{(j)k} \leftarrow u^{(j)k-1} + \rho(\gamma^{(j)k} - \gamma^k)$ for $j = 1, 2$
17:  if $\sqrt{\sum_{i=1}^3\|\beta^{(i)k} - \beta^k\|_2^2 + \sum_{j=1}^2\|\gamma^{(j)k} - \gamma^k\|_2^2} \le \varepsilon^{abs}\sqrt{3p + 2|T|} + \varepsilon^{rel}\max\left\{\sqrt{\sum_{i=1}^3\|\beta^{(i)k}\|_2^2 + \sum_{j=1}^2\|\gamma^{(j)k}\|_2^2},\; \sqrt{3\|\beta^k\|_2^2 + 2\|\gamma^k\|_2^2}\right\}$ and $\rho\sqrt{3\|\beta^k - \beta^{k-1}\|_2^2 + 2\|\gamma^k - \gamma^{k-1}\|_2^2} \le \varepsilon^{abs}\sqrt{3p + 2|T|} + \varepsilon^{rel}\sqrt{\sum_{i=1}^3\|v^{(i)k}\|_2^2 + \sum_{j=1}^2\|u^{(j)k}\|_2^2}$ then
18:    continue $\leftarrow$ false
19:  end if
20: end while
Output: $\beta^k, \gamma^k$


D.1 Derivation of Algorithm 1

The ADMM updates involve minimizing the augmented Lagrangian of the global consensus problem (6),
\[
\begin{aligned}
&L_\rho(\beta^{(1)}, \beta^{(2)}, \beta^{(3)}, \gamma^{(1)}, \gamma^{(2)}, \beta, \gamma;\; v^{(1)}, v^{(2)}, v^{(3)}, u^{(1)}, u^{(2)})\\
&= \frac{1}{2n}\bigl\|y - X\beta^{(1)}\bigr\|_2^2 + \lambda\left(\alpha\sum_{\ell=1}^{|T|}w_\ell\bigl|\gamma^{(1)}_\ell\bigr| + (1-\alpha)\sum_{j=1}^p w_j\bigl|\beta^{(2)}_j\bigr|\right) + 1_\infty\bigl\{\beta^{(3)} = A\gamma^{(2)}\bigr\}\\
&\quad + \sum_{i=1}^3\left(v^{(i)T}(\beta^{(i)} - \beta) + \frac{\rho}{2}\|\beta^{(i)} - \beta\|_2^2\right) + \sum_{j=1}^2\left(u^{(j)T}(\gamma^{(j)} - \gamma) + \frac{\rho}{2}\|\gamma^{(j)} - \gamma\|_2^2\right).
\end{aligned}
\]

1. Update $\beta^{(1)}$.
\[
\beta^{(1)k+1} := \arg\min_{\beta^{(1)}\in\mathbb{R}^p}\left\{\frac{1}{2n}\bigl\|y - X\beta^{(1)}\bigr\|_2^2 + \bigl\langle v^{(1)k}, \beta^{(1)} - \beta^k\bigr\rangle + \frac{\rho}{2}\|\beta^{(1)} - \beta^k\|_2^2\right\}.
\]

Let $X = \mathrm{SVD}(U, D, V)$ be the singular value decomposition of $X$, where $U \in \mathbb{R}^{n\times n}$ contains the left singular vectors in its columns, $V \in \mathbb{R}^{p\times p}$ contains the right singular vectors in its columns, and $D \in \mathbb{R}^{n\times p}$ is a rectangular diagonal matrix with decreasing singular values on the diagonal. The first-order condition for the above problem gives us:
\[
\begin{aligned}
&(X^TX + n\rho I_p)\beta^{(1)k+1} = X^Ty + n\rho\beta^k - nv^{(1)k}\\
\Rightarrow\;& V(D^TD + n\rho I_p)V^T\beta^{(1)k+1} = X^Ty + n\rho\beta^k - nv^{(1)k}\\
\Rightarrow\;& \beta^{(1)k+1} = V\,\mathrm{diag}\bigl(([D^TD]_{ii} + n\rho)^{-1}\bigr)V^T\bigl(X^Ty + n\rho\beta^k - nv^{(1)k}\bigr).
\end{aligned}
\]
When $n \ge p$, we have
\[
\beta^{(1)k+1} = V\,\mathrm{diag}\bigl(([D^TD]_{ii} + n\rho)^{-1}\bigr)V^T\bigl(X^Ty + n\rho\beta^k - nv^{(1)k}\bigr). \tag{9}
\]

When $n < p$, the SVD can be expressed in a compact form: $D = (\tilde D : 0)$ and $V = (\tilde V : \tilde V_\perp)$, where $\tilde D \in \mathbb{R}^{n\times n}$ and $\tilde V \in \mathbb{R}^{p\times n}$ are from the compact SVD of $X$, and $\tilde V_\perp \in \mathbb{R}^{p\times(p-n)}$. Thus,
\[
\begin{aligned}
V\,\mathrm{diag}\bigl(([D^TD]_{ii} + n\rho)^{-1}\bigr)V^T &= (\tilde V : \tilde V_\perp)\,\mathrm{diag}\bigl(([D^TD]_{ii} + n\rho)^{-1}\bigr)\begin{pmatrix}\tilde V^T\\ \tilde V_\perp^T\end{pmatrix}\\
&= \tilde V\,\mathrm{diag}\bigl(([\tilde D^T\tilde D]_{ii} + n\rho)^{-1}\bigr)\tilde V^T + \tilde V_\perp\tilde V_\perp^T/(n\rho)\\
&= \tilde V\,\mathrm{diag}\bigl(([\tilde D^T\tilde D]_{ii} + n\rho)^{-1}\bigr)\tilde V^T + (I_p - \tilde V\tilde V^T)/(n\rho).
\end{aligned}
\]
So when $n < p$,
\[
\beta^{(1)k+1} = \left[\tilde V\,\mathrm{diag}\bigl(([\tilde D^T\tilde D]_{ii} + n\rho)^{-1}\bigr)\tilde V^T + (I_p - \tilde V\tilde V^T)/(n\rho)\right]\bigl(X^Ty + n\rho\beta^k - nv^{(1)k}\bigr). \tag{10}
\]
Since $\tilde V = V$ and $\tilde V\tilde V^T = I_p$ when $n \ge p$, (10) reduces to (9) in that case.
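Identity (10) can be sanity-checked on a small $n < p$ design whose compact SVD is available in closed form; in the sketch below (stdlib Python, values illustrative), $X$ has orthogonal rows so every matrix involved is diagonal and the check reduces to entrywise arithmetic:

```python
# n = 2 < p = 3: X = [[2, 0, 0], [0, 3, 0]], singular values d = (2, 3),
# compact right singular vectors are e_1, e_2, so X^T X = diag(4, 9, 0).
n, rho = 2, 0.7
d2 = [4.0, 9.0, 0.0]  # diagonal of X^T X

# (X^T X + n*rho*I)^{-1} computed directly...
direct = [1.0 / (v + n * rho) for v in d2]

# ...and via (10): V~ diag(1/(d_i^2 + n*rho)) V~^T + (I - V~ V~^T)/(n*rho),
# which on this diagonal example is entrywise
via_svd = [1.0 / (4.0 + n * rho), 1.0 / (9.0 + n * rho), 1.0 / (n * rho)]

assert all(abs(a - b) < 1e-12 for a, b in zip(direct, via_svd))
print("identity (10) verified on the diagonal example")
```

The third coordinate, which lies outside the row space of $X$, is handled entirely by the $(I_p - \tilde V\tilde V^T)/(n\rho)$ term; this is what lets the update avoid forming or factoring the $p\times p$ matrix $X^TX + n\rho I_p$ when $p > n$.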


2. Update $\beta^{(2)}$.
\[
\beta^{(2)k+1} := \arg\min_{\beta^{(2)}\in\mathbb{R}^p}\left\{\frac{\rho}{2}\left\|\beta^{(2)} - \left(\beta^k - \frac{1}{\rho}v^{(2)k}\right)\right\|_2^2 + \lambda(1-\alpha)\sum_{j=1}^p w_j\bigl|\beta^{(2)}_j\bigr|\right\}.
\]
The solution is simply elementwise soft-thresholding:
\[
\beta^{(2)k+1}_j = S\left(\beta^k_j - \frac{1}{\rho}v^{(2)k}_j,\; \frac{\lambda(1-\alpha)w_j}{\rho}\right) \quad \text{for } j = 1,\dots,p.
\]

3. Update $\gamma^{(1)}$.
\[
\gamma^{(1)k+1} := \arg\min_{\gamma^{(1)}\in\mathbb{R}^{|T|}}\left\{\frac{\rho}{2}\left\|\gamma^{(1)} - \left(\gamma^k - \frac{1}{\rho}u^{(1)k}\right)\right\|_2^2 + \lambda\alpha\sum_{\ell=1}^{|T|}w_\ell\bigl|\gamma^{(1)}_\ell\bigr|\right\}.
\]
Since we will sometimes choose $w_\ell = 0$ for the root, we break the solution into cases:
\[
\gamma^{(1)k+1}_\ell = \begin{cases} S\left(\gamma^k_\ell - \frac{1}{\rho}u^{(1)k}_\ell,\; \frac{\lambda\alpha w_\ell}{\rho}\right) & \text{if } w_\ell > 0,\\[2pt] \gamma^k_\ell - \frac{1}{\rho}u^{(1)k}_\ell & \text{if } w_\ell = 0.\end{cases}
\]

4. Joint update of $\beta^{(3)}$ and $\gamma^{(2)}$.
\[
\begin{pmatrix}\beta^{(3)k+1}\\ \gamma^{(2)k+1}\end{pmatrix} := \arg\min_{\beta^{(3)}\in\mathbb{R}^p,\,\gamma^{(2)}\in\mathbb{R}^{|T|}}\left\{\left\|\beta^{(3)} - \left(\beta^k - \frac{1}{\rho}v^{(3)k}\right)\right\|_2^2 + \left\|\gamma^{(2)} - \left(\gamma^k - \frac{1}{\rho}u^{(2)k}\right)\right\|_2^2\right\}
\;\text{ s.t. }\; (I_p : -A)\begin{pmatrix}\beta^{(3)}\\ \gamma^{(2)}\end{pmatrix} = 0.
\]
The solution is the projection of $\begin{pmatrix}\beta^k\\ \gamma^k\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k}\\ u^{(2)k}\end{pmatrix}$ onto the null space of $(I_p : -A)$.

Let $(I_p : -A) = \mathrm{SVD}(\cdot, \cdot, Q)$, where $Q = (\tilde Q : \tilde Q_\perp) \in \mathbb{R}^{(p+|T|)\times(p+|T|)}$ contains all the right singular vectors in its columns. So $I_{p+|T|} = QQ^T = \tilde Q\tilde Q^T + \tilde Q_\perp\tilde Q_\perp^T$. Since $\tilde Q$ corresponds to the non-zero singular values of $(I_p : -A)$ by construction, $\tilde Q_\perp$ corresponds to the zero singular values, making it an orthonormal basis for the null space of $(I_p : -A)$. Thus,
\[
\begin{aligned}
\begin{pmatrix}\beta^{(3)k+1}\\ \gamma^{(2)k+1}\end{pmatrix} &= \tilde Q_\perp(\tilde Q_\perp^T\tilde Q_\perp)^{-1}\tilde Q_\perp^T\left[\begin{pmatrix}\beta^k\\ \gamma^k\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k}\\ u^{(2)k}\end{pmatrix}\right]\\
&= \tilde Q_\perp\tilde Q_\perp^T\left[\begin{pmatrix}\beta^k\\ \gamma^k\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k}\\ u^{(2)k}\end{pmatrix}\right]\\
&= \left(I_{p+|T|} - \tilde Q\tilde Q^T\right)\left[\begin{pmatrix}\beta^k\\ \gamma^k\end{pmatrix} - \frac{1}{\rho}\begin{pmatrix}v^{(3)k}\\ u^{(2)k}\end{pmatrix}\right].
\end{aligned}
\]
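This projection can be checked on a toy aggregation matrix: the null space of $(I_p : -A)$ consists of vectors with $\beta = A\gamma$, so the projected point must satisfy that constraint exactly. A stdlib-Python sketch (the matrix $A$, the basis $N = [A; I]$ and the helper `project` are illustrative; this toy $A$ gives a diagonal Gram matrix, so no general matrix inversion is needed):

```python
# A in {0,1}^{3x2}: features 1 and 2 belong to node 1; feature 3 to node 2
A = [[1, 0], [1, 0], [0, 1]]

# The null space of (I_p : -A) is spanned by the columns of N = [A; I_2]
N = A + [[1, 0], [0, 1]]                                        # 5 x 2
gram_diag = [sum(row[c] ** 2 for row in N) for c in range(2)]   # N^T N = diag(3, 2)

def project(x):
    # P x = N (N^T N)^{-1} N^T x, using the diagonal Gram matrix
    coef = [sum(N[i][c] * x[i] for i in range(5)) / gram_diag[c] for c in range(2)]
    return [sum(N[i][c] * coef[c] for c in range(2)) for i in range(5)]

x = [1.0, 5.0, 2.0, -4.0, 0.5]
z = project(x)
beta, gamma = z[:3], z[3:]
# The projected point satisfies the constraint beta = A gamma
assert all(abs(beta[i] - sum(A[i][c] * gamma[c] for c in range(2))) < 1e-12
           for i in range(3))
print([round(v, 4) for v in z])
```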


5. Update global variables $\beta$ and $\gamma$.
\[
\beta^{k+1} := \arg\min_{\beta\in\mathbb{R}^p}\sum_{i=1}^3\left\|\beta - \left(\beta^{(i)k+1} + \frac{1}{\rho}v^{(i)k}\right)\right\|_2^2 = \bar\beta^{k+1} + \frac{1}{\rho}\bar v^k \tag{11}
\]
\[
\gamma^{k+1} := \arg\min_{\gamma\in\mathbb{R}^{|T|}}\sum_{j=1}^2\left\|\gamma - \left(\gamma^{(j)k+1} + \frac{1}{\rho}u^{(j)k}\right)\right\|_2^2 = \bar\gamma^{k+1} + \frac{1}{\rho}\bar u^k \tag{12}
\]
where $\bar\beta^k := \frac{\beta^{(1)k} + \beta^{(2)k} + \beta^{(3)k}}{3}$, $\bar v^k := \frac{v^{(1)k} + v^{(2)k} + v^{(3)k}}{3}$, $\bar\gamma^k := \frac{\gamma^{(1)k} + \gamma^{(2)k}}{2}$ and $\bar u^k := \frac{u^{(1)k} + u^{(2)k}}{2}$.

6. Update dual variables.
\[
v^{(i)k+1} := v^{(i)k} + \rho(\beta^{(i)k+1} - \beta^{k+1}) \quad \text{for } i = 1, 2, 3,
\]
\[
u^{(j)k+1} := u^{(j)k} + \rho(\gamma^{(j)k+1} - \gamma^{k+1}) \quad \text{for } j = 1, 2.
\]
Averaging the updates for $v$ and the updates for $u$ gives
\[
\bar v^{k+1} = \bar v^k + \rho(\bar\beta^{k+1} - \beta^{k+1}) \tag{13}
\]
\[
\bar u^{k+1} = \bar u^k + \rho(\bar\gamma^{k+1} - \gamma^{k+1}). \tag{14}
\]
Substituting (11) and (12) into (13) and (14) yields that $\bar v^{k+1} = \bar u^{k+1} = 0$ after the first iteration.

Using $\beta^k = \bar\beta^k$ and $\gamma^k = \bar\gamma^k$ in the above updates, the updates become Lines 9–16 of Algorithm 1. Next, we follow Section 3.3.1 in Boyd et al. (2011) to determine the termination criteria. We first write Problem (6) in the same form as Problem (3.1) in Boyd et al. (2011), which is presented below in typewriter font:
\[
\min_{x,z}\; f(x) + g(z) \quad \text{s.t. } Ax + Bz = c
\]
where
\[
A = I_{3p+2|T|},\quad B = -\begin{pmatrix} I_p & 0\\ I_p & 0\\ I_p & 0\\ 0 & I_{|T|}\\ 0 & I_{|T|}\end{pmatrix},\quad c = 0,\quad x = \begin{pmatrix}\beta^{(1)}\\ \beta^{(2)}\\ \beta^{(3)}\\ \gamma^{(1)}\\ \gamma^{(2)}\end{pmatrix} \quad\text{and}\quad z = \begin{pmatrix}\beta\\ \gamma\end{pmatrix}.
\]

The primal and dual residuals are
\[
r^{k+1} = Ax^{k+1} + Bz^{k+1} - c = \begin{pmatrix}\beta^{(1)k+1} - \beta^{k+1}\\ \beta^{(2)k+1} - \beta^{k+1}\\ \beta^{(3)k+1} - \beta^{k+1}\\ \gamma^{(1)k+1} - \gamma^{k+1}\\ \gamma^{(2)k+1} - \gamma^{k+1}\end{pmatrix} \quad\text{and}\quad s^{k+1} = \rho A^TB(z^{k+1} - z^k) = -\rho\begin{pmatrix}\beta^{k+1} - \beta^k\\ \beta^{k+1} - \beta^k\\ \beta^{k+1} - \beta^k\\ \gamma^{k+1} - \gamma^k\\ \gamma^{k+1} - \gamma^k\end{pmatrix}.
\]

By Condition (3.12) in Boyd et al. (2011), the ADMM algorithm stops when both residuals are small. In our case, the termination criteria are the following.


1. The primal residual is small:
\[
\sqrt{\sum_{i=1}^{3} \left\|\beta^{(i)}_k - \beta_k\right\|_2^2 + \sum_{j=1}^{2} \left\|\gamma^{(j)}_k - \gamma_k\right\|_2^2}
\le \sqrt{3p + 2|T|} \cdot \varepsilon^{\text{abs}} + \varepsilon^{\text{rel}} \cdot \max\left\{ \sqrt{\sum_{i=1}^{3} \left\|\beta^{(i)}_k\right\|_2^2 + \sum_{j=1}^{2} \left\|\gamma^{(j)}_k\right\|_2^2},\; \sqrt{3\left\|\beta_k\right\|_2^2 + 2\left\|\gamma_k\right\|_2^2} \right\}.
\]

2. The dual residual is small:
\[
\rho \cdot \sqrt{3\left\|\beta_k - \beta_{k-1}\right\|_2^2 + 2\left\|\gamma_k - \gamma_{k-1}\right\|_2^2}
\le \sqrt{3p + 2|T|} \cdot \varepsilon^{\text{abs}} + \varepsilon^{\text{rel}} \cdot \sqrt{\sum_{i=1}^{3} \left\|v^{(i)}_k\right\|_2^2 + \sum_{j=1}^{2} \left\|u^{(j)}_k\right\|_2^2}.
\]
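The two stopping rules can be packaged as a small helper function. The following Python/NumPy sketch is illustrative; the function name and default tolerances are our own choices, not part of the rare package:

```python
import numpy as np

def admm_converged(beta_copies, gamma_copies, beta, gamma,
                   beta_prev, gamma_prev, v, u, rho,
                   eps_abs=1e-4, eps_rel=1e-3):
    """Check the primal and dual stopping rules for the consensus ADMM.

    beta_copies: (3, p) array of local copies; gamma_copies: (2, |T|) array;
    beta, gamma: current globals; beta_prev, gamma_prev: previous globals;
    v: (3, p) and u: (2, |T|) dual variables.
    """
    p, T = beta.size, gamma.size
    thresh = np.sqrt(3 * p + 2 * T) * eps_abs

    # Primal residual: distance of the local copies from the globals.
    r = np.sqrt(((beta_copies - beta) ** 2).sum()
                + ((gamma_copies - gamma) ** 2).sum())
    r_scale = max(np.sqrt((beta_copies ** 2).sum() + (gamma_copies ** 2).sum()),
                  np.sqrt(3 * (beta ** 2).sum() + 2 * (gamma ** 2).sum()))

    # Dual residual: change in the globals between iterations, scaled by rho.
    s = rho * np.sqrt(3 * ((beta - beta_prev) ** 2).sum()
                      + 2 * ((gamma - gamma_prev) ** 2).sum())
    s_scale = np.sqrt((v ** 2).sum() + (u ** 2).sum())

    return r <= thresh + eps_rel * r_scale and s <= thresh + eps_rel * s_scale
```

At a consensus fixed point (all copies equal to the globals, globals unchanged) both rules hold for any positive tolerances, so the check reports convergence.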

D.2 Treatment of Intercept in Problem (5)

When an intercept β0 is included in the least squares, Problem (5) becomes:
\[
\min_{\substack{\beta_0 \in \mathbb{R},\, \beta \in \mathbb{R}^p,\, \gamma \in \mathbb{R}^{|T|} \\ \text{s.t. } \beta = A\gamma}}
\frac{1}{2n}\left\|y - X\beta - \beta_0 1_n\right\|_2^2
+ \lambda\left( \alpha \sum_{\ell=1}^{|T|} w_\ell |\gamma_\ell| + (1-\alpha) \sum_{j=1}^{p} w_j |\beta_j| \right). \tag{15}
\]
The first-order condition for the solution \((\hat\beta_0, \hat\beta)\) yields
\[
\left. \frac{\partial\, \frac{1}{2n}\left\|y - X\beta - \beta_0 1_n\right\|_2^2}{\partial \beta_0} \right|_{(\beta_0,\beta) = (\hat\beta_0,\hat\beta)}
= \frac{1}{n} 1_n^T \left( 1_n \hat\beta_0 - (y - X\hat\beta) \right)
= \frac{1}{n} \left( n\hat\beta_0 - 1_n^T (y - X\hat\beta) \right) = 0.
\]
So \(\hat\beta_0 = \frac{1}{n} 1_n^T (y - X\hat\beta)\). Plugging \(\hat\beta_0\) into Problem (15) and letting \(H = I_n - \frac{1}{n} 1_n 1_n^T\) yields
\[
\min_{\substack{\beta \in \mathbb{R}^p,\, \gamma \in \mathbb{R}^{|T|} \\ \text{s.t. } \beta = A\gamma}}
\frac{1}{2n}\left\|Hy - HX\beta\right\|_2^2
+ \lambda\left( \alpha \sum_{\ell=1}^{|T|} w_\ell |\gamma_\ell| + (1-\alpha) \sum_{j=1}^{p} w_j |\beta_j| \right),
\]
which can now be solved using our consensus ADMM algorithm.
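For λ = 0 the reduction above is just least squares with an intercept, so it is easy to verify numerically that fitting on the centered data (Hy, HX) and then recovering the intercept from the residual mean matches the fit with an explicit intercept column. A Python/NumPy sketch on random toy data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n) + 2.0  # response with a nonzero mean

# Fit with an explicit intercept column.
coef_full, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
beta0_full, beta_full = coef_full[0], coef_full[1:]

# Fit on centered data: H = I - (1/n) 1 1^T centers y and each column of X.
Xc, yc = X - X.mean(axis=0), y - y.mean()
beta_centered, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
beta0_centered = np.mean(y - X @ beta_centered)  # beta0 = (1/n) 1^T (y - X beta)

print(np.allclose(beta_full, beta_centered),
      np.allclose(beta0_full, beta0_centered))  # True True
```

The same centering trick carries over unchanged when the penalty is added, since the penalty does not involve β0.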

E Proof of Lemma 1

We first show the existence of such a \(B^*\) by giving a feasible procedure to find it. Suppose \(\beta^*\) has at least two distinct values (otherwise \(B^* = \{r\}\) trivially). Start with \(B = L(T)\) so that the first constraint is satisfied. If there are siblings \(u, v\) in \(B\) for which the second constraint is violated, then by construction \(\beta^*_j = \beta^*_k\) for all \(j \in L(T_u)\) and \(k \in L(T_v)\), so we replace \(u\) and \(v\) in \(B\) with their parent node. We repeat this step until the second constraint is satisfied, while maintaining the first constraint. Thus \(B\) satisfies the two requirements for \(B^*\).

For uniqueness, suppose \(B^*\) and \(\tilde B^*\) are two different aggregating sets for \(\beta^*\). Without loss of generality, suppose there exists \(u \in \tilde B^*\) with \(u \notin B^*\). Then \(u\) is a descendant or an ancestor of some node in \(B^*\); in either case the second constraint is violated. Thus no such \(u\) exists, and \(\tilde B^* = B^*\).

The existence and uniqueness of \(A^*\) follow from the definition of the support of \(\beta^*\).
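The constructive procedure in this proof can be sketched recursively: siblings are merged whenever all leaves below them share one coefficient value. The tree encoding below (a dict of children, with coefficient values on the leaves) is a hypothetical toy example, not the paper's data structures:

```python
def aggregating_set(tree, values, node):
    """Return the minimal set of nodes whose branches carry constant values.

    tree  : dict mapping each internal node to its list of children
    values: dict mapping each leaf to its coefficient value
    Returns (set of aggregating nodes, set of distinct values below `node`).
    """
    if node not in tree:                     # a leaf: trivially constant
        return {node}, {values[node]}
    sets, vals = [], set()
    for child in tree[node]:
        s, v = aggregating_set(tree, values, child)
        sets.append(s)
        vals |= v
    if len(vals) == 1:                       # whole branch is constant:
        return {node}, vals                  # merge the children into node
    return set().union(*sets), vals          # otherwise keep children's sets


tree = {"r": ["a", "b"], "a": ["x1", "x2"], "b": ["x3", "x4"]}
B, _ = aggregating_set(tree, {"x1": 1, "x2": 1, "x3": 2, "x4": 2}, "r")
print(sorted(B))  # ['a', 'b']
B, _ = aggregating_set(tree, {"x1": 1, "x2": 1, "x3": 1, "x4": 1}, "r")
print(sorted(B))  # ['r']
```

The second call shows the degenerate case handled first in the proof: a constant β* aggregates to the root alone.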


F Proof of Theorem 4

Denote our penalty by \(\Omega(\beta, \gamma) := \alpha \sum_{\ell=1}^{|T|} w_\ell |\gamma_\ell| + (1-\alpha) \sum_{j=1}^{p} w_j |\beta_j|\). We follow the proof strategy used in Theorem 1 of Lou et al. (2016) to prove this theorem. If \((\hat\beta, \hat\gamma)\) is a solution to Problem (5), then we have
\[
\frac{1}{2n}\left\|y - X\hat\beta\right\|_2^2 + \lambda\,\Omega(\hat\beta, \hat\gamma) \le \frac{1}{2n}\left\|y - X\beta\right\|_2^2 + \lambda\,\Omega(\beta, \gamma)
\]
for any \((\beta, \gamma)\) such that \(\beta = A\gamma\). Let \((\beta^*, \gamma^*)\) be such that
\[
\beta^* = A\gamma^* \quad\text{and}\quad \gamma^*_\ell = \begin{cases} \beta^*_\ell & \text{if } \ell \in B^* \\ 0 & \text{otherwise.} \end{cases}
\]
Plugging in \(y = X\beta^* + \varepsilon\) and \((\beta, \gamma) = (\beta^*, \gamma^*)\), with some algebra we have
\[
\frac{1}{2n}\left\|X\hat\beta - X\beta^*\right\|_2^2 + \lambda\,\Omega(\hat\beta, \hat\gamma) \le \lambda\,\Omega(\beta^*, \gamma^*) + \frac{1}{n}\varepsilon^T X \Delta(\beta^*) \tag{16}
\]
where \(\Delta(\beta^*) = \hat\beta - \beta^*\). Since \(\hat\beta = A\hat\gamma\) and \(\beta^* = A\gamma^*\) (and writing \(\Delta(\gamma^*) = \hat\gamma - \gamma^*\)),
\[
\frac{1}{n}\varepsilon^T X \Delta(\beta^*) = \frac{1}{n}\varepsilon^T X A \Delta(\gamma^*).
\]

We next bound \(n^{-1}\varepsilon^T X\Delta(\beta^*) = (1-\alpha)\, n^{-1}\varepsilon^T X\Delta(\beta^*) + \alpha\, n^{-1}\varepsilon^T XA\Delta(\gamma^*)\) in absolute value. Define \(V_j := n^{-1/2} X_j^T \varepsilon\) for \(j = 1, \dots, p\) and \(U_\ell := n^{-1/2} A_\ell^T X^T \varepsilon\) for \(\ell = 1, \dots, |T|\). With the choice of weights \(w_j = \|X_j\|_2/\sqrt{n}\) for \(1 \le j \le p\) and \(w_\ell = \|XA_\ell\|_2/\sqrt{n}\) for \(1 \le \ell \le |T|\),
\[
\begin{aligned}
\left| \frac{1}{n}\varepsilon^T X\Delta(\beta^*) \right|
&= \left| (1-\alpha)\frac{1}{n}\varepsilon^T X\Delta(\beta^*) + \alpha\,\frac{1}{n}\varepsilon^T XA\Delta(\gamma^*) \right| \\
&= \left| (1-\alpha)\sum_{j=1}^{p} \frac{X_j^T\varepsilon}{\sqrt{n}\,\|X_j\|_2}\cdot\frac{\|X_j\|_2}{\sqrt{n}}\,\Delta(\beta^*)_j + \alpha\sum_{\ell=1}^{|T|} \frac{A_\ell^T X^T\varepsilon}{\sqrt{n}\,\|XA_\ell\|_2}\cdot\frac{\|XA_\ell\|_2}{\sqrt{n}}\,\Delta(\gamma^*)_\ell \right| \\
&= \left| (1-\alpha)\sum_{j=1}^{p} \frac{V_j}{\|X_j\|_2}\cdot w_j\,\Delta(\beta^*)_j + \alpha\sum_{\ell=1}^{|T|} \frac{U_\ell}{\|XA_\ell\|_2}\cdot w_\ell\,\Delta(\gamma^*)_\ell \right| \\
&\le (1-\alpha)\sum_{j=1}^{p} \frac{|V_j|}{\|X_j\|_2}\cdot w_j\left|\Delta(\beta^*)_j\right| + \alpha\sum_{\ell=1}^{|T|} \frac{|U_\ell|}{\|XA_\ell\|_2}\cdot w_\ell\left|\Delta(\gamma^*)_\ell\right| \\
&\le (1-\alpha)\max_{1\le j\le p}\frac{|V_j|}{\|X_j\|_2}\cdot\sum_{j=1}^{p} w_j\left|\Delta(\beta^*)_j\right| + \alpha\max_{1\le \ell\le |T|}\frac{|U_\ell|}{\|XA_\ell\|_2}\cdot\sum_{\ell=1}^{|T|} w_\ell\left|\Delta(\gamma^*)_\ell\right|
\end{aligned}
\tag{17}
\]
where the second-to-last inequality follows from the triangle inequality.


Since \(\varepsilon \sim N_n(0, \sigma^2 I_n)\), we have \(V_j \sim N(0, \|X_j\|_2^2\,\sigma^2/n)\) for \(j = 1, \dots, p\) and \(U_\ell \sim N(0, \|XA_\ell\|_2^2\,\sigma^2/n)\) for \(\ell = 1, \dots, |T|\). By Lemma 6.2 of Buhlmann and van de Geer (2011), we have for \(x > 0\)
\[
P\left( \max_{1\le j\le p} \frac{|V_j|}{\|X_j\|_2} > \sigma\sqrt{\frac{2(x + \log p)}{n}} \right) \le 2e^{-x}
\quad\text{and}\quad
P\left( \max_{1\le \ell\le |T|} \frac{|U_\ell|}{\|XA_\ell\|_2} > \sigma\sqrt{\frac{2(x + \log |T|)}{n}} \right) \le 2e^{-x}.
\]
By the union bound,
\[
P\left( \max_{1\le j\le p} \frac{|V_j|}{\|X_j\|_2} > \sigma\sqrt{\frac{2(x + \log p)}{n}} \;\text{ or }\; \max_{1\le \ell\le |T|} \frac{|U_\ell|}{\|XA_\ell\|_2} > \sigma\sqrt{\frac{2(x + \log |T|)}{n}} \right) \le 4e^{-x}.
\]
By the construction of \(T\), each internal node has at least two child nodes. To go up one level from the leaf nodes, only one node "survives" among its siblings, so for \(T\) with \(p\) leaf nodes there can be at most \(p - 1\) internal nodes, the maximum being achieved when \(T\) is a full binary tree. So \(|T| \le 2p\). Choosing \(x = \log(2p)\), we have that the following bounds hold with probability at least \(1 - 2/p\):
\[
\max_{1\le j\le p} \frac{|V_j|}{\|X_j\|_2} \le 2\sigma\sqrt{\frac{\log(2p)}{n}}
\quad\text{and}\quad
\max_{1\le \ell\le |T|} \frac{|U_\ell|}{\|XA_\ell\|_2} \le 2\sigma\sqrt{\frac{\log(2p)}{n}}.
\]

Plugging these upper bounds into (17), we have the following inequality holding with high probability:
\[
\left| \frac{1}{n}\varepsilon^T X\Delta(\beta^*) \right|
\le 2\sigma\sqrt{\frac{\log(2p)}{n}}\left( (1-\alpha)\sum_{j=1}^{p} w_j\left|\Delta(\beta^*)_j\right| + \alpha\sum_{\ell=1}^{|T|} w_\ell\left|\Delta(\gamma^*)_\ell\right| \right)
= 2\sigma\sqrt{\frac{\log(2p)}{n}}\,\Omega\left(\Delta(\beta^*), \Delta(\gamma^*)\right). \tag{18}
\]

Let \(\lambda \ge 4\sigma\sqrt{\log(2p)/n}\) and \(0 \le \alpha \le 1\). By (16) and (18), the following holds with probability at least \(1 - 2/p\):
\[
\begin{aligned}
\frac{1}{2n}\left\|X\hat\beta - X\beta^*\right\|_2^2
&\le \frac{1}{2}\lambda\,\Omega\left(\Delta(\beta^*), \Delta(\gamma^*)\right) - \lambda\,\Omega(\hat\beta, \hat\gamma) + \lambda\,\Omega(\beta^*, \gamma^*) \\
&\le \frac{1}{2}\left( \lambda\,\Omega(\hat\beta, \hat\gamma) + \lambda\,\Omega(\beta^*, \gamma^*) \right) - \lambda\,\Omega(\hat\beta, \hat\gamma) + \lambda\,\Omega(\beta^*, \gamma^*) \quad \text{(by the triangle inequality)} \\
&\le \frac{3}{2}\lambda\,\Omega(\beta^*, \gamma^*)
= \frac{3}{2}\lambda\left( (1-\alpha)\sum_{j=1}^{p} w_j|\beta^*_j| + \alpha\sum_{\ell=1}^{|T|} w_\ell|\gamma^*_\ell| \right) \\
&= \frac{3}{2}\lambda\left( (1-\alpha)\sum_{j\in A^*} w_j|\beta^*_j| + \alpha\sum_{\ell\in B^*} w_\ell|\beta^*_\ell| \right)
\end{aligned}
\]
where \(A^*\) is the support of \(\beta^*\).


G Proof of Corollary 1

With \(\|X_j\|_2 \le \sqrt{n}\), the weights in Theorem 4 become:

• For \(1 \le j \le p\), \(w_j = \|X_j\|_2/\sqrt{n} \le 1\).

• For \(1 \le \ell \le |T|\), \(w_\ell = \|XA_\ell\|_2/\sqrt{n} \le \frac{1}{\sqrt{n}} \sum_{j \in L(T_{u_\ell})} \|X_j\|_2 \le |L(T_{u_\ell})|\).

The inequality for \(w_\ell\) follows from the triangle inequality and the definition of \(A\): the \(\ell\)th column \(A_\ell\) encodes the descendant leaves of the node \(u_\ell\), i.e., \(A_{j\ell} = 1_{\{j \in \mathrm{descendant}(u_\ell) \cup \{u_\ell\}\}}\). By construction of \(\beta^*\), all coefficients in the branch rooted at an aggregating node \(u_\ell\) share the same value, i.e., \(\beta^*_j = \beta^*_\ell\) for all \(j \in L(T_{u_\ell})\) where \(\ell \in B^*\). Thus, for \(\ell \in B^*\),
\[
\left|\beta^*_\ell\right| = \frac{\left\|\beta^*_{L(T_{u_\ell})}\right\|_1}{|L(T_{u_\ell})|}
\;\Rightarrow\;
\sum_{\ell \in B^*} w_\ell \left|\beta^*_\ell\right| \le \sum_{\ell \in B^*} |L(T_{u_\ell})| \cdot \frac{\left\|\beta^*_{L(T_{u_\ell})}\right\|_1}{|L(T_{u_\ell})|} = \left\|\beta^*\right\|_1.
\]
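The bound \(w_\ell \le |L(T_{u_\ell})|\) is the triangle inequality applied to the columns selected by \(A_\ell\); a quick numeric check on random data (Python/NumPy sketch, with a hypothetical branch of three leaves):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 8
X = rng.normal(size=(n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)   # enforce ||X_j||_2 = sqrt(n)

# A_ell indicates a branch's leaves, e.g. leaves {0, 1, 2} of some node u_ell.
leaves = [0, 1, 2]
A_ell = np.zeros(p)
A_ell[leaves] = 1.0

# w_ell = ||X A_ell||_2 / sqrt(n) <= sum of column norms / sqrt(n) = |leaves|.
w_ell = np.linalg.norm(X @ A_ell) / np.sqrt(n)
print(w_ell <= len(leaves))  # True
```

The inequality holds for any data, since \(\|XA_\ell\|_2 \le \sum_{j \in \text{leaves}} \|X_j\|_2\) always.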

Plugging the choice of \(\lambda\) and the above upper bounds into Theorem 4 yields
\[
\begin{aligned}
\frac{1}{n}\left\|X\hat\beta - X\beta^*\right\|_2^2
&\le 12\sigma\sqrt{\frac{\log(2p)}{n}}\left( (1-\alpha)\sum_{j\in A^*} w_j|\beta^*_j| + \alpha\sum_{\ell\in B^*} w_\ell|\beta^*_\ell| \right) \\
&\le 12\sigma\sqrt{\frac{\log(2p)}{n}}\left( (1-\alpha)\left\|\beta^*\right\|_1 + \alpha\left\|\beta^*\right\|_1 \right) \\
&\lesssim \sigma\sqrt{\frac{\log p}{n}}\,\left\|\beta^*\right\|_1 \quad \text{(by } \log(2p) = \log 2 + \log p \le 2\log p\text{)}
\end{aligned}
\]
which holds, up to a multiplicative constant, with probability at least \(1 - 2/p\).

H Proof of Corollary 2

With \(\beta^*_1 = \dots = \beta^*_p\), the aggregating set consists of the root, i.e., \(B^* = \{r\}\). Thus \(\beta^*_r = \beta^*_1\), \(|\beta^*_r| = \|\beta^*\|_1/p\), and \(w_r = \|X1_p\|_2/\sqrt{n}\). With \(\|X_j\|_2 \le \sqrt{n}\), we have \(w_j = \|X_j\|_2/\sqrt{n} \le 1\) for \(1 \le j \le p\). Plugging \(\lambda = 4\sigma\sqrt{\log(2p)/n}\) into Theorem 4 yields
\[
\begin{aligned}
\frac{1}{n}\left\|X\hat\beta - X\beta^*\right\|_2^2
&\le 12\sigma\sqrt{\frac{\log(2p)}{n}}\left( (1-\alpha)\sum_{j\in A^*} w_j|\beta^*_j| + \alpha\, w_r|\beta^*_r| \right) \\
&\le 12\sigma\sqrt{\frac{\log(2p)}{n}}\left( (1-\alpha)\left\|\beta^*\right\|_1 + \alpha\,\frac{\|X1_p\|_2}{p\sqrt{n}}\left\|\beta^*\right\|_1 \right) \\
&\lesssim \sigma\sqrt{\frac{\log p}{n}}\,\left\|\beta^*\right\|_1\left( (1-\alpha) + \alpha\,\frac{\|X1_p\|_2}{p\sqrt{n}} \right)
\end{aligned}
\]


which holds, up to a multiplicative constant, with probability at least \(1 - 2/p\). Now, if \(\alpha \ge \left[1 + \|X1_p\|_2/(p\sqrt{n})\right]^{-1}\), then
\[
1 - \alpha \le \alpha\,\frac{\|X1_p\|_2}{p\sqrt{n}}
\]
and so
\[
\frac{1}{n}\left\|X\hat\beta - X\beta^*\right\|_2^2 \lesssim \sigma\sqrt{\frac{\log p}{n}}\,\left\|\beta^*\right\|_1\,\alpha\,\frac{\|X1_p\|_2}{p\sqrt{n}}.
\]
It follows that if \(\|X1_p\|_2 = o(p\sqrt{n})\), then our method achieves a better rate than the lasso rate of \(\sigma\sqrt{\log p/n}\,\|\beta^*\|_1\).

I Proof of Proposition 1

Let \(X^{(1)}, \dots, X^{(p)} \in \{0,1\}^n\) be iid copies of the random vector \(X \in \{0,1\}^n\) with \(1_n^T X = r\) and \(P(X_i = 1) = r/n\) for all \(i = 1, \dots, n\). By construction, \(X^{(1)}, \dots, X^{(p)}\) each have \(r\) nonzero entries. For the \(j\)th column of \(X\), let \(X_j = \sqrt{n/r}\, X^{(j)}\) so that its nonzero entries equal \(\sqrt{n/r}\) and \(\|X_j\|_2 = \sqrt{n}\).

For the random vector \(X\), its expectation and the diagonal entries of its covariance \(\Sigma\) are
\[
EX = \frac{r}{n} 1_n \quad\text{and}\quad \Sigma_{ii} = \operatorname{Var}(X_i) = \frac{r}{n}\left(1 - \frac{r}{n}\right) =: \tau^2.
\]
Suppose \(\|X - EX\|_2 \le L\) for some \(L > 0\). Let \(Z := \sum_{j=1}^{p} X^{(j)}\) and \(v(Z) := \max\left\{ \left\|E\left[(Z - EZ)(Z - EZ)^T\right]\right\|_{\mathrm{op}},\; E\left[\|Z - EZ\|_2^2\right] \right\}\). Corollary 6.1.2 of Tropp (2015) gives the concentration inequality for \(Z\): for all \(t \ge 0\),
\[
P\left(\|Z - EZ\|_2 \ge t\right) \le (n+1) \cdot \exp\left( \frac{-t^2/2}{v(Z) + Lt/3} \right). \tag{19}
\]
First, we derive a closed form for \(L\). By construction of \(X\) (in particular \(1_n^T X = r\)),
\[
\left\|X - EX\right\|_2^2 = \left\|X - \frac{r}{n}1_n\right\|_2^2 = (n-r)\left(0 - \frac{r}{n}\right)^2 + r\left(1 - \frac{r}{n}\right)^2 = r\left(1 - \frac{r}{n}\right) = n\tau^2.
\]
So choose \(L = \|X - EX\|_2 = \sqrt{n}\,\tau\). Next, we simplify the two components of \(v(Z)\):
\[
\left\|E\left[(Z - EZ)(Z - EZ)^T\right]\right\|_{\mathrm{op}} = \left\|\operatorname{Cov}(Z)\right\|_{\mathrm{op}} = p\|\Sigma\|_{\mathrm{op}},
\qquad
E\left[\|Z - EZ\|_2^2\right] = \sum_{i=1}^{n} \operatorname{Var}(Z_i) = \operatorname{tr}(\operatorname{Cov}(Z)) = p \cdot \operatorname{tr}(\Sigma).
\]
Furthermore, \(\max\{\|\Sigma\|_{\mathrm{op}}, \operatorname{tr}(\Sigma)\} = \operatorname{tr}(\Sigma)\) since \(\Sigma\) is positive semidefinite. Combining the above yields
\[
v(Z) = p \cdot \operatorname{tr}(\Sigma) = p \cdot n \cdot \tau^2 = pL^2.
\]
Substituting \(v(Z)\) and \(EZ = \frac{pr}{n} 1_n\) into (19) yields
\[
P\left( \left\| Z - \frac{pr}{n} 1_n \right\|_2 \ge t \right) \le (n+1) \cdot \exp\left( \frac{-t^2/2}{pL^2 + Lt/3} \right) \quad \forall\, t \ge 0.
\]


Since \(X_j = \sqrt{n/r}\, X^{(j)}\), we have \(X1_p = \sqrt{n/r}\, Z\) and, for all \(t \ge 0\),
\[
P\left( \left\| X1_p - p\sqrt{\frac{r}{n}}\, 1_n \right\|_2 \ge \sqrt{\frac{n}{r}}\, t \right)
= P\left( \left\| Z - \frac{pr}{n} 1_n \right\|_2 \ge t \right)
\le (n+1) \cdot \exp\left( \frac{-(t/L)^2/2}{p + (t/L)/3} \right) =: F(t).
\]
Choosing \(t = pr/\sqrt{n}\) and applying the triangle inequality,
\[
\left\|X1_p\right\|_2 \le \left\| X1_p - p\sqrt{r/n}\cdot 1_n \right\|_2 + \left\| p\sqrt{r/n}\cdot 1_n \right\|_2 \le \sqrt{n/r}\cdot t + p\sqrt{r} = 2p\sqrt{r}
\]
holds with probability at least \(1 - F(pr/\sqrt{n})\). Also, \(t/L = pr/(n\tau)\). We next simplify \(F(pr/\sqrt{n})\):
\[
F\left(\frac{pr}{\sqrt{n}}\right) = (n+1)\cdot\exp\left( \frac{-\frac{1}{2}\left(\frac{pr}{n\tau}\right)^2}{p + \frac{1}{3}\left(\frac{pr}{n\tau}\right)} \right) = (n+1)\cdot\exp\left( \frac{-\frac{pr^2}{2n^2\tau^2}}{1 + \frac{r}{3n\tau}} \right)
\]
\[
\begin{aligned}
\Rightarrow\; \log F\left(\frac{pr}{\sqrt{n}}\right)
&= \log(n+1) - \frac{1}{2}\cdot\frac{p(r/n)^2}{\tau^2 + \frac{1}{3}\tau\cdot\frac{r}{n}} \\
&\le \log(n+1) - \frac{1}{2}\cdot\frac{p(r/n)^2}{r/n + \frac{1}{3}(r/n)^{3/2}} \quad \left(\text{by } \tau = \sqrt{\frac{r}{n}\left(1 - \frac{r}{n}\right)} \le \sqrt{\frac{r}{n}}\right) \\
&\le \log(n+1) - \frac{1}{2}\cdot\frac{p(r/n)}{1 + \frac{1}{3}} \quad \left(\text{by } \frac{r}{n} \le 1\right) \\
&= \log(n+1) - \frac{3}{8}\cdot\frac{pr}{n}.
\end{aligned}
\]
So \(P\left(\|X1_p\|_2 \le 2p\sqrt{r}\right) \ge 1 - \exp\left( \log(n+1) - \frac{3pr}{8n} \right)\). Note that \(\frac{r}{n} > \frac{8\log(n+1)}{3p}\) ensures that the exponent diverges to \(-\infty\) as \(n\) and \(p\) increase. Thus, if \(r > \frac{8n\log(n+1)}{3p}\), then
\[
\left\|X1_p\right\|_2 = O_p\left(p\sqrt{r}\right) \;\Leftrightarrow\; \frac{\left\|X1_p\right\|_2}{p\sqrt{n}} = O_p\left( \sqrt{\frac{r}{n}} \right).
\]
So we have \(\|X1_p\|_2 = o_p(p\sqrt{n})\) if \(r = o(n)\) and \(r > \frac{8n\log(n+1)}{3p}\).
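The conclusion can be checked by simulation. The Python/NumPy sketch below draws the design of Proposition 1 (each column has exactly r nonzero entries equal to √(n/r), placed uniformly at random) with r above the threshold 8n log(n+1)/(3p), and confirms that ‖X1_p‖₂ stays below 2p√r:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 100, 200, 50
assert r > 8 * n * np.log(n + 1) / (3 * p)  # condition of Proposition 1

# Each column: r nonzero entries, placed uniformly, each equal to sqrt(n/r).
X = np.zeros((n, p))
for j in range(p):
    X[rng.choice(n, size=r, replace=False), j] = np.sqrt(n / r)

norm = np.linalg.norm(X @ np.ones(p))
print(norm <= 2 * p * np.sqrt(r))   # True with high probability
print(norm / (p * np.sqrt(n)))      # roughly sqrt(r/n), here about 0.71
```

With r = o(n), the last ratio tends to zero, which is the \(\|X1_p\|_2 = o_p(p\sqrt{n})\) conclusion used in Corollary 2.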

J Paired t-tests between our method and L1-ag-h

To compare the test errors in the first two columns of Table 1 in the main paper, we perform a paired t-test for each row. For each of the n = 40,000 reviews, a pair of squared prediction errors is calculated, one from each method. Conditional on the training set, these n pairs can be thought of as independent. Table 4 shows the resulting p-values.

prop.     1%           5%      10%          20%          40%          60%     80%     100%
p-value   3.95 × 10⁻⁹  0.086   7.21 × 10⁻⁷  7.56 × 10⁻⁸  2.2 × 10⁻¹⁶  0.056   0.105   0.001

Table 4: Paired t-tests between squared prediction errors from our method and L1-ag-h on the held-out test set. The prop. values correspond to the different training sizes in Table 1.
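Since the test is a one-sample t-test on the paired differences, it is straightforward to reproduce. The Python/NumPy sketch below uses synthetic squared errors (not the TripAdvisor data) and a normal approximation for the two-sided p-value, which is accurate at n = 40,000:

```python
import math
import numpy as np

def paired_t_test(err_a, err_b):
    """Paired t-test on two arrays of squared prediction errors."""
    d = np.asarray(err_a) - np.asarray(err_b)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / math.sqrt(n))
    # Two-sided p-value via the normal approximation (fine for large n).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

rng = np.random.default_rng(4)
n = 40_000
base = rng.normal(size=n) ** 2          # squared errors of method A
t, p = paired_t_test(base, 1.1 * base)  # method B is 10% worse on each pair
print(t < 0, p < 1e-6)  # True True
```

A negative t statistic here means the first method's squared errors are smaller on average, matching the direction of the comparison in Table 4.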


K Hierarchical Clustering Tree with ELMo

Embeddings from Language Models (ELMo) is a state-of-the-art natural language processing framework for representing words with deep contextualized embeddings (Peters et al., 2018). ELMo leverages a deep neural network that is pre-trained on a large text corpus and has multiple attractive properties. For our purpose, the most relevant property of ELMo is that its word embeddings account for the context in which the words appear. In particular, the ELMo embedding assigned to a word is a function of the entire sentence containing that word. This means that the same word in different contexts will have different embeddings. This is a great advantage over traditional word embeddings such as GloVe (Pennington et al., 2014).

Among the training sets listed in Table 1, we focus on the scenario with n = 33,997 and p = 5,621 (which corresponds to 20% of the entire set of training reviews). We use TensorFlow (Abadi et al., 2015) and an ELMo model pre-trained on the One Billion Word Language Model Benchmark (http://www.statmt.org/lm-benchmark/) to get 1,024-dimensional embeddings for all words in the 33,997 training reviews and the 40,000 held-out testing reviews. To establish a fair comparison with the existing experiments, we keep only adjective embeddings in the reviews, which leaves us with 952,974 adjective embeddings for training and 1,122,030 adjective embeddings for testing. Again, the same adjective in different sentences has different ELMo embeddings, whose proximities encode how the word is used in each context.

We use mini-batch K-Means clustering (Sculley, 2010) to cluster the 952,974 training adjectives into 6,001 clusters. This choice of the number of K-Means clusters ensures that the comparison between ELMo and GloVe (which is based on p = 5,621) is on roughly the same scale of features. After mini-batch K-Means clustering, we update each cluster centroid to the average of the embeddings of the training adjectives assigned to that cluster. We also associate testing adjectives to the clusters by Euclidean distance. After associating adjectives to clusters, we construct a document-cluster matrix for training and for testing. The ij-entry of the document-cluster matrix is the number of occurrences of the jth cluster in the ith review. Each cluster represents a collection of occurrences of adjectives used in a similar context within a review. We next generate a tree by performing hierarchical clustering on the cluster centroids. The tree and document-cluster matrix are then used in fitting our model along with others such as L1-ag-h and L1-ag-d.
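The assignment-and-counting step that produces the document-cluster matrix can be sketched with plain NumPy. The embeddings and centroids below are random stand-ins for the ELMo embeddings and the mini-batch K-Means centroids:

```python
import numpy as np

def document_cluster_matrix(doc_embeddings, centroids):
    """Count, per document, how many tokens fall in each centroid's cluster.

    doc_embeddings: list of (n_tokens_i, d) arrays, one per review
    centroids     : (K, d) array of cluster centroids
    """
    K = centroids.shape[0]
    counts = np.zeros((len(doc_embeddings), K), dtype=int)
    for i, emb in enumerate(doc_embeddings):
        # Nearest centroid in Euclidean distance for each token embedding.
        d2 = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        np.add.at(counts[i], assign, 1)
        # counts[i] now sums to the number of tokens in review i.
    return counts

rng = np.random.default_rng(5)
centroids = rng.normal(size=(6, 4))                   # K = 6 clusters, d = 4
docs = [rng.normal(size=(10, 4)), rng.normal(size=(7, 4))]
M = document_cluster_matrix(docs, centroids)
print(M.shape)  # (2, 6)
```

Each row of the resulting matrix sums to the review's token count, and the matrix plays the role of the document-cluster matrix fed to the estimator.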

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

Arnold, T. B. and Tibshirani, R. J. (2014). genlasso: Path algorithm for generalized lasso problems. R package version 1.3.

Bernstein, D. S. (2009). Matrix Mathematics: Theory, Facts, and Formulas (Second Edition). Princeton University Press.

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122.

Buhlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Publishing Company, Incorporated, 1st edition.

Cao, Y., Zhang, A., and Li, H. (2017). Microbial composition estimation from sparse count data. ArXiv e-prints.

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Pena, A. G., Goodrich, J. K., Gordon, J. I., Huttley, G. A., Kelley, S. T., Knights, D., Koenig, J. E., Ley, R. E., Lozupone, C. A., McDonald, D., Muegge, B. D., Pirrung, M., Reeder, J., Sevinsky, J. R., Turnbaugh, P. J., Walters, W. A., Widmann, J., Yatsunenko, T., Zaneveld, J., and Knight, R. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat. Methods, 7(5):335–336.

Chen, J., Bushman, F. D., Lewis, J. D., Wu, G. D., and Li, H. (2013). Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 14(2):244–258.

Cote, F. D., Psaromiligkos, I. N., and Gross, W. J. (2012). A Chernoff-type lower bound for the Gaussian Q-function. ArXiv e-prints.

Feinerer, I. and Hornik, K. (2016). wordnet: WordNet Interface. R package version 0.1-11.

Feinerer, I. and Hornik, K. (2017). tm: Text Mining Package. R package version 0.7-1.

Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. Bradford Books.

Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res., 3:1289–1305.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22.

Guinot, F., Szafranski, M., Ambroise, C., and Samson, F. (2017). Learning the optimal scale for GWAS through hierarchical SNP aggregation. ArXiv e-prints.

Huang, A. (2008). Similarity measures for text document clustering. In Proceedings of the Sixth New Zealand Computer Science Research Student Conference (NZCSRSC2008), Christchurch, New Zealand, pages 49–56.

Ke, T., Fan, J., and Wu, Y. (2015). Homogeneity pursuit. J. Am. Stat. Assoc., 110(509):175–194.

Khabbazian, M., Kriebel, R., Rohe, K., and Ané, C. (2016). Fast and accurate detection of evolutionary shifts in Ornstein-Uhlenbeck models. Methods in Ecology and Evolution, 7(7):811–824.

Kim, S., Xing, E. P., et al. (2012). Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. The Annals of Applied Statistics, 6(3):1095–1117.

Li, C. and Li, H. (2010). Variable selection and regression analysis for graph-structured covariates with an application to genomics. Ann. Appl. Stat., 4(3):1498–1516.

Li, Y., Raskutti, G., and Willett, R. (2018). Graph-based regularization for regression problems with highly-correlated designs. ArXiv e-prints.

Lin, W., Shi, P., Feng, R., and Li, H. (2014). Variable selection in regression with compositional covariates. Biometrika, 101:785–797.

Liu, X., Yu, S., Janssens, F., Glanzel, W., Moreau, Y., and De Moor, B. (2010). Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database. J. Am. Soc. Inf. Sci. Technol., 61(6):1105–1119.

Lou, Y., Bien, J., Caruana, R., and Gehrke, J. (2016). Sparse partially linear additive models. Journal of Computational and Graphical Statistics, 25(4):1126–1140.

Matsen, F. A., Kodner, R. B., and Armbrust, E. V. (2010). pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11:538.

McMurdie, P. J. and Holmes, S. (2013). phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLOS ONE, 8(4):1–11.

Mohammad, S. M. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3):436–465.

Mukherjee, R., Pillai, N. S., and Lin, X. (2015). Hypothesis testing for high-dimensional sparse binary regression. Ann. Statist., 43(1):352–381.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proc. of NAACL.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Randolph, T. W., Zhao, S., Copeland, W., Hullar, M., and Shojaie, A. (2015). Kernel-penalized regression for analysis of microbiome data. ArXiv e-prints.

Ridenhour, B. J., Brooker, S. L., Williams, J. E., Van Leuven, J. T., Miller, A. W., Dearing, M. D., and Remien, C. H. (2017). Modeling time-series data from microbial communities. ISME J, 11(11):2526–2537.

Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E., Lesniewski, R., Oakley, B., Parks, D., Robinson, C., Sahl, J. W., Stres, B., Thallinger, G. G., Van Horn, D., and Weber, C. (2009). Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75(23):7537–7541.

Sculley, D. (2010). Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 1177–1178, New York, NY, USA. Association for Computing Machinery.

She, Y. (2010). Sparse regression with exact clustering. Electron. J. Statist., 4:1055–1096.

Shi, P., Zhang, A., and Li, H. (2016). Regression analysis for microbiome compositional data. Ann. Appl. Stat., 10(2):1019–1040.

Tang, Y., Li, M., and Nicolae, D. L. (2016). Phylogenetic Dirichlet-multinomial model for microbiome data. ArXiv e-prints.

Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., and Kappas, A. (2010). Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol., 61(12):2544–2558.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288.

Tibshirani, R. J. and Taylor, J. (2011). The solution path of the generalized lasso. Ann. Statist., 39(3):1335–1371.

Tropp, J. A. (2015). An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230.

Wallace, M. (2007). Jawbone Java WordNet API.

Wang, H., Lu, Y., and Zhai, C. (2010). Latent aspect rating analysis on review text data: A rating regression approach. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, pages 783–792, New York, NY, USA. ACM.

Wang, J., Shen, X., Sun, Y., and Qu, A. (2016). Classification with unstructured predictors and an application to sentiment analysis. Journal of the American Statistical Association, 111(515):1242–1253.

Wang, T. and Zhao, H. (2017a). A Dirichlet-tree multinomial regression model for associating dietary nutrients with gut microorganisms. Biometrics, 73(3):792–801.

Wang, T. and Zhao, H. (2017b). Structured subcomposition selection in regression and its application to microbiome data analysis. Ann. Appl. Stat., 11(2):771–791.

Xia, F., Chen, J., Fung, W. K., and Li, H. (2013). A logistic normal multinomial regression model for microbiome compositional data analysis. Biometrics, 69(4):1053–1063.

Yu, G. and Liu, Y. (2016). Sparse regression incorporating graphical structure among predictors. Journal of the American Statistical Association, 111(514):707–720.

Zhai, J., Kim, J., Knox, K. S., Twigg, H. L., Zhou, H., and Zhou, J. J. (2018). Variance component selection with applications to microbiome taxonomic data. Frontiers in Microbiology, 9:509.

Zhang, T., Shao, M.-F., and Ye, L. (2012). 454 pyrosequencing reveals bacterial diversity of activated sludge from 14 sewage treatment plants. The ISME Journal, 6(6):1137–1147.