"Crosswords by Computer—or 1,000 Nine-Letter Words a Day for Fun and Profit," by Eric Albert



DESCRIPTION

A classic article on one constructor's experiences in the early years of crossword construction software.

TRANSCRIPT

Page 1: "Crosswords by Computer—or 1,000 Nine-Letter Words a Day for Fun and Profit," by Eric Albert

GAMES, February 1992

This story begins in the summer of 1989. I was in a quandary. My wife, Peg, and I had recently decided to have a child, and that I would stay home after the birth. I therefore needed a new job, one I could practice from our house while simultaneously feeding and entertaining a baby.

As luck would have it, summer is the time for the annual convention of the National Puzzlers' League. As I sat in the lobby of a Cleveland hotel chatting with noted puzzle constructors like Henry Hook and Scott Marley, I began to consider puzzling as a profession. I could sit in my den, I thought, and dash off crosswords for magazine editors I already knew through the League. Even better, I would finally have a way to rationalize purchasing every reference book in the world, tax-deductibly, no less.

I told my puzzle colleagues the plan. Their response was immediate: "Give it up now." Apparently, these same thoughts had come to millions of other people, resulting in the dreaded "buyer's market" and its inevitable corollary: low fees. As one person so aptly put it, "It's a mistake to compete in a market with people who are willing to work for free."

After finding out the pittance paid for most puzzles ($15 to $75 for a 15 x 15 square crossword, $50 to $250 for a Sunday-size 21 x 21), I had to agree. Few people could grind them out quickly enough to be able to afford both food and shelter. But could a computer fill crossword grids, I wondered, and do so at a professional quality level?

ILLUSTRATION BY J. OTTO SEIBOLD

I looked around to see what was commercially available. Not much, it turned out. The few programs that claimed to construct crosswords actually just built a loosely interlocked crisscross puzzle from a user-supplied list of theme words. No symmetry, no complete cross-checking of letters, no prospect of selling the results.
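The two properties named here can be stated precisely. The following is a minimal Python sketch, not a reconstruction of any program mentioned in the article. It assumes a grid is a list of equal-length strings with "#" for a black square and "." for a white one, and it treats a letter as "checked" when it sits in both an across and a down entry of at least two squares. The function names and the encoding are illustrative assumptions.

```python
def is_rotationally_symmetric(grid):
    """True if the black-square pattern is unchanged by a 180-degree rotation."""
    n_rows, n_cols = len(grid), len(grid[0])
    return all(
        (grid[r][c] == "#") == (grid[n_rows - 1 - r][n_cols - 1 - c] == "#")
        for r in range(n_rows)
        for c in range(n_cols)
    )


def run_length(grid, r, c, dr, dc):
    """Length of the unbroken run of white squares through (r, c) along one axis."""
    def count(step):
        n, rr, cc = 0, r + step * dr, c + step * dc
        while 0 <= rr < len(grid) and 0 <= cc < len(grid[0]) and grid[rr][cc] != "#":
            n, rr, cc = n + 1, rr + step * dr, cc + step * dc
        return n
    return 1 + count(1) + count(-1)


def is_fully_checked(grid):
    """True if every white square lies in both an across and a down entry."""
    return all(
        run_length(grid, r, c, 0, 1) >= 2 and run_length(grid, r, c, 1, 0) >= 2
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if grid[r][c] != "#"
    )
```

A crisscross layout such as [".#.", "...", ".#."] passes the symmetry test but fails the checking test, because its corner squares belong to a down entry only.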


I also purchased a copy of "The Crossword Puzzler," a program for the IBM PC written by Mel Rosen, well-known constructor and editor. This program was designed as an aid for professional crossworders and helps with making a grid; finding words that match a specified pattern; entering, storing, and retrieving clues; and outputting a grid and clues to a printer. The program performed well, and it certainly made the constructor's job easier. But I didn't want to make the job easier; I wanted to make the job go away.
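Finding words that match a specified pattern is the simplest of these aids to illustrate. The sketch below is a generic Python example and says nothing about how Rosen's program works. It assumes "?" marks an unknown square, and the function names and word list are made up for the illustration.

```python
def matches(pattern, word):
    """True if word fits pattern, where '?' in the pattern stands for any letter."""
    return len(pattern) == len(word) and all(
        p == "?" or p == w for p, w in zip(pattern.upper(), word.upper())
    )


def find_matches(pattern, words):
    """Return, in alphabetical order, the words that fit the pattern."""
    return sorted(w for w in words if matches(pattern, w))


# Hypothetical sample word list.
WORDS = ["QUICK", "QUACK", "QUIRK", "KUMQUAT", "BABOON"]
```

With this list, find_matches("QU?CK", WORDS) returns QUACK and QUICK, while QUIRK is rejected because its fourth letter is not C.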

Since there was nothing I could buy, maybe there was something I could build. A stack of quarters and a few evenings spent in MIT's engineering library resulted in a pile of photocopies of academic articles about crosswords and computers. Several days and much tedious reading later, I concluded that all research so far had focused solely on finding the quickest way to fill a grid, with zero attention paid to the quality of the result.

This approach can best be summarized with a single sentence from the paper "Search Lessons Learned from Crossword Puzzles" by Ginsberg, Frank, Halpin, and Torrance: "Why use a word with a Q when one with an S could be used instead?" The authors meant this question rhetorically, but it has a real-world answer: "Because a puzzle with a Q is more interesting and more likely to sell than one with an S." Noted puzzle constructors like Merl Reagle and Trip Payne pride themselves on their ability to work colorful words and phrases into their grids, and editors use such entries to distinguish between the obviously excellent and the merely mediocre. I had known from the start that my program would have to show a similar skill in word choice; now I realized that I was going to get no help from my academic colleagues. I would have to go it alone.

One Saturday in mid-August of 1989, I locked myself in my air-conditioned office at work and spent 13 straight hours cranking out a crude crossword program on my computer. For its first test, I gave it a tiny grid with 12 empty squares. A few tense minutes later, the program dumped its first fill onto the screen. I let out a whoop of exhilaration. Leaving the program to continue finding other possible combinations, I headed for home.

Monday morning, I bustled into my office and found my program still chugging away. My spirits sank to my sneakers. Crossword constructors had nothing to fear from a competitor that failed to exhaust the possibilities for 12 letters in 36 hours. I put the program out of its misery and decided to call in the cavalry.


Over the next few days I held long talks with my boss Mike Albert (no relation) and my friend Alan Frank. Mike has decades of software engineering experience and Alan is a brilliant developer of computer algorithms. Combining their suggestions with more weekend work gave dramatic results, and a few weeks later my program could finish the test grid in 19 minutes. Still nothing to write home about, but good enough to make me turn my attention to the database problem.

So far, the program had been finding all possible fills for a given grid fragment. But there were often thousands of ways to fill even a small section, and looking through all of them felt suspiciously like work. In addition, most of the fills contained junky words that severely limited any commercial potential. The program had to acquire some taste.
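To make "finding all possible fills" concrete, here is a minimal backtracking sketch in Python. It is an illustration only: the article does not describe the program's actual algorithm. The sketch assumes the simplest possible fragment, an open square in which every row and every column must be a distinct word from the list, and it abandons a partial fill as soon as some column can no longer begin any word. The function name and the sample words are assumptions.

```python
def all_fills(words, size):
    """Yield every way to fill an open size x size square so that each row
    and each column is a word from `words`, with no word used twice."""
    words = sorted({w.upper() for w in words if len(w) == size})
    # Every prefix of every word, including the full word itself.
    prefixes = {w[:i] for w in words for i in range(size + 1)}

    def extend(rows):
        if len(rows) == size:
            cols = ["".join(row[c] for row in rows) for c in range(size)]
            # Reject fills that repeat a word between rows and columns.
            if len(set(rows) | set(cols)) == 2 * size:
                yield tuple(rows)
            return
        for w in words:
            if w in rows:
                continue
            candidate = rows + [w]
            # Prune: every partial column must still be the start of some word.
            if all("".join(row[c] for row in candidate) in prefixes
                   for c in range(size)):
                yield from extend(candidate)

    yield from extend([])


# Hypothetical six-word list that admits exactly two fills.
SAMPLE = ["BAT", "ORE", "WED", "BOW", "ARE", "TED"]
```

For the sample list the generator yields two fills, BAT/ORE/WED and its transpose BOW/ARE/TED. With a real dictionary the count grows quickly, which is the difficulty this paragraph describes.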

Peg and I ranked all the three- and four-letter database entries using the following sophisticated scheme: 0 = great, 1 = average, 2 = bleah. I modified the program accordingly and the results were markedly better. This was exciting stuff. No one I knew of had previously demonstrated that a program could tell the difference between a good and a bad way to fill a crossword grid.
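One way such a ranking could be used to compare fills is sketched below in Python. The article does not say how the program combined word ratings, so summing them is an assumption, as are the function names, the sample ratings, and the choice to treat unrated words as "bleah".

```python
# Hypothetical ratings on the 0 = great, 1 = average, 2 = bleah scale.
RATINGS = {"QUIZ": 0, "JAZZ": 0, "AREA": 1, "IDEA": 1, "ESNE": 2}


def fill_score(fill, ratings, default=2):
    """Total rating of a fill; lower is better. Unrated words count as 'bleah'."""
    return sum(ratings.get(word, default) for word in fill)


def rank_fills(fills, ratings):
    """Return the fills ordered from best (lowest total) to worst."""
    return sorted(fills, key=lambda fill: fill_score(fill, ratings))


def best_fill(fills, ratings):
    """Return the single best fill under the summed-rating assumption."""
    return min(fills, key=lambda fill: fill_score(fill, ratings))
```

Under these sample ratings a fill containing JAZZ and QUIZ scores 0 and beats one containing AREA and ESNE, which scores 3.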

My time since then has been spent in three ways: 1) improving the program, 2) improving the database, and 3) constructing and marketing crosswords. Until March 1991, when baby Gus entered the world and I quit my computer job, puzzle time was very hard to come by. Nevertheless, this month's "Crazy Eights" grid shows that I've made some progress.

Most of my program modifications have been speed improvements. The best fill for my original test grid can now be found in four seconds. Speed-up on bigger grid fragments has been at least as dramatic, with some tricky cases now filled 100,000 times faster.

The database work has been the most time-consuming. The challenge was to emulate an expert's ability to choose words. My original solution was to go through the database, one word at a time, and type in an appropriate "goodness" rating for each entry. Two years later, that's still my solution, and I'm still typing.

I rank 1,000 words a day, come rain or shine. In two weeks (from this writing) I'll finish off the nine-letter entries, take a brief sabbatical, and then start in on the tens. Word-ranking is a hobby I can recommend only to those who get overstimulated watching paint dry.

The rating system now has 13 categories, from 0 to 12 (see sidebar). The rankings are designed to capture the distinctions required to fill grids at a quality level indistinguishable from that of the best human constructors. It had better; any rating system change now would force me to rerank the 257,837 database entries I've already done.

Ranking requires things to rank. Originally, I collected the unabridged and collegiate dictionaries that were available in computer-readable form and mushed them all together. This gave me good basic coverage of English words; in fact, it was overkill. More than half of these entries had to be given the lowest ranking because of their obscurity. Among these words to avoid were all "crosswordese," those ancient weights, heraldic terms, and variant spellings of minor Hindu deities that longtime solvers have grown all too familiar with.

I added computer copies of several phrase dictionaries, a thesaurus or two, and some long lists of proper names. Now I lusted after a source for those pop-culture references that topnotch constructors use to spice up their grids. Mike Albert came through once again, collecting for me a passel of entertainment programs from various computer networks. These products were intended to test the user's knowledge of movies or sports or rock 'n' roll. But they also provided databases that, once decoded, could be raided for my own purposes.

Alas, some crossword puzzle entries cannot be found in any available computer file, entries like RLS ("Treasure Island author's monogram") and ASAN ("Strong ___ ox"). The thousands of such items in my database have been laboriously collected from actual puzzles and entered by hand. My hand. Whenever some wise guy says doing crosswords by computer isn't "really" work I think to myself, "You've obviously never spent a day typing in 4,000 Roman numerals."

How I use the computer to construct a crossword changes from week to week as I automate more tasks and take on new challenges. Here's a snapshot for August 1991.

First, I come up with a set of theme entries. This is a creative endeavor, so there's no single correct approach. I often ask a friend for an idea, or think of one myself, but the computer can also be surprisingly helpful. Say I've decided to commemorate Vincent van Gogh by creating a puzzle in which each of the long entries has had the word EAR removed. I instruct a search program to show me the appropriate database entries, and, five seconds later, I can paw through a list containing items like "Amelia Earhart," "béarnaise sauce," and "rearview mirror."
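A search of this kind is easy to sketch. The Python below is an illustration under stated assumptions, not the author's search program. It assumes entries are compared as grid letters, meaning uppercase A to Z with spaces, punctuation, and accents dropped, and that only the first occurrence of the fragment is removed. The function names are hypothetical.

```python
import unicodedata


def grid_letters(entry):
    """Reduce an entry to the letters that would go in a grid: A-Z only."""
    decomposed = unicodedata.normalize("NFD", entry)  # splits accents from letters
    return "".join(ch for ch in decomposed.upper() if "A" <= ch <= "Z")


def entries_with_fragment(entries, fragment):
    """Return (entry, letters with the first occurrence of fragment removed)
    for each entry whose grid letters contain the fragment."""
    results = []
    for entry in entries:
        letters = grid_letters(entry)
        if fragment in letters:
            results.append((entry, letters.replace(fragment, "", 1)))
    return results
```

Run on the three examples above with the fragment "EAR", this returns AMELIAHART, BNAISESAUCE, and RVIEWMIRROR as the shortened grid entries.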


WHAT A WORD'S WORTH

Categorizing entries is subjective and somewhat arbitrary. Since I construct puzzles for mainstream publications, my rating system attempts to reflect the average solver's tastes and still retain enough of my personal prejudices to give the puzzles a unique flavor.

In my scheme, multiple-word phrases and full names rank very high. Hipness, vividness, and interesting quirkiness raise an entry's value. So do rare letters like J, K, Q, X, and Z. Prefixes and suffixes (PRE-, -ED, -ATION) lower a rating. Crossword clichés (ouo), trade names (NIKON), abbreviations (STD.), and words that are foreign (orur) or difficult to clue in fresh ways (HER) get marked down. Fill-in-the-blank entries (like OFLA for "Man ___ Mancha") receive poor ratings. So do words that seem to require more than a high-school education (HAUBERK).

Descriptions (with examples) of my categories:

0 - FABULOUS (KUMQUAT, QUICK FIX)
1 - GREAT (NEW YORK, AL HIRT)
2 - VERY GOOD (AUUON, tAwE0NE)
3 - COLORFUL (TUue, BABOON)
4 - ABOVE AVERAGE (ASPARAGUS, MACAO)
5 - AVERAGE (INN, ECONOMY)
6 - BELOW AVERAGE (fiPS, KNOCKED)
7 - BORING (LATERALLY, ELLS)
8 - FLAWED (YOU'LL, OCT.)
9 - STRETCHING (cowy, affEns)
10 - YUCKY (coWER, AN2A)
11 - SPECIALIZED (UCALEGON, <obscene>)
12 - VERY YUCKY (BERTL., sHILfA)

I usually fill grids using only entries with a rating of 9 or better. For an extremely tough corner, I have occasionally turned on the 10s. I hope never to be so desperate that I consider using the 12s.

E.A.

Once I've chosen the theme entries, I copy them into a blank grid of the appropriate size. Then I add the black squares. This is an eminently computerizable function, and one that I hope to get to soon.

Next, I break the grid into pieces, since the program is not fast enough to fill an entire puzzle at once. The size of these sections has gotten larger and larger over the years. This month's Ornery, for example, was constructed from only 11 pieces.

Then my program does its thing with each grid piece. I can specify lots of requirements like "this square must be a vowel" or "don't use this word," but mostly I just let it run. The program will usually display the first fill in a few seconds, but the time it takes to find the best fill varies considerably. On a 33-megahertz IBM-PC 486 clone with 16 megabytes of memory, the program can exhaustively examine all possible ways to fill the 36 non-theme entries in the major diagonal of a 21 x 21 grid, and choose the best one, in 5 to 12 hours.
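Requirements like these amount to filters on the candidate words for a slot. The Python sketch below shows one possible shape for such a filter. It is not taken from the program described here: the function signature, the rule format (a square's position mapped to its allowed letters), and every rating except ELLS at 7 are assumptions made for the example. It also uses the sidebar's convention that a lower rating is better, with 9 as the usual cutoff.

```python
VOWELS = set("AEIOU")

# Hypothetical ratings; only ELLS = 7 comes from the sidebar's examples.
RATED = {"QUIZ": 0, "AREA": 5, "ELLS": 7, "ECRU": 8, "ESNE": 10}


def candidates(slot_length, rated_words, max_rating=9, square_rules=None, banned=()):
    """Return the words that fit a slot, best-rated first.

    square_rules maps a 0-based position in the slot to the set of letters
    allowed there; banned lists words that must not be used.
    """
    square_rules = square_rules or {}
    banned = {w.upper() for w in banned}
    keep = []
    for word, rating in rated_words.items():
        word = word.upper()
        if len(word) != slot_length or rating > max_rating or word in banned:
            continue
        if all(word[i] in allowed for i, allowed in square_rules.items()):
            keep.append((rating, word))
    return [word for rating, word in sorted(keep)]
```

For instance, candidates(4, RATED, square_rules={0: VOWELS}) keeps AREA, ELLS, and ECRU, drops QUIZ for starting with a consonant, and drops ESNE because its rating of 10 is above the cutoff.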

This may not strike you as super-speedy and, in fact, GAMES's own Mike Shenk could probably do the same thing in 20 minutes. There are three points to consider, though. First, I'm not Mike Shenk. Like much automation, my program puts the skills of an expert into the hands of a novice. Second, given the current trends in computer hardware, I am quite confident that several years from now the program will also be able to finish this fill in 20 minutes. Third, not even Mike Shenk can fill grids in his sleep. I can.

Over the years, some crossword experts have speculated that computer-filled grids would be lifeless and mechanical. I disagreed, pointing out that my taste in words was exquisitely well-defined by the hundreds of thousands of ranking decisions contained in my database. The computer fills grids the way that I would if I had the patience (and the longevity) to examine every possibility. Ironically, the results are more "Eric Albert-ish" than I could ever come up with on my own.

When the computer has filled all of the sections, I clue the puzzle. By hand. Someday a computer may be able to toss off hip, punny, human clues. But not today, and not tomorrow. For the foreseeable future, Henry Hook will remain inimitable. And that, as Henry would put it, is probably just as well.

"Man quits job to raise child and construct crosswords by computer." It certainly sounds like a great human-interest story. But what's the bottom line? Can such a crazy scheme succeed?

So far, I'm happy to say, I have sold every computer-generated crossword I have constructed, and to the most-respected and highest-paying markets. Each month brings more requests for more puzzles. And solvers seem pleased with my work.

What does the future hold? Well, you can bet that in two weeks I'll be starting work on an Ornery Crossword called "Crazy Nines." And, in the long run, I believe computers will play an ever bigger role in the crossword world.

Official Publications, a major publisher of newsstand puzzle magazines, has automated the majority of their operations, from grid checking through typesetting and page layout. They can now put some issues together in hours instead of days, at a considerably reduced cost.

Newspaper Enterprise Association, one of the nation's largest feature syndicates, distributes a computer-generated crossword to hundreds of newspapers every day. For cleverness, their puzzle will never set the world on fire, but the company has had few complaints from solvers. And NEA's approach of creating a database of stock clues for the computer to choose from has already been emulated by at least one large crossword magazine company.

More constructors are also getting on board. Some now use Mel Rosen's PC program. Others may be interested in CCS ("Crossword Construction Set") for the Macintosh, a recent offering by Brian Sheppard. It has many of the features of Rosen's product and can use a 250,000-entry database to fill grid sections of up to 25 words. Because this database is not ranked for quality, the filling process requires the user to make many interactive decisions. Still, CCS significantly reduces a constructor's burden.

Will computers replace puzzle people? Yes and no. The workaday constructors, those who grind out grids packed with crosswordese, are doomed. My program can already fill a grid more quickly and more skillfully than they can, and for pennies a puzzle. Competitive forces will eventually force the majority of the newsstand publishers and syndicates to follow the lead of Official Publications and Newspaper Enterprise Association.

But in the upper echelon, where the emphasis is on a tricky theme and a clever clue, machines will have a much smaller impact. Creativity will keep classy constructors ahead of the computer for years to come. Mike Shenk, Merl Reagle, Henry Hook: Keep your day jobs.

Eric Albert is a frequent contributor toGAMES.

PROGRAM NOTES
• Mel Rosen's Crossword Puzzler costs $150 and is available from Rosen, 11718 Nicklaus Circle, Tampa, FL 33624. A demo disk can be purchased for $10.
• CCS costs $495 and is available from Alan Richter, 340 Riverside Drive, Apartment 3-D, New York, NY 10025.
