Document-levelTextQuality:ModelsforOrganizationandReaderInterest
AnnieLouisOctober16,2015
Université Catholique deLouvain
JointworkwithAni Nenkova
Peoplespontaneouslyrespondtodifferencesinwriting
2
3
“Finnegans Wake islong,dense,andlinguisticallyknotty,yethugelyrewarding,ifyou'rewillingtolearnhowtoreadit...”
http://www.publishersweekly.com
“MyFaith:WhyIdon'tsingthe'StarSpangledBanner’”
“What a poorly written article. Strays off topic and hardly even addresses the point of the article.
The only brief mention of why they don’t play the national anthem is that they believe in church and state. This just was one long rant about his religion.”
4
http://www.cnn.com
5
http://www.vocabula.com
TextQualityPrediction
Thisarticleiswell-written.Nextone..
Canweteachcomputerstomakesimilarjudgements?
6
o Howtoformulatethetask?
o Getsuitabledatawithdistinctions
o Findcorrelatesintext
Whydowecare?• Informationretrieval,articlerecommendation
– Allarticlesarenotofthesamequality– Canfilterbyqualityinadditiontorelevance
• Authoringsupport,educationalassessment– Automaticassessmentischeap,consistentandquick– Spellingandgrammarcorrectionarecommerciallysuccessful
• Textgenerationsystems– Systemscanunderstandhowtogeneratecoherenttext– Canevaluatesystemoutput
7
Thistalk
• Definingtextqualityandcreatingacorpusofoverallarticleratings– Largescalerealisticsampleofwritingdifferences
• Twomodels– Amodelfororganizationusingsyntaxpatterns– Amodelforreaderinterest
• Document-levelqualityprediction– Incontrasttospellingandgrammar– Oftennotabinary,correct/in-correctdistinction
8
>>DefiningTextQuality
9
• Aspectsofquality• Whoistheaudience?
Aspectsofquality
• Weadoptadefinitionfromtheeducationfield
10
Ideasanddevelopment
Organization(Smooth
transitions)
Voice(Personaltouch)
Wordchoice(vivid, lively)
Sentencefluency(Rhythm)
Conventions(Mechanics)
SixTraits[Spandel 2004]Detailsandtheirpresentation
Flowbetweensentences
Interestingnature,beautifulwriting
Spelling,grammar,layout
11
Audiencefortextquality– Anexpert
lowcompetency highcompetency
ExperiencedNLPresearcher
Readerofmachine-generatedtext
Adult readerofnewspaper
• Increasedfocusonlinguisticpropertiesofthetext
12
Relationshiptoreadability• Readabilityhasastrongfocusoncomprehension
Gradelevel1
Gradelevel2..
Gradelevel12
13
• Audiencedistinctions– childvs.adult,novicevs.expert,cognitivedisabilityornot
>>ACorpusforDocument-levelQuality
14
Louis&Nenkova,DiscourseandDialogue,2013
Sciencejournalism:examplesnippet
Sarah Lewis is fluent in firefly.
On this night she walks through a farm field in eastern Massachusetts, watching the first fireflies of the evening rise into the air and begin to blink on and off.
Dr. Lewis, an evolutionary ecologist at Tufts University...
15
Category1:VERYGOODarticles• Seedset=63NewYorkTimesarticlesthatappearedintheBestAmericanScienceWritingseries
• WechooseonlytheNYTarticles– WeusetheNYTCorpustoexpandourcategory– Normalizefordifferencesinwritingduetosource
16
Topicsintheseedset
17
Tag AppearanceMedicineandHealth 22
Space 14Physics 10Biology andBiochemistry 8
GeneticsandHeredity 8
Archaeology andAnthropology 7
…
ComputersandtheInternet 4
ExpandingtheVERYGOODset
• Assume:~40authorsoftheseedsetareexcellentwriters
• OtherarticlesfromtheNYTwrittenbythesameauthors– whichareresearchrelated– duringthesame10yearperiod– onsimilartopics– similarlengths
18
Category2:TYPICALwritingintheNYT
• Othersciencearticlesaroundthesametime,butnotwrittenbythepopularauthors
Category TotalArticles
VERY GOOD 3,530
TYPICAL 20,242
Thegeneralcorpus:
19
Atopic-pairedcorpus
• Thegeneralcategoriesmixdifferenttopics– geography,biology,astronomy,linguistics…
• ButanIRsystemcomparesarticlesonthesametopic
• ForeachVERYGOODarticle,get10mostsimilarTYPICALarticles(basedonthecontent)
• Enumerateallpairsof(VERYGOOD,TYPICAL)
• 35,300pairs
20
Twoqualitypredictiontasks
`Same-topic’– whicharticleinthepairistheVERYGOODone?
2categoriesGOOD(~3500)TYPICAL(~3500)
Topicallysimilarpairs<VERYGOOD,TYPICAL>
~35,000pairs
`Any-topic’– isthisarticleVERYGOODorTYPICAL?
21
Propertiesofthedataset
• Distinguishesaveragewritingfromverygood
• Allowtofocusonaspectssuchasbeautifulwriting– Lesslikelytohavespellingandgrammarerrors
• Largescaleandrealisticsampleofwritingdifferences– Previousworkoftenusedmachinegeneratedtextorartificiallymanipulatedtext
22
>>Predictingorganizationquality
23
Louis&Nenkova,EMNLP2012
Somesequencesofsentencetypesconveytheoverallpurposebetter
24
Solving X is useful for many applications.
We present a new approach to address X.
Results show that our method works well.
Motivation
Introduceapproach
Results
Intentionalstructureofanarticle• Everytexthasapurposethattheauthorwishestoconvey
• Influentialearlytheoriesdiscussitatlength
[Grosz&Sidner 1986]
• Particularlyforacademicwriting,itispopulartoseearticlesasasequenceofintentions
[Swales1990,Teufel 2000]
Narrative
Explanation Critique
25
Oraclemodelofintentionalstructure• UsingmanualannotationsofintentionsonACLarticles
26
STARTBackground
AimOwn
Contrast
TextualEND0.7
0.1
0.3
MarkovChainonIntroductionsections
[corpusbyTeufel,2000]
0.8
0.4
Otherswork0.1
Mainideaofthework• Annotatingsentencetypesishard.Pre-definingthesetof
sentencetypesisharder
• Assume
27
Syntax~roughproxyforsentencetype
28
Syntacticpatternsinexplanations
• An aqueduct is awatersupplyornavigablechannelconstructedtoconveywater.Inmodernengineering, thetermisusedforanysystemofpipes,canals,tunnels,andotherstructuresusedforthispurpose.
• A cytokinereceptoris areceptor thatbindscytokines.Inrecentyears, thecytokinereceptorshavecometodemandmoreattentionbecausetheirdeficiencyhasnowbeendirectlylinkedtocertaindebilitatingimmunodeficiencystates.
Definitionslooklikethis
Descriptivearticleslooklike
this
indefinitearticletermtodefine
Relativeclause
is/are NPMorespecific:topicalizedPP
Syntax-basedHMMmodel
START END
0.5
0.3
0.2
0.3
0.2
VPà VBZNP
NPà DTADJP
NPà NPPP
….
“Definitions”
NPà NNPCCNNP
NPà CD
NPà NP,NP
…
“Citations”
VPàMDVP
VPà VBVP
VPà VBPP
…
“Speculations”
29*Usesgrammaticalproductions
30
• Moreinformationaboutadjacentconstituents• APOStagsequencelosesallabstraction
• D-sequence– controlabstractionusingaparameter“depth”(d)
Asecondmodel:basedond-sequences
S
NP”,S“ VP .
NP VP
DT VBZ NP
NN
NNPNNP VBD
JJ
[“DTVBZJJNN,”NNPNNPVBD.]
31
Step1– depthcutoffROOT
S
NP”,S“ VP .
NP VP
DT VBZ NP
JJ NN
NNPNNP VBD
Chooseadepthd
Terminatetreeatd
Readoffnewleavesfrom lefttoright
d=2
“S,”NPVP.
d=3“NPVP,”NNPNNPVBD.
“That’sgoodnews,”Dr.Leaksaid.
32
Step2:NodeaugmentationROOT
S
NP”,S“ VP .
NP VP
DT VBZ NP
JJ NN
NNPNNP VBD
Forphrasalnodes ind-sequence,
- Annotatewithleftmostleafinfulltree
d=2
“ SDT , ” NPNNP VPVBD .
d=3
“ NPDTVPVBZ ,” NNPNNPVBD.
DT NNP VBD
DT VBZ
Evaluationtaskonacademicwriting
• ACLanthologycorpus– abstract,introduction,relatedwork
• Approximatedistinction fororganizationquality– Originalarticleà well-organized– Randompermutationoforiginalà poorly-organized– Createpairs<original,permutation?
• Task:identifytheoriginalversioninthepair– Baseline50%accuracy
33
Summaryofresultsonacademicwriting
• Correct=higherlikelihoodfororiginalarticle– versuspermutedarticle
• D-seq model
34
ACLconference Accuracy
Abstract 62.9
Introduction 68.8
Relatedwork 72.7
Baseline=50%
DosentencetypesdistinguishVERYGOODandTYPICALsciencenews?
• CreatetheHMMonVERYGOODtrainingarticles
• Getlikelihoodandmostlikelystatesequenceforanewarticle– Computefeaturesbasedonthese
• AclassifieristrainedtopredicttheVERYGOODarticle
35
Resultsonourcorpus
36
AnyTopic:Givenanarticle,isit“VERYGOOD”or“TYPICAL”?
System Accuracy
Baseline(random) 50%
HMM-productions 61%
§ 10foldcrossvalidationresults§ SVMclassifier
SameTopic:Givenapairofarticlesonthesametopic,whichoneis“VERYGOOD”?
System Accuracy
Baseline(random) 50%
HMM-productions 63%
>>Predictingreaderinterest
37
Louis&Nenkova,TACL2013
Predictinginterest:Anewtask
• Alotofworkonidentifyingwhatiswrongwithatext– Spellingmistakes,grammarerrors,incoherentwriting
• Itisnotknownhowtocharacterizewritingthatisengaging,interestingandnice
38
Approachtofeaturedevelopment
• Focusoninterpretablefeatures– Only41features– Eachfeatureisacompositeone:indicatesanaspectdirectly– Linguisticallyinteresting
• Confirmthatfeaturesrepresenttheintendedaspect– Tunebycheckingfeaturevaluesonrandomsnippetsoftext
39
1.UnusualwordsandphrasesIsthephrasingandlanguageuseunique?
• Word-based– highperplexityunderaphonemen-grammodel– Eg:‘undersheriff’,‘powwow’,‘chihuahua’,‘qipao’
• Wordpairs--based– adjective-noun,noun-noun,adverb-verb,subject-verbpairs– perplexityunderalanguagemodel– Eg:‘plastickywoman’,‘so-calledsuperkids’
40
2.VisualnatureIstherescenesetting?
• Creatingalargelexiconofvisualterms– Source:animage-taggedcorpus– Largesourceofpotentiallyvisualwords,butnoisy
• CreateLDA-basedtopicsonthetagset– UsethemanualMRCtermstofilteroutnon-visualtopics
41
grass,mountain,green,hill,blue,field,sand...round,ball,circles,logo,dots,square,sphere...silver,white,diamond,gold,necklace,chain...
Humaninterestandtextstructure
3. UseofpeopleinthestoryDoesthestoryrevolvearoundaperson?
– animacy informationfromNEs,pronouns,ngram patterns
4. Sub-genreIsthearticleisanarrative,interviewordialog
– Eg:narrativescore~pasttenseverbs,pronouns,propernames
42
SentimentandResearch
5. AffectIsthereanemotionalangletothestory?
– usingsentimentworddictionaries
6. ResearchcontentHowmuchexplicitresearchdescriptionispresent?
– usingahand-builtdictionaryofresearchwords
43
Howthefeaturesvaryinarandomsampleofverygoodandtypicalarticles(t-test)
Higher valuesinVERYGOODset
ü Visualwordsinbeginningandendofarticles
☓ Totalvisual words
☓Animacy countsü Unusualwordsandphrases
☓Narrative,interviewordialogformat
ü Sentimentwords,negativepolarity
ü Researchwords44
AccuraciesonthetwotasksAnyTopic:Givenanarticle,isit“VERYGOOD”or“TYPICAL”?
System Accuracy
Baseline(random) 50%
Interesting-sciencefeatures
75%
§ 10foldcrossvalidationresults§ SVMclassifier
SameTopic:Givenapairofarticlesonthesametopic,whichoneis“VERYGOOD”?
System Accuracy
Baseline(random) 50%
Interesting-sciencefeatures
68%
45
Combininginterestwithotheraspects
46
Featureset anytopic sametopic
Interesting science 75.3 68.0
PreviousmethodsforpredictingotheraspectsReadable(article length,language
model, cohesion,syntax)16features
65.5 63.0
Well-written(entitygrid[BL08],discourserelations[PN08])
23features
59.1 59.9
Interesting-fiction[ML09]22features
67.9 62.8
Combination ofallfeaturesAllwriting aspects 76.7 74.7
Differentaspectsofwritinghave
complementarystrengths
Genre-specificmeasuresarestrongerthangenericones
Conclusions
• Textqualityisaninterestingandchallengingtask
• Moresuccessonthetopicrecently– applicationtonovels,tweets,essays
• Futurework– Alottobedoneintermsofformalizingthetasks,collectingdata,modelsandevaluation
– Transferringtheknowledgetogeneratingtexts
47
Thankyou!
48