Computational Journalism at Columbia, Fall 2013, Lecture 7: Knowledge Representation

Upload: jonathan-stray

Post on 14-Apr-2018

  • 7/27/2019 Computational Journalism at Columbia, Fall 2013, Lecture 7: Knowledge Representation

    1/39

    Frontiers of Computational Journalism

    Columbia Journalism School

    Week 7: Knowledge Representation

    October 23, 2013

    2/39

    Lecture 7: Knowledge Representation

    Story Metadata

    Linked Open Data

    Knowledge as Relations

    Automatic story writing

    3/39

    Unstructured data

    4/39

    Structured data

    5/39

    Article Metadata

    headline

    photo

    photo caption

    byline

    photo credit

    publication date

    dateline

    article body

    related articles

    6/39

    Schema.org news markup

    Overall type of the object on this page, in HTML head

    Headline, dateline, date as additions to div/span properties

    Byline expressed as nested object (using itemscope) of type schema.org/Person
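A minimal sketch of what such markup could look like when generated from an article record. The function name, the dictionary keys, and the choice to put the article type on a `div` rather than in the page head are assumptions made for illustration; only the schema.org type and property names (`NewsArticle`, `Person`, `headline`, `datePublished`, `author`, `name`) come from the schema.org vocabulary.

```python
from html import escape

def render_article(article):
    """Render an article dict as HTML annotated with schema.org microdata."""
    return (
        '<div itemscope itemtype="http://schema.org/NewsArticle">\n'
        f'  <h1 itemprop="headline">{escape(article["headline"])}</h1>\n'
        f'  <span itemprop="datePublished">{escape(article["date"])}</span>\n'
        # The byline is a nested object: a new itemscope of type Person.
        '  <span itemprop="author" itemscope itemtype="http://schema.org/Person">\n'
        f'    <span itemprop="name">{escape(article["author"])}</span>\n'
        '  </span>\n'
        '</div>'
    )

# Hypothetical article record, not taken from the lecture.
markup = render_article(
    {"headline": "Test & Story", "date": "2013-10-23", "author": "Jane Doe"}
)
```

The nested `itemscope` is what lets a crawler read the byline as a person with a name rather than as a bare string.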

    7/39

    Driving application: rich snippets

    Schema.org covers not just news but music, restaurants, people, organizations, reviews, offers...

    Snippets, and better search-ability generally, are motivation for Google, Yahoo, Bing to push schema.org

    8/39

    Additional metadata from indexing team

    In database, but doesn't necessarily make it to HTML.

    9/39

    News application: content navigation

    Articles about Syria on NYT topic page

    More reliable than simple text search (because the relevance algorithm knows a story is "about" Syria.)

    10/39

    Lecture 7: Knowledge Representation

    Story Metadata

    Linked Open Data

    Knowledge as Relations

    Automatic story writing

    11/39

    Ontologies

    What objects and relations are available?

    Often represented as class hierarchy.

    Arrows = is_a relation
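A class hierarchy of this kind can be sketched as a mapping from each class to its parent, with one entry per is_a arrow. The classes below are a hypothetical toy ontology, not one shown in the lecture, and the sketch assumes each class has a single parent.

```python
# Hypothetical toy ontology: each class maps to its parent (one is_a arrow).
IS_A = {
    "car": "vehicle",
    "schoolbus": "vehicle",
    "vehicle": "artifact",
    "artifact": "thing",
}

def ancestors(cls):
    """Follow is_a arrows upward and return every superclass of cls, nearest first."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain

def is_a(cls, candidate):
    """True if cls is candidate itself or inherits from it through is_a arrows."""
    return cls == candidate or candidate in ancestors(cls)
```

For example, `is_a("car", "thing")` holds because the arrows chain from car to vehicle to artifact to thing, while `is_a("vehicle", "car")` does not.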

    12/39

    (Part of) a real ontology, from Cyc

    13/39

    Every big news org has their own big ontology :(

    topics, people, organizations, places...

    14/39

    Yaaay Linked Data!

    Triples of (subject relation object), each a URL or literal

    [Example triples lost in extraction; only the literal "NY" survives.]

    Abbreviations possible with many formats... rdf:type ns6:CollegeOrUniversity
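As a rough sketch, triples can be held as 3-tuples of strings, and an abbreviated term such as `rdf:type` expanded to a full URL through a prefix table. The `rdf` namespace below is the standard one. The slide does not say what `ns6` stands for, so mapping it to schema.org is an assumption, and the two example triples are illustrative rather than the ones on the original slide.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ns6": "http://schema.org/",  # assumption: the slide does not define ns6
}

def expand(term):
    """Expand prefix:name to a full URL; leave full URLs and quoted literals alone."""
    if term.startswith('"') or term.startswith("http"):
        return term
    prefix, _, name = term.partition(":")
    return PREFIXES[prefix] + name if prefix in PREFIXES else term

# Illustrative triples: (subject, relation, object), each a URL or a literal.
triples = [
    ("http://dbpedia.org/resource/Columbia_University", "rdf:type", "ns6:CollegeOrUniversity"),
    ("http://dbpedia.org/resource/Columbia_University", "ns6:addressRegion", '"NY"'),
]
expanded = [tuple(expand(term) for term in triple) for triple in triples]
```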

    18/39

    NYT ontology available as LOD

    owl:sameAs makes this interoperable
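One way to picture the interoperability: owl:sameAs links say that identifiers from different datasets name the same thing, so following them groups the identifiers into equivalence classes. The sketch below treats sameAs as symmetric and transitive and walks the links as a graph. The identifiers in the usage example are hypothetical placeholders, not real NYT or DBpedia URLs.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

def equivalence_classes(triples):
    """Group identifiers connected, directly or indirectly, by owl:sameAs links."""
    neighbors = defaultdict(set)
    for subject, relation, obj in triples:
        if relation == SAME_AS:
            neighbors[subject].add(obj)
            neighbors[obj].add(subject)
    seen, classes = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        classes.append(group)
    return classes

# Hypothetical identifiers for the same country in three datasets.
links = [
    ("nyt:syria", SAME_AS, "dbpedia:Syria"),
    ("dbpedia:Syria", SAME_AS, "geonames:Syria"),
    ("nyt:syria", "rdf:type", "skos:Concept"),
]
groups = equivalence_classes(links)
```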

    19/39

    NYT API can return linked data

    {
      "title": "Syria's Rebels Open Talks on Forging United Political Front",
      "body": "BEIRUT, Lebanon - Syria's fractious opposition groups began negotiations in Doha, Qatar, on Sunday to forge a more unified front to reshape the political landscape in a bloody conflict that claims more than 100 lives virtually every day. Given the scant prospects that any attempt to restructure the opposition will succeed the",
      "dbpedia_resource_url": [
        "http://dbpedia.org/resource/Hillary_Rodham_Clinton",
        "http://dbpedia.org/resource/Bashar_al-Assad"],
      "facet_terms": "CLINTON, HILLARY RODHAM ASSAD, BASHAR AL- SYRIA DOHA (QATAR) SYRIAN NATIONAL COUNCIL STATE DEPARTMENT WAR AND REVOLUTION DEFENSE AND MILITARY FORCES"
    }
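A small sketch of how a client might use such a response, assuming it arrives as JSON with the field names shown on the slide. The helper name `linked_entities` is made up for this example, and the response text is a shortened copy of the slide's record rather than live API output.

```python
import json
from urllib.parse import unquote

# Shortened copy of the record on the slide, not a live API response.
response_text = """
{
  "title": "Syria's Rebels Open Talks on Forging United Political Front",
  "dbpedia_resource_url": [
    "http://dbpedia.org/resource/Hillary_Rodham_Clinton",
    "http://dbpedia.org/resource/Bashar_al-Assad"
  ]
}
"""

def linked_entities(article):
    """Return readable entity names taken from the article's DBpedia URLs."""
    names = []
    for url in article.get("dbpedia_resource_url", []):
        slug = url.rsplit("/", 1)[-1]          # last path segment of the URL
        names.append(unquote(slug).replace("_", " "))
    return names

article = json.loads(response_text)
entities = linked_entities(article)
```

Because the URLs are shared identifiers, the same two entities can be looked up in DBpedia or matched against other datasets without any string matching on names.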

    20/39

    Lecture 7: Knowledge Representation

    Story Metadata

    Linked Open Data

    Knowledge as Relations

    Automatic story writing

    21/39

    Objects and relations in text?

    names, dates, places, verbs.

    22/39

    Named Entity Recognition

    Extract subjects, objects, from text.

    Also, resolve pronouns if possible.

    "Gov. Andrew M. Cuomo on Wednesday gave a seawall the nod. Because of the recent history of powerful storms hitting the area, he said, elected officials have a responsibility to consider new and innovative plans to prevent similar damage in the future."
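To make the two tasks concrete, here is a deliberately naive sketch: entity spotting by lookup in a hand-written gazetteer, and pronoun resolution by picking the closest person mentioned earlier. This is a toy stand-in, not how trained NER systems work; they recognize names they have never seen. The gazetteer contents and function names are assumptions for illustration.

```python
import re

TEXT = (
    "Gov. Andrew M. Cuomo on Wednesday gave a seawall the nod. Because of the "
    "recent history of powerful storms hitting the area, he said, elected "
    "officials have a responsibility to consider new and innovative plans to "
    "prevent similar damage in the future."
)

# Hypothetical gazetteer: known surface strings and their entity types.
GAZETTEER = {
    "Andrew M. Cuomo": "PERSON",
    "Wednesday": "DATE",
}

def find_entities(text):
    """Return (start, surface_text, label) for each gazetteer match, in text order."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), name, label))
    return sorted(found)

def resolve_he(text):
    """Naively map each 'he' to the closest PERSON mentioned before it."""
    people = [(pos, name) for pos, name, label in find_entities(text) if label == "PERSON"]
    resolved = []
    for match in re.finditer(r"\bhe\b", text):
        earlier = [name for pos, name in people if pos < match.start()]
        if earlier:
            resolved.append((match.start(), earlier[-1]))
    return resolved
```

On the quoted passage this links the single "he" back to Andrew M. Cuomo; with two people in the text the nearest-mention rule would often guess wrong, which is part of why pronoun resolution is hard.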

    23/39

    NER state of the art

    Commercial: Reuters OpenCalais

    Academic: Stanford NER library

    24/39

    Next level of understanding: verbs

    The water that made rivers of Avenues C and D receded on Tuesday, and the East Village was a mixture of disaster and nonchalance. A group of young men in pajama pants and shorts threw a football on East 12th Street, while workers pumped the basement of CHP Hardware on Avenue C and Eighth Street.

    subject verb object

    25/39

    Knowledge Representation in AI

    (a crazy brief introduction)

    Classic "symbolic" paradigm represents knowledge as statements in mathematical logic.

    Many variations. Most are subsets or modifications of standard first order logic (FOL).

    26/39

    Predicates and Relations

    Predicate: asserts that object belongs to a class

    vehicle(schoolbus)

    bird(tweety)

    straight_gangsta(emily_bell)

    Relation: asserts relationship between objects

    is_a(car, vehicle)

    higher_rank(general, colonel)

    capital(paris, france)
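A minimal way to hold such statements in a program is a set of tuples, with the predicate or relation name first and its arguments after it. The helper names `holds` and `members` are assumptions made for this sketch.

```python
# Facts stored as tuples: (predicate, object) or (relation, subject, object).
FACTS = {
    ("vehicle", "schoolbus"),
    ("bird", "tweety"),
    ("is_a", "car", "vehicle"),
    ("higher_rank", "general", "colonel"),
    ("capital", "paris", "france"),
}

def holds(name, *args):
    """True if the fact name(args...) has been asserted."""
    return (name, *args) in FACTS

def members(predicate):
    """All objects asserted to belong to the class named by a one-place predicate."""
    return sorted(f[1] for f in FACTS if len(f) == 2 and f[0] == predicate)
```

Argument order matters: `holds("capital", "paris", "france")` is true, while `holds("capital", "france", "paris")` is not.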

    27/39

    Inference

    General rules

    a ∧ (a => b) => b

    p ∨ !p

    Domain specific inferences

    is_a(car, vehicle)

    can_move(vehicle)

    => can_move(car)
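The domain-specific inference above can be sketched as forward chaining: keep applying the rule "if is_a(x, c) and p(c) then p(x)" until no new facts appear. Facts use a tuple encoding that is assumed here for illustration, with the name first and the arguments after it.

```python
def infer_inherited(facts):
    """Repeatedly apply: is_a(x, c) and p(c) => p(x), until nothing new appears."""
    facts = set(facts)
    while True:
        new = set()
        for fact in facts:
            if len(fact) == 3 and fact[0] == "is_a":
                _, sub, sup = fact
                # Every one-place predicate true of the superclass passes down.
                for other in facts:
                    if len(other) == 2 and other[1] == sup:
                        new.add((other[0], sub))
        if new <= facts:
            return facts
        facts |= new

known = {("is_a", "car", "vehicle"), ("can_move", "vehicle")}
derived = infer_inherited(known)
```

After the call, `("can_move", "car")` is in `derived`, and longer chains such as taxi is_a car is_a vehicle are handled by the repeated passes.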

    28/39

    News as relations between entities

    Alice attended the wedding

    attended(alice, wedding)

    IBM was founded in 1917.

    founded(IBM, 1917)

    Hurricane Sandy hit New York

    hit(hurricane_sandy, New_York)

    Encode facts as relation(subject, object)

    also written (subject relation object)

    29/39

    Things we could do with this

    Question answering

    The granddaughter of which actor starred in E.T.?

    (?x acted-in E.T.)(?y is-a actor)(?x granddaughter-of ?y)

    Inference

    (bob brother-of alice)

    (alice mother-of lucy) =>

    (bob uncle-of lucy)

    Answer questions using inference

    how many executives of publicly-traded Canadian companies died in car crashes?
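The question-answering query can be sketched as pattern matching over a list of triples: terms starting with `?` are variables, and an answer is any assignment of variables that satisfies every pattern at once. The function names and the tiny triple store are assumptions for illustration; a real triple store would use a query language such as SPARQL and indexes rather than a linear scan.

```python
def match(pattern, triple, bindings):
    """Try to unify one pattern with one triple; return extended bindings or None."""
    bindings = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was first bound to.
            if bindings.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return bindings

def query(patterns, triples, bindings=None):
    """Yield every variable assignment that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for triple in triples:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from query(patterns[1:], triples, extended)

# Illustrative triple store.
TRIPLES = [
    ("drew_barrymore", "acted-in", "E.T."),
    ("john_barrymore", "is-a", "actor"),
    ("drew_barrymore", "granddaughter-of", "john_barrymore"),
    ("henry_thomas", "acted-in", "E.T."),
]
QUESTION = [
    ("?x", "acted-in", "E.T."),
    ("?y", "is-a", "actor"),
    ("?x", "granddaughter-of", "?y"),
]
answers = [b["?y"] for b in query(QUESTION, TRIPLES)]
```

The shared variables do the work: `?x` ties the first and third patterns together, and `?y` ties the second and third, so only one assignment survives.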

    30/39

    Problems

    Not all subjects are simple.

    Over a hundred guests attended the wedding

    attended(num_guests, wedding)

    greater_than(num_guests, 100)

    Some relations have multiple parts.

    Hurricane Sandy hit New York on Monday

    hit(sandy, New_York, monday)
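One common workaround for relations with more than two parts, not described on the slide, is reification: create a node for the event itself and hang each part off it as an ordinary (subject relation object) triple. The role names and the `event1`-style identifiers below are assumptions chosen for this sketch.

```python
from itertools import count

_event_ids = count(1)

def reify(relation, **roles):
    """Turn an n-ary fact into binary triples hanging off a fresh event node."""
    event = f"event{next(_event_ids)}"
    triples = [(event, "type", relation)]
    triples += [(event, role, value) for role, value in roles.items()]
    return triples

# hit(sandy, New_York, monday) becomes four two-place statements.
triples = reify("hit", subject="sandy", object="New_York", time="monday")
```

The cost is extra triples and an invented identifier per event; the benefit is that further details, such as a wind speed, can be added later without changing the relation's arity.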

    !

    31/39

    Standard inference doesn't allow defaults

    All birds fly

    bird(tweety)

    bird(?x) => flies(?x)

    => flies(tweety)

    But, penguins and chickens don't fly

    bird(?x) & !penguin(?x) & !chicken(?x) => flies(?x)

    Now we can't guess that tweety flies

    bird(tweety)

    => flies(tweety) ?

    we don't know!
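The gap can be seen by comparing two readings of the rule in code. The first follows classical logic and needs explicit proof that tweety is not a penguin and not a chicken. The second treats "not known to be an exception" as good enough, a technique usually called negation as failure, which the slide does not name. The `not_` prefix for negative facts and the second bird, opus, are assumptions for illustration.

```python
FACTS = {("bird", "tweety"), ("bird", "opus"), ("penguin", "opus")}
EXCEPTIONS = ("penguin", "chicken")

def can_prove_flies(x, facts):
    """Classical rule: needs a stored negative fact for every exception class."""
    proven_not_exception = all(("not_" + e, x) in facts for e in EXCEPTIONS)
    return ("bird", x) in facts and proven_not_exception

def flies_by_default(x, facts):
    """Default rule: a bird flies unless it is known to be an exception."""
    known_exception = any((e, x) in facts for e in EXCEPTIONS)
    return ("bird", x) in facts and not known_exception
```

With these facts the classical rule cannot conclude that tweety flies, while the default rule can, and the default rule still blocks the conclusion for opus. The price is that a default conclusion may have to be withdrawn when a new fact arrives.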

    32/39

    Standard mathematical logic doesn't deal well with exceptions

    Some people don't have a last name.

    Sometimes an election isn't decided on election day.

    Is a trash can used as a flower pot still a trash can?

    Is a broken car still a vehicle if it can't move?

    33/39

    Relations from sentence parsing

    The water that made rivers of Avenues C and D receded on Tuesday, and the East Village was a mixture of disaster and nonchalance. A group of young men in pajama pants and shorts threw a football on East 12th Street, while workers pumped the basement of CHP Hardware on Avenue C and Eighth Street.

    subject verb object

    34/39

    Relation extraction systems

    Commercial: IBM's DeepQA (Watson)

    Academic: Reverb algorithm

    35/39

    Ontology explosions

    (water made rivers of Avenues C and D)

    (East Village was a mixture of disaster and nonchalance)

    (group of young men in pajama pants and shorts threw football)

    (workers pumped the basement of CHP Hardware)

    Do we have all of these in the ontology?

    36/39

    General Question Answering

    Precision / recall tradeoff. State of the art is IBM's DeepQA
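The tradeoff can be illustrated with a system that answers only when its confidence clears a threshold. Precision is the share of given answers that are correct; recall is taken here as the share of the questions the system could get right that it actually answers correctly. The confidence scores and outcomes below are invented for illustration and are not DeepQA results.

```python
# Hypothetical (confidence, is_correct) pairs for a QA system's top answers.
RESULTS = [(0.95, True), (0.90, True), (0.70, False), (0.60, True), (0.30, False)]

def precision_recall(results, threshold):
    """Answer only when confidence >= threshold, then score the answers given."""
    answered = [ok for conf, ok in results if conf >= threshold]
    correct = sum(answered)
    # By convention, precision is 1.0 when no answers are given at all.
    precision = correct / len(answered) if answered else 1.0
    recall = correct / sum(ok for _, ok in results)
    return precision, recall

cautious = precision_recall(RESULTS, 0.8)   # answers few questions
eager = precision_recall(RESULTS, 0.5)      # answers most questions
```

Raising the threshold from 0.5 to 0.8 lifts precision from 0.75 to 1.0 but drops recall from 1.0 to two thirds: the system is right more often because it stays silent more often.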

    37/39

    DeepQA use of structured data

    Watson can also use detected relations to query a triple store and directly generate candidate answers. Due to the breadth of relations in the Jeopardy domain and the variety of ways in which they are expressed, however, Watson's current ability to effectively use curated databases to simply look up the answers is limited to fewer than 2 percent of the clues.

    - Ferrucci et al., Building Watson

    38/39

    Lecture 7: Knowledge Representation

    Story Metadata

    Linked Open Data

    Knowledge as Relations

    Automatic story writing

    39/39

    Wall Street is high on Molson Coors Brewing (TAP), expecting it to report earnings that are up 17.5% from a year ago when it reports its third quarter earnings on Wednesday, November 7, 2012. The consensus estimate is $1.34 per share, up from earnings of $1.14 per share a year ago.

    The consensus estimate has dipped over the past month, from $1.35, but it's still up from the consensus estimate of $1.19 three months ago. For the fiscal year, analysts are expecting earnings of $3.89 per share. Revenue is projected to eclipse the year-earlier total of $954.4 million by 31%, finishing at $1.25 billion for the quarter. For the year, revenue is projected to roll in at $4.04 billion.

    The company's net income has declined in the last two quarters. The company posted profit falling by 52.8% in the second quarter. This is after it reported a profit decline in the first quarter by 4.1%.
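A story like this can be produced by filling sentence templates from a row of structured data. The sketch below is a guess at the general approach, not the actual system that wrote the text above; the field names and the wording of the template are assumptions. The figures come from the story itself.

```python
def pct_change(new, old):
    """Percentage change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)

def earnings_preview(d):
    """Fill a fixed sentence template from one row of structured data."""
    change = pct_change(d["eps_estimate"], d["eps_year_ago"])
    direction = "up" if change >= 0 else "down"
    return (
        f"Wall Street expects {d['company']} ({d['ticker']}) to report earnings "
        f"that are {direction} {abs(change)}% from a year ago. "
        f"The consensus estimate is ${d['eps_estimate']:.2f} per share, "
        f"compared with ${d['eps_year_ago']:.2f} per share a year ago."
    )

row = {"company": "Molson Coors Brewing", "ticker": "TAP",
       "eps_estimate": 1.34, "eps_year_ago": 1.14}
story = earnings_preview(row)
```

The same arithmetic reproduces the story's other figure: `pct_change(1250.0, 954.4)` gives the 31% revenue rise from $954.4 million to $1.25 billion.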