Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available. Title Proceedings of the 3rd European Data and Computational Journalism Conference Publication date 2019-07-01 Conference details The 3rd European Data and Computational Journalism Conference, Malaga, Spain, 1 - 2 July 2019 Publisher University College Dublin Link to online version https://www.datajconf.com/2019/ Item record/more information http://hdl.handle.net/10197/11426 Downloaded 2020-10-02T15:28:01Z The UCD community has made this article openly available. Please share how this access benefits you. Your story matters! (@ucd_oa) Some rights reserved. For more information, please see the item record link above.


Editors: Bahareh Heravi, Martin Chorley, Glyn Mottershead

Copyright: The Authors of the papers in the collection.

ISBN: 978-1910963388


Welcome to the 3rd European Data and Computational Journalism Conference!

The 3rd European Data and Computational Journalism Conference aims to bring together industry, practitioners and academics in the fields of journalism and news production and information, data, social and computer sciences, facilitating a multidisciplinary discussion on these topics in order to advance research and practice in the broad area of Data and Computational Journalism. Held in Malaga, Spain, the conference presented a mix of academic talks and keynotes from industry leaders. It was followed by a day of workshops and tutorials. Submissions of both academic research-focused and industry-focused talks on the subjects of journalism, data journalism, and information, data, social and computer sciences were invited. Topics of interest include, but are not limited to:

• Application of data and computational journalism within newsrooms
• Data driven investigations
• Data storytelling
• Open data for journalism, storytelling, transparency and accountability
• Algorithms, transparency and accountability
• Automated, robot and chatbot journalism
• Newsroom software and tools
• ‘Post-fact’ journalism and the impact of data
• User experience and interactivity
• Data and Computational Journalism education
• Post-desktop news provision / interaction
• Data mining news sources
• Visualisation and presentation
• News games and gamification of News
• Bias, ethics, transparency and truth in Data Journalism
• Newsroom challenges with respect to data journalism, best practices, success and failure stories

Collected within these proceedings are the academic abstracts presented at the conference. We would like to take this opportunity to thank the programme committee for their hard work reviewing submissions and helping us to come up with the fantastic line-up of talks for this year. And an enormous thank you to the organising committee at the University of Malaga for being such excellent hosts. Welcome to Malaga, and welcome to DataJConf 2019!

Bahareh R Heravi, Martin J Chorley & Glyn Mottershead
DataJConf 2019 co-chairs


Invited talk
Daniele Grasso, El Pais

Without the human element your data stories are just spreadsheets
Mohammed Haddad - Al Jazeera

How do you cover uncertain elections?
Josh Rayman & Alice Grenié - BBC World Service

Detecting newsworthy events in a journalistic platform
Tareq Al-Moslmi, Marc Gallofré Ocaña, Andreas L Opdahl and Bjørnar Tessem - University of Bergen

Fake News Detection Based on Named Entity Recognition and Machine Learning
Francisco Lopez Valverde, Rafaela Benitez Rochel and Maria Guerrero Aguilar - University of Malaga

RODA: a tool for semi-automatic data-driven visual stories
Xaquín Veira-González, Anton Bardera, Apple Chan-Fardel and María Luisa Otero López - University of Girona, University of Santiago de Compostela

Becoming a Data Journalist: the role of identity in data journalism education
Lizabeth Hannaford - Manchester Metropolitan University

Predictive sentiment analysis of messages for Journalistic Purposes: Real-time classification of tweets based on Machine Learning
Félix Ortega, Carlos Arcila and Antonio García - University of Salamanca, University Rey Juan Carlos

Building a Stats Bot
Sophie Warnes, Jure Stabuc and Henry Lau - Office for National Statistics

Style, Singularity, and Substance: What Picture Editors Want from A.I.
Martin Schön and Neil Thurman - LMU Munich

Can data journalism really stimulate local news? A case study with media in the countryside of Portugal
Ricardo Morais and Pedro Jerónimo - University of Beira Interior / Labcom.IFP

Invited Talk
Meredith Broussard, New York University

For day 2 panels and workshops please visit the conference website on datajconf.com.


Detecting Newsworthy Events in a Journalistic Platform

Tareq Al-Moslmi, Marc Gallofré, Andreas L Opdahl, Bjørnar Tessem
Dept. Information Science and Media Studies, Univ of Bergen, N-5020 Bergen, Norway

{Tareq.Al-Moslmi,Marc.Gallofre,Andreas.Opdahl,Bjornar.Tessem}@uib.no

Abstract: Social and other open and proprietary data sources are rapidly changing the nature of news and of journalistic work. In the News Angler project, we want to harness such big-data sources for journalistic purposes. We propose a platform, News Hunter, that is able to suggest appropriate news angles on unfolding events to journalists. Precisely assessing the newsworthiness of these events is important to avoid alert fatigue. News angles are seen as patterns that can be matched by news events represented in the knowledge graph. Work on the platform so far suggests that newsworthiness can be estimated as an interplay of at least three factors: reliability – that the event is corroborated by multiple independent and/or trusted sources; match – that the event fits a news angle that is aligned with the intended audience and newsroom profile; and novelty – that the event has not been reported widely from this angle already.

Keywords: Journalistic platforms, newsroom systems, knowledge graphs, big data, news angles, news values, newsworthiness.

Introduction

The News Angler project aims to harness social and other open and proprietary data sources for journalistic purposes. Specifically, we want to leverage the concept of news angles to help journalists effectively identify news events and narrate news stories that may interest their audience. Examples of angles are conflict, local person, and fall from grace. Some angles are more detailed versions of others, such as David-versus-Goliath, a subtype of the conflict angle.

In collaboration with a developer of newsroom systems for the international market, we are developing a platform, News Hunter, that is able to harvest potentially news-relevant text items from the web, analyse them semantically, ingest them into a knowledge graph, aggregate items in the graph into potentially newsworthy events, and suggest suitable news angles on unfolding events to journalists (Berven et al. 2018, Gallofré et al. 2018). News angles are patterns that can match and make interesting events in this knowledge graph. With ever-increasing information, precisely assessing newsworthiness of unfolding events is essential to prevent journalistic alert fatigue.

There is already a broad variety of news-relevant information platforms available (Diakopoulos 2016). They range from general news services such as Google and Yahoo News, through general information platforms such as EMM, OCCRP and WebLyzard, to news-specific ones such as Bloomberg’s knowledge graph (Voskarides 2018), Event Registry (Leban 2014) and Reuters Tracer (Liu 2017). Many of them already use knowledge graphs and related semantic technologies, but we are not aware of existing approaches that aim to support news angles and use them to assess newsworthiness.

Our work on the News Hunter platform (Berven et al. 2018, Gallofré et al. 2018) suggests that newsworthiness can be estimated as an interplay of at least three factors: reliability – that the event is corroborated by multiple independent and/or trusted sources; match – that the event fits a news angle that is aligned with the intended audience and newsroom profile; and novelty – that the event has not been reported widely from this angle already.


Methods

We aim to understand news platforms, news angles, and newsworthiness through design research (Hevner 2007), developing a series of prototypes based on state-of-the-art big data and knowledge graph technologies. Practical relevance is ensured by our industrial partner, who shares their understanding of industrial and journalistic needs and wishes. Theoretical relevance is ensured by focussing on open research issues such as how news platforms can support news angles and how knowledge-graph architectures can scale to big-data settings.

Findings and Argument

Our work on the News Hunter platform so far suggests that newsworthiness can be estimated as an interplay of at least three factors, which we now discuss in more depth to establish requirements for a news platform that supports angles.

Figure 1 - Overview of the News Hunter platform (from Berven et al. 2018).

Reliability: Most importantly, in order to be newsworthy, an event must be reliable. If the source is highly trusted, the event may be newsworthy even if it is reported only by a single item. But in most cases, the event must be corroborated by items originating from multiple sources. Reliability of the event is influenced both by the reliability of (or trust in) those sources and by the independence of the items. For example, two tweets may be based on the same underlying source, or one may simply be a retweet of the other. Finally, event reliability is also influenced by how reliable the lifting of textual items into semantic item graphs and the aggregation of those item graphs into event graphs were. To support corroboration, the news platform must therefore be able to trace events back to their originating items and those items’ sources. Trust in sources must be estimated and maintained, as well as trust in the lifting and aggregation steps on the way from textual items to event graphs. To the extent possible, the external sources of items should also be identified; at the least, items that are based directly on one another or on a common precursor need to be identified.

Match: In order to be newsworthy, the event must match a news angle, which is a pattern for moulding an event, if possible, into a fabula, which is a sub-graph of facts about the event that can be narrated to become a story. It is important that angles do not only fit the event, but also the news organisation’s intended audience and profile. To support angle matching, the news platform must therefore maintain a library of angles, whether created manually or learned automatically. It must be aware of which events and angles fit the audience and newsroom profile, and it must be able to match angles with events to mould fabulas. Also, to match a news angle, the event graph must be sufficiently detailed. This is another reason for aggregating news items into event graphs, which will presumably not only be more reliable, but also more detailed and complete than the individual item graphs. The news platform should invite the journalist to collect further facts when needed to complete a promising angle.


Novelty: Finally, in order to be newsworthy, an angled event should be original. Other news media that target the same audience should not already have covered the event from the same angle. To support novelty, the news platform must therefore harvest news items published by competing media organisations that target a similar audience in real time. It must be able to trace from a news item to the event it describes, and it must be able to detect the angle from which an event is narrated.
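As an illustration only, the interplay of reliability, match and novelty can be sketched in a few lines of Python. The sketch is not taken from the News Hunter platform: the `Item` fields, the numeric trust values, the noisy-OR combination of independent origins and the multiplication of the three factors are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A harvested text item (hypothetical structure, not the platform's)."""
    source: str   # who published the item
    trust: float  # assumed trust in that source, between 0 and 1
    origin: str   # underlying origin; a retweet shares its original's origin


def reliability(items):
    """Corroboration score: each independent origin counts only once."""
    best_per_origin = {}
    for item in items:
        current = best_per_origin.get(item.origin, 0.0)
        best_per_origin[item.origin] = max(current, item.trust)
    # Noisy-OR: the event is doubted only if every independent origin is wrong.
    doubt = 1.0
    for trust in best_per_origin.values():
        doubt *= 1.0 - trust
    return 1.0 - doubt


def newsworthiness(items, angle_match, competitor_coverage):
    """Combine the three factors; all inputs are scores between 0 and 1."""
    novelty = 1.0 - competitor_coverage
    return reliability(items) * angle_match * novelty
```

Under these assumptions a tweet and its retweet corroborate an event no better than the tweet alone, whereas two items with independent origins raise the reliability score, which mirrors the independence requirement described above.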

Conclusions

We have presented the objectives of the News Angler project and how we plan to identify newsworthy events by assessing their reliability, match, and originality. In future work we will validate our approach by continuing to extend the News Hunter platform to support newsworthiness and news angles.

References

Berven, Arne, et al. (2018) "News Hunter: Building and mining knowledge graphs for newsroom systems." Proc. NOKOBIT 26, Svalbard.
Diakopoulos, Nicholas (2016) "Computational journalism and the emergence of news platforms." Routledge Comp. Dig. Journalism Studies.
Gallofré Ocaña, Marc, et al. (2018) "Towards a Big Data Platform for News Angles." Proc. NOBIDS’18, Trondheim.
Hevner, Alan R. (2007) "A three cycle view of design science research." SJIS 19.2: 4.
Leban, Gregor, et al. (2014) "Event registry: learning about world events from news." Proc. 23rd Int. Conf. WWW.
Liu, Xiaomo, et al. (2017) "Reuters tracer: Toward automated news production using large scale social media data." IEEE Int. Conf. Big Data.
Voskarides, Nikos, et al. (2018) "Weakly-supervised Contextualization of Knowledge Graph Facts." Proc. 41st ACM Int. SIGIR Conf.


Fake News Detection Based on Named Entity Recognition and Machine Learning

Francisco L. Valverde
Dept. of Computer Science, University of Malaga
valverde@uma.es

Rafaela Benítez Rochel
Dept. of Computer Science, University of Malaga
rbenitezr@uma.es

María Guerrero Aguilar
Journalist of Press Office, University of Malaga
mariaguerrero@uma.es

Abstract: False news has become a problem of the first magnitude for the governments of nations and the media. Given the large volume of information to analyze, it seems that the solution should be an automatic method that manages to detect false news. However, today we still do not have the technology for an automatic and efficient solution. For this reason, the only solutions that are working are based on manual operation. Our proposal consists of a decision support system that facilitates the work of fact-checking organizations. Using a machine learning system based on Named Entity Recognition and the identities of the authors, it is possible to make a prior classification of authenticity. Thanks to this scheme, the amount of information that must be analyzed manually is greatly reduced.

Keywords: Fake News Detection, Named Entity Recognition, Machine Learning, Support Vector Machines, Identity authenticity.

Introduction

In this paper we present a proposal for a semiautomatic scheme for the detection of false news. Over recent years, the extensive growth in the number and types of fake news has led to the necessity of building an effective detection system for fake news identification with the capability of handling the volume, the variety and the velocity associated with it. Augey and Alcaráz (2019) conclude in a recent investigation that most false news is motivated by financial objectives. We are, therefore, faced with a new challenge that, as McNair (2017) points out, does not respond to an isolated cultural problem, but is the result of the social trends of the 21st century. Globalization, the rise of relativism, the crisis of objectivity, the consumption of digital news and the decline of trust in journalism are some of the factors that this author identifies as drivers of the rise of false news. In relation to the online fake news audience, recent work indicates that it is a subset of the total, disloyal and high-availability news audience on the Internet (Nelson & Taneja, 2018).

The World Economic Forum (WEF) has been warning for years about the global danger of massive digital disinformation, as a technological and geopolitical risk. Likewise, Eurobarometer 464 on 'False news and disinformation online', conducted in 2018, detected a high degree of belief in Spain of being exposed to fake news. Its direct consequences in politics are also the subject of many other studies. In this line, an article published by the Pew Research Center (2016) reported that most Americans (64 percent) suspected that false news generated confusion and had a potential impact on both political life and individual citizens. Vargo, Guo and Amazeen (2017) warn of the relationship of this news with the digital partisan media, which they identify as highly sensitive; however, they downplay its impact on the new media, although these also respond to the false news agendas.


In this work we propose a decision support system that facilitates the work of agencies that specialize in the detection of false news. Using a machine learning model based on Named Entity Recognition (NER) and identity authenticity, it is possible to make a massive preliminary classification with an accuracy of more than 80 percent.

Methods

Currently, there are fact-checking organizations such as Snopes, Politifact, TruthOrFiction, Factcheck, OpenSources, FakeNewsWatch, fakespot, reviewmeta, Opensecrets.org, etc., which operate on the basis of the traditional journalistic model. In these organizations the reporters have to evaluate facts in order to establish the veracity of a statement. This approach is not automated, is often time-consuming, and struggles to keep up with the quantity of fake news published daily. This problem has led researchers and technical developers to look at several automated ways of assessing the truth value of potentially deceptive text based on the properties of the content and the patterns of computer-mediated communication.

Machine learning: Supervised machine learning algorithms such as Decision Tree, Random Forest, Support Vector Machine (SVM), Logistic Regression and K-nearest Neighbour are extensively used in previous literature for the classification of online hoaxes, frauds, and deceptive information (Afroz et al., 2012); deep learning based methods are good solutions for online fake news representation and detection, and have been introduced in Ruchansky et al. (2017). Unsupervised learning models for fake news detection include cluster analysis, outlier analysis and semantic similarity analysis (Li, McLean, Bandar, O’shea, & Crockett, 2006), and unsupervised news embedding techniques include Word2vec, FastText (Bojanowski, Grave, Joulin, & Mikolov, 2017), Sent2vec (Pagliardini, Gupta, & Jaggi, 2017), and Doc2vec (Le & Mikolov, 2014).

Our method is based on Named Entity Recognition applied to the news and on the identity of the authors. Using only this information in a Support Vector Machine, it is possible to classify a story as probably authentic or probably false. Our focus is on the simplicity of the analysis, which makes it more appropriate when analyzing large amounts of news. Only those stories that have been classified as false are analyzed manually by the organizations to confirm this classification.

Findings and Argument

On a simulated dataset of 100 false and 100 truthful news stories, the SVM system was able to correctly detect and classify 81.5% of the news. As Figure 1 shows, the classification is quite accurate.

Figure 1 – News classification by SVM based on NER. 1 indicates true and -1 false
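To make the classification step concrete, the following is a minimal sketch of a linear SVM trained by subgradient descent on the hinge loss, written in plain Python. It is not the authors' implementation, and the abstract does not state which features were used: the two features (`entity_score`, the share of recognised named entities that could be verified, and `author_score`, whether the author's identity could be verified), their scaling to [-1, 1], the omission of an intercept and the toy data are all assumptions. Labels follow Figure 1, with 1 for true and -1 for false. For reference, 81.5% of the 200 simulated stories corresponds to 163 correct classifications.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Hinge-loss subgradient descent for a linear SVM without intercept."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # L2 regularisation step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step: only points inside the margin move the weights.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Return 1 (probably authentic) or -1 (probably false)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def accuracy(w, X, y):
    """Fraction of stories whose predicted label equals the given label."""
    hits = sum(predict(w, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)


# Hypothetical toy data: [entity_score, author_score], both scaled to [-1, 1].
X = [[0.8, 1.0], [0.6, 1.0], [0.4, 1.0],
     [-0.7, -1.0], [-0.5, -1.0], [-0.2, -1.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

In the workflow described above, only the stories for which `predict` returns -1 would be passed on to human fact-checkers for confirmation.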

Conclusions

A new hybrid method to detect false news has been presented in this article. The main contribution is its simplicity, which makes it suitable for analyzing large amounts of data. It is a decision support system for companies specialized in the detection of false news, since it facilitates their work and greatly reduces time and costs.


References

Alcaraz, M. & Augey, D. (2019). Will Fake News Kill Information? In book: Digital Information Ecosystems. Journal of Communication, pp. 139-159. DOI: 10.1002/9781119579717.ch7
Afroz, S., Brennan, M., & Greenstadt, R. (2012). Detecting hoaxes, frauds, and deception in writing style online. Security and Privacy (SP), 2012 IEEE Symposium on. IEEE, 461–475.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
Barthel, M., Mitchell, A. & Holcomb, J. (2016). Many Americans believe fake news is sowing confusion. Pew Research Center. Retrieved from: https://www.journalism.org/2016/12/15/many-americans-believe-fake-news-is-sowing-confusion/
European Commission (2018). Eurobarometer 464 on "Fake News and Disinformation online". Retrieved from: http://ec.europa.eu/commfrontoffice/publicopinion/index.cfm/survey/getsurveydetail/instruments/flash/surveyky/2183
Gualda, E. (2019). Teorías de la conspiración, confianza y credibilidad en la información. Communication & Society, vol. 32(1). Retrieved from: https://www.unav.es/fcom/communication-society/es/articulo.php?art_id=728
Hernández, A. (2017). Resiliencia de la organización de la información en la era de las posverdad. Alcance, 6(14), 47-59.
High-Level Expert Group on Fake news and Disinformation (2018). A multi-dimensional approach to disinformation. Report of the independent High level Group on fake news and online disinformation. March. European Commission, Directorate-General for Communication Networks, Content and Technology, Luxemburg. Retrieved from: https://publications.europa.eu/en/publication-detail/-/publication/6ef4df8b-4cea-11e8-be1d-01aa75ed71a1/language-en
Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. International Conference on Machine Learning, 1188–1196.
Li, Y., McLean, D., Bandar, Z. A., O’shea, J. D., & Crockett, K. (2006). Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1138–1150.
McNair, B. (2017). Fake News. Falsehood, Fabrication and Fantasy in Journalism. Routledge Focus. 1st edition.
Pagliardini, M., Gupta, P., & Jaggi, M. (2017). Unsupervised learning of sentence embeddings using compositional n-gram features. arXiv:1703.02507
Ruchansky, N., Seo, S., & Liu, Y. (2017). Csi: A hybrid deep model for fake news detection. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 797–806.
Vargo, J., Guo, L. & Amazeen, A. (2017). The agenda-setting power of fake news: A big data analysis of the online landscape from 2014 to 2016. New Media & Society, vol. 20, 5: pp. 2028-2049.
Weir, W. (2009). History´s Greatest Lies. Beverly: Fair Winds Press.


RODA: a tool for semi-automatic data-driven visual stories

Xaquín Veira-González
Universitat de Girona

Anton Bardera
Universitat de Girona

Apple Chan-Fardel

María Luisa Otero López

Abstract: RODA, our robot data assistant, is an interactive tool for data inquiry and automation of data-driven visual narratives. So far, most of the research on data-driven automation in journalism has drilled down on either text narratives or isolated data visualizations. Our research for RODA is the first time that the aim is to produce narratives incorporating semantically interwoven text and visualizations. By using the current advances in natural-language generation and research on structures of this type of storytelling device, we expand the theoretical framework of data-driven visual storytelling.

Keywords: data-driven storytelling, information visualization, natural-language generation, narrative visualization, narrative automation, data-driven visual stories, robot journalism, data journalism, visual storytelling

Introduction

Leading newsrooms are devoting more resources to make visual-driven formats integral to their vocabulary. The New York Times’ 2020 report “Journalism That Stands Apart” charted the growth of “stories with deliberately placed visual elements” from close to nothing in 2014 to 12.1% by September 2016. “Deliberately placed” is the crucial nuance that hints at a holistic way of editing, as Cairo (2017) describes it, that is currently being used in many leading newsrooms: text, visualizations, pictures, and videos are just different media types to tell a portion of a story, which flows in and out of them seamlessly. Text and graphics blend in the narrative; they use short sentences anchored in summary statistics that refer to what the graphics show.

Many tools automate the production of charts from data, and the current advances in natural-language generation (NLG) are starting to provide tools to automate the writing of stories from data. We present a robot data assistant (RODA) with the ambition to automate the production of complete narratives that coherently mix text and visualizations based on the input data. The user would enter a dataset into the system and, following a conversation with the application, the robot would try to understand the data, question the user about necessary checks, summarize possible patterns or trends, recommend visualization types, and use NLG to assemble a narrative with text and visuals that would fit the priorities of the communicator.

Methods

The focus of information visualization (InfoVis) had been, until recently, on interactive visual representations of data as isolated interfaces for the data. Segel and Heer (2010) shifted the focus and introduced the concept of narrative visualizations, as a combination of visuals, multimedia and textual elements integrated within data-driven storytelling systems. In Riche et al. (2018), practitioners and researchers explore storytelling techniques, the life cycle of the story, and narrative patterns, defining future lines of research and exploration for data-driven visual storytelling. Following the patterns in Veira-González and Pérez-Montoro (2018), we explore the underlying structures of these data-driven visual stories and their atomic components.


Findings and Argument

RODA is designed as a blend of chatbot and on-screen interaction with the user, from which it calculates the summary statistics, recommends charts, understands what the data means, gathers the user’s priorities for the story, and outputs a story structure composed of narrative blocks with semantically interconnected text and visuals.

Figure 1 - Streamlined flowchart of how RODA’s inputs and feedback to the user work

Each story atom is composed of a data description text, a visualization, and an explanation and transitional text, all tied to the current view of the data. They resemble Kosara’s (2017) Claim, Fact, and Conclusion (CFO) pattern based on Cohn’s (2013) narrative structure for comics.

RODA’s recommendations rest on research that two of the authors did while designing The Guardian’s in-house charting tool. While the implementation starts with a dataset and then suggests a visual display, our approach walked backward from a dozen visualizations: the system parses the data types of the input and the ranges of the numeric properties, and filters the available visualization methods based on a set of constraints. Around each instance of the visualization, the text refers to the critical features visualized and, if it is known, provides the context and the reasons why it is salient. Text blocks in this type of story, especially the explanatory copy, serve a similar function to the annotation layer within data visualizations.

In order to automatically detect facts from the data for content planning of the transitional text, several statistics have to be computed. In the first iteration of the prototype, we focus on basic statistical measures, such as mean, median, percentiles and quantiles, and standard deviation. The tool uses rule-based generation to write the text, a more sophisticated method than fill-in-the-blanks-with-data templates.
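The three steps described above (parsing data types, filtering chart types by constraints, and generating text from computed statistics by rule) can be illustrated with a small Python sketch. It is not RODA's code: the function names, the three chart constraints, the 10% threshold and the wording of the generated sentences are hypothetical, and only the mean and the median are covered.

```python
import statistics
from datetime import date


def column_type(values):
    """Classify a column of strings as 'number', 'date' or 'category'."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "number"
    if all_parse(date.fromisoformat):
        return "date"
    return "category"


# Hypothetical constraints: the column types each chart needs, sorted.
CHART_CONSTRAINTS = {
    "line chart": ["date", "number"],
    "bar chart": ["category", "number"],
    "scatter plot": ["number", "number"],
}


def candidate_charts(table):
    """Keep only the chart types whose constraints match the table's columns."""
    types = sorted(column_type(column) for column in table.values())
    return [name for name, needed in CHART_CONSTRAINTS.items() if needed == types]


def describe(label, values):
    """Rule-based sentence: the wording depends on what the statistics show."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sentence = f"The median {label} is {median:g} and the mean is {mean:g}."
    if mean > 1.1 * median:
        sentence += " A few high values pull the mean above the median."
    elif mean < 0.9 * median:
        sentence += " A few low values pull the mean below the median."
    return sentence
```

The difference from a fill-in-the-blanks template is visible in `describe`: the second sentence appears only when a rule fires on the computed statistics, so the text changes with the shape of the data rather than only with its values.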


The user’s answers determine the content for explanatory texts in the final part of the application flow. Once the tool has computed a set of facts and has understood how to refer to the data, it can ask the user to add context or reasons behind some of those calculated facts, such as “do the outliers have anything in common?” or “how about items near the average or the median?” The application would then summarize those answers.

In order to structure those narrative blocks and compose the story, the approach that best fits our purposes is Kosara’s (2017) Claim, Fact, and Conclusion (CFO) pattern, based on Cohn’s (2013) narrative structure for comics. We also use Veira-González and Pérez-Montoro's (2018) story patterns to determine the order and scope (overview or details) of the narrative blocks.

Conclusions

What RODA can do in its current iteration is inherent to its purpose. It isn’t just a tool for automating data-driven stories, but almost more importantly a tool for training journalists: an aid to contribute to data and visual literacy in newsrooms and other communication environments. Some of the limitations also come from the relatively small body of research on data-driven visual stories, still in its infancy compared to other subfields of InfoVis, and from the fact that this paper is a first-ever approach to automating this type of narrative. Future research will gain as well from surveying the effectiveness of these machine-written visual stories compared both to human-written text pieces and to individual charts by themselves.

References

Cairo, A., 2017. Nerd journalism: How data and digital technology transformed news graphics (Doctoral dissertation, Universitat Oberta de Catalunya).
Cohn, N., 2013. Visual narrative structure. Cognitive Science, 37(3), pp. 413-452.
Kosara, R., 2017, June. An argument structure for data stories. In Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: Short Papers (pp. 31-35). Eurographics Association.
Pérez-Montoro, M. and Veira-González, X., 2018. Information Visualization in Digital News Media. In Interaction in Digital News Media (pp. 33-53). Palgrave Macmillan, Cham.
Riche, N.H., Hurter, C., Diakopoulos, N. and Carpendale, S. eds., 2018. Data-driven storytelling. CRC Press.


Becoming a data journalist: the role of identity in journalism education

Liz Hannaford
Manchester Metropolitan University
L.Hannaford@mmu.ac.uk

Abstract: As data journalism becomes mainstream, journalism educators need to find ways to bring it into their teaching. However, the literature shows that this can be problematic for staff and students in an already tightly-packed curriculum. The objective of this study is to explore ways in which learning to do data journalism can be reconceptualised as a social process of becoming a data journalist whereby students are invited to take on the beliefs and values of the new professional identities made desirable in a datified society. I address this problem by using a discourse analysis approach to explore how vocal advocates in this field use different discursive strategies to justify their practices. Preliminary findings suggest speakers redraw the boundaries of journalism as they negotiate and lay claim to competing identities with implications for journalism educators.

Keywords: Data journalism education, identity, Community of Practice, discourse analysis

Introduction

This paper sets out the background, rationale, methods and some initial findings of my doctoral research, currently ongoing, which is a study of the discourses of data journalism, the construction of data journalist identities and the implications for journalism education. Encountering spreadsheets, statistics and code can be a jarring experience for undergraduates who did not expect their journalism course to be ‘technical.’ Educators need to address what professional and social identities we are inviting students to invest in when they study journalism and how these identities have been recalibrated by journalism’s ‘quantitative turn.’ To achieve these aims, my research is analysing the taken-for-granted norms that have become embedded in the discourse of data journalism, and asking whether and how this discourse can be exclusionary.

There is now a small but growing number of specialist data journalism courses at Masters level in the UK (Bradshaw, 2018), but attempts to introduce data journalism skills into traditional journalism programmes have encountered obstacles. These include journalism students’ aversion to maths, a fear that students are put off by the subject and the lack of qualified staff to teach data journalism (Hewett, 2015). The literature on data journalism education has predominantly focused on the extent to which it is taught around the world and the challenges it presents (Splendore et al., 2016 in six European countries; Berret and Phillips, 2016 in the United States; Yang and Du, 2016 in Hong Kong; Davies and Cullen, 2016 in Australia; Heravi, 2019 globally).

Elsewhere, research has explored the importance in this field of peer-to-peer learning through informal networks such as the NICAR listserv (Howard, 2014; Fink and Anderson, 2015; Hermida and Young, 2017), Hacks Hackers meet-ups around the world (Lewis and Usher, 2014) and dedicated social media groups (Appelgren, 2016), all of which suggest the importance of social identity and community participation in this field.


The implications of these learning practices for formal journalism education have yet to be fully explored in the literature. Given the need for journalism education to embrace basic data skills as a requirement of the profession (Stalph and Borges-Rey, 2018), it is argued here that socio-cultural perspectives could provide valuable insights into learning as an ongoing social process. Research from other professions provides support for this approach (Monrouxe (2010) in medical students and Beauchamp and Thomas (2009) in student teachers, for example). This paper presents initial findings from analysis of the discourses of data journalism as deployed by its influential early pioneers during the social interaction of interviews and panel discussions. The analysis suggests that the dominant discourses of data journalism often rely on negative representations of traditional journalism as ‘broken’ whilst representing technologists as heroic saviours. These discursive strategies talk data journalism ‘out’ of the journalism curriculum and can antagonise students’ social identity. I argue that it would be more beneficial to find ways of talking it ‘in’ to the curriculum to help students manage the transition to the new professional identities required in a datified society.

Methods

The methodology is driven by the following research questions. How do vocal advocates of data journalism talk about this field? What identities (subject positionings) and practices do these ways of talking make possible? What are the implications for journalism educators? The research is currently ongoing. The preliminary analysis presented in this paper is based on using a discourse analysis approach (Jäger and Maier, 2016), which builds on recent interest in investigating data journalism as a socio-discursive construct produced through social interaction (Powers, 2012; De Maeyer et al., 2015; Borges-Rey, 2017) as opposed to an inevitable reality existing ‘out there’. The data to which this analysis has been applied consists of transcripts of interviews and panel discussions involving prominent practitioners and pioneers in North American and European data journalism from 2008 to 2018. To produce a manageable dataset for a discourse analysis approach, a purposive selection of interactional texts was made. The advantage of analysing social interaction texts as opposed to written texts is that they are a rich source of narratives about how the speakers see themselves and what they do.

Findings and Argument

The research is ongoing, and as such the findings presented are preliminary. A number of different discursive strategies were used by vocal advocates of data journalism to legitimise their practice. There is evidence of the use of negative representations of traditional journalism as ‘broken’, the idealised pursuit of journalistic ‘truth’, the unquestioned privileging of numerical, structured data over other forms of knowing, a radical vision of the future of journalism but also positive representations of technology as emotionally fulfilling and a discourse of optimism about journalism’s future. These discourses involve the speakers negotiating and laying claim to different and competing identities as they redraw the boundaries of journalism and explore new ways of becoming a journalist. These negotiations take place against the background of an existential crisis in journalism. Neo-liberal discourses thus blend with these new identities as journalism becomes a digitised commodity in a global market. Practices required by these ways of talking about data journalism include the fetishisation of openness, collaboration, disruption and innovation. Continual learning – often from peers – is highly valued and aligned with social participation in communities of likeminded practitioners that transcend organisational boundaries.

Conclusions

The preliminary conclusions of the research suggest that journalism’s quantitative turn requires more from educators than just squeezing new skills into an already tightly-packed curriculum. I argue that knowledge and identity are intertwined (Lave and Wenger, 1991) and so educators need to consider who students need to be as much as what they need to know. The experience of becoming a journalist repeatedly raises issues of identity, values and beliefs that need to be addressed in the classroom. Students have to be able to make sense of themselves in this new journalism-technology environment and, if they choose, inhabit the professional identities that result.

References

Appelgren, E. (2016) 'Data Journalists Using Facebook.' Nordicom Review, 37(1) pp. 156-169.
Beauchamp, C. and Thomas, L. (2009) 'Understanding teacher identity: an overview of issues in the literature and implications for teacher education.' Cambridge Journal of Education, 39(2) pp. 175-189.
Berret, C. and Phillips, C. (2016) Teaching Data and Computational Journalism. New York, NY: Columbia Journalism School.
Borges-Rey, E. (2017) 'Towards an epistemology of data journalism in the devolved nations of the United Kingdom: Changes and continuities in materiality, performativity and reflexivity.' Journalism (published online ahead of print 1st February) Available at DOI: 10.1177/1464884917693864
Bradshaw, P. (2018) 'Data Journalism Teaching, Fast and Slow.' Asia Pacific Media Educator, 28(1) pp. 55-66.
Davies, K. and Cullen, T. (2016) 'Data Journalism Classes in Australian Universities: Educators Describe Progress to Date.' Asia Pacific Media Educator, 26(2) pp. 132-147.
De Maeyer, J., Libert, M., Domingo, D., Heinderyckx, F. and Le Cam, F. (2015) 'Waiting for Data Journalism: A qualitative assessment of the anecdotal take-up of data journalism in French-speaking Belgium.' Digital Journalism, 3(3) pp. 432-446.
Fink, K. and Anderson, C. (2015) 'Data Journalism in the United States: Beyond the “usual suspects”.' Journalism Studies, 16(4) pp. 467-481.
Heravi, B. R. (2019) '3Ws of Data Journalism Education: What, where and who?' Journalism Practice, 13(3) pp. 349-366.
Hermida, A. and Young, M. L. (2017) 'Finding the Data Unicorn: A hierarchy of hybridity in data and computational journalism.' Digital Journalism, 5(2) pp. 159-176.
Hewett, J. (2015) 'Learning to teach data journalism: Innovation, influence and constraints.' Journalism, 17(1) pp. 119-137.
Howard, A. B. (2014) The Art and Science of Data-driven Journalism. New York: Tow Center for Digital Journalism. [Online] [Accessed on 21st March 2019] Available at https://doi.org/10.7916/D8Q531V1
Jäger, S. and Maier, F. (2016) 'Analysing discourses and dispositives: a Foucauldian approach to theory and methodology.' In Wodak, R. and Meyer, M. (eds) Methods of critical discourse studies, 3rd ed. Los Angeles: Sage, pp. 109-136.
Lave, J. and Wenger, E. (1991) Situated learning: Legitimate peripheral participation. Cambridge University Press.
Lewis, S. C. and Usher, N. (2014) 'Code, Collaboration, And The Future Of Journalism: A case study of the Hacks/Hackers global network.' Digital Journalism, 2(3) pp. 383-393.
Monrouxe, L. V. (2010) 'Identity, identification and medical education: why should we care?' Medical education, 44(1) pp. 40-49.
Powers, M. (2012) '“In Forms That Are Familiar and Yet-to-Be Invented” American Journalism and the Discourse of Technologically Specific Work.' Journal of Communication Inquiry, 36(1) pp. 24-43.
Splendore, S., Di Salvo, P., Eberwein, T., Groenhart, H., Kus, M. and Porlezza, C. (2016) 'Educational strategies in data journalism: A comparative study of six European countries.' Journalism, 17(1) pp. 138-152.
Stalph, F. and Borges-Rey, E. (2018) 'Data Journalism Sustainability: An outlook on the future of data-driven reporting.' Digital Journalism, 6(8) pp. 1078-1089.
Yang, F. and Du, Y. R. (2016) 'Storytelling in the Age of Big Data: Hong Kong Students’ Readiness and Attitude towards Data Journalism.' Asia Pacific Media Educator, 26(2) pp. 148-162.


Predictive sentiment analysis of messages for Journalistic Purposes: Real-time classification of tweets based on Machine Learning

Prof. Dr. Félix Ortega   Dr. Carlos Arcila   Prof. Antonio García

[email protected]

[email protected]

[email protected]

Abstract: Algorithms, big data, machine learning and artificial intelligence systems are key concepts and methods for the reshaping of our sociocultural, economic and political relations in our everyday life. Digital culture and communication are inevitably changing as media infrastructures, media practices and social environments become increasingly data-conscious and data-driven. Efforts to bring together automated sentiment analysis based on machine learning and streaming technologies that produce significant amounts of data are relatively new to journalistically oriented media enterprises. This paper describes and assesses the creation of machine learning models to predict sentiments in real-time tweets associated with the Journalistic Value Chain (JVC), itself undergoing a revolution, and depicts how this process can be scaled using commercial distributed computing when personal computers cannot support the required computation and storage, in order to provide the Data Journalistic Unit (DJU) with tools for better job and information performance.

Keywords: Predictive sentiment analysis; Political Opinion; Twitter; Machine Learning; Big Data; Political tweets; Big data digital journalism.

Introduction

Algorithms, big data, machine learning and artificial intelligence systems are key concepts and methods for the reshaping of our sociocultural, economic and political relations in our everyday life. Digital culture and communication are inevitably changing as media infrastructures, media practices and social environments become increasingly data-conscious and data-driven. The consumer's use of commonplace media technologies in a world of information and news is mediated by data, which interpellates and adapts to consumer preferences in an ever more automated way. We live in a world which is increasingly influenced by algorithms and artificial intelligence methods and processes. This trend is now spreading rapidly to the analysis and comprehension of the flow of information, opinions and news in the digital media. Algorithms, machine learning and artificial intelligence are being implemented in the filtering of a large percentage of the content published on social media platforms and their apps, picking out what is potentially newsworthy (Thurman et al., 2016, 2017) for consumers given their preferences, and transforming news management and agenda setting in media enterprises into a more Data Broker Management Journalism (DBMJ), where new journalistic competences converge with online and almost real-time analysis of the “trends”, “visits”, “ratios”, … and broker-marketing-oriented visualizations.

In this context, there is a growing interest in surveying opinions using large-scale data produced by social media (Cobb, 2015; O'Connor, 2010; Bollen, Mao & Pepe, 2011) in media enterprises and in particular the traditional journalistically oriented business. The vast majority of this research is based either on manual classification or on automated content analysis using dictionaries that score words (e.g. giving an a priori negative or positive value to each word) (Leetaru, 2012; Feldman, 2013), and other approaches such as supervised machine learning (Vinodhini & Chandrasekaran, 2012) are scarce in communication research (van Zoonen & Toni, 2016). Moreover, efforts to bring together automated sentiment analysis based on machine learning and streaming technologies that produce significant amounts of data are relatively new to journalistically oriented media enterprises. This paper describes and assesses the creation of machine learning models to predict sentiments in real-time tweets associated with the Journalistic Value Chain (JVC), itself undergoing a revolution, and depicts how this process can be scaled using commercial distributed computing when personal computers cannot support the required computation and storage, in order to provide the Data Journalistic Unit (DJU) with tools for better job and information performance.

Local computing solutions for journalistic purposes may impose serious limitations on the analysis of large amounts of data, which necessarily requires scalable storage and distributed computing. Running streaming data analysis on distributed platforms has been challenging in the complex and changing big data landscape (Turck & Hao, 2016). The incorporation of tools such as Apache Kafka has allowed Apache Spark, currently the most widely used open-source software for distributed computing, to fill this gap with Spark Streaming (Spark Kafka Integration, 2016), which can run code written in Scala or in Python (with the PySpark module). In our research we analyze how journalists can extend sentiment analysis with Apache Spark Streaming on local machines using models trained with Spark Machine Learning. We also explain how this procedure can be scaled using commercial tools (instead of academic grids) such as the most popular Infrastructure as a Service (IaaS) offering, Amazon Web Services (AWS), which offers Amazon S3 for massive storage and Amazon Elastic Compute Cloud (EC2) to create a flexible set of connected instances in the cloud in order to compute the analysis.
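To make the contrast between approaches concrete, the dictionary-based scoring mentioned in the introduction (an a priori positive or negative value per word) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy lexicon, the tokenizer and the function names are invented here and are not the resources or code used in this research.

```python
import re

# Hypothetical toy lexicon: a priori score per word (assumption for illustration).
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def dictionary_score(text, lexicon=LEXICON):
    """Sum the a priori scores of every word found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def dictionary_label(text, lexicon=LEXICON):
    """Map the summed score to a coarse sentiment label."""
    score = dictionary_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because every word is scored in isolation, this sketch labels "not good" as positive, which is one reason supervised models trained on labelled examples may be preferable.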

Methods

The computational methods and services explained in our research may help journalists in media enterprises study, interpret and analyze large numbers of tweets in any language by running sentiment analysis in real time. These methods and techniques do require some programming skills; however, existing models allow short learning curves for the final user and are easy to adapt. We provide journalists with all the code for Spark (written in Python and using PySpark) in an iNoteBook (ipynb). No mathematical background is needed to run the machine learning models, but a theoretical understanding of the algorithms will increase the quality of the interpretation for the trained journalist. In the case of the commercial services mentioned (AWS, Azure, IBM, etc.), media enterprises must consider the financial costs associated with this analysis. In addition, working with interdisciplinary teams (computer scientists, statisticians, computational linguists, etc.) can improve the results and save resources for the design of the data journalistic enterprise. The described procedure for monitoring tweets in streaming might help to test traditional and emerging theoretical approaches in communication research that require longitudinal data, and might also contribute to experimental studies which need real-time inputs to create or adapt to stimuli. Understanding the media journalistic value chain in a reciprocal way is key to the consolidation of the professional profiles of the Data Analysis Era.
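The general pattern described here, training a supervised model on labelled messages and then applying it to messages as they arrive, can be illustrated without a cluster. The sketch below is not the Spark notebook referred to above: it assumes a plain-Python multinomial Naive Bayes classifier as a stand-in for a Spark Machine Learning model, a Python generator as a stand-in for a stream, and four invented training tweets.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of messages
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_logp = None, -math.inf
        for label, doc_count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of each token.
            logp = math.log(doc_count / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                logp += math.log((count + 1) / (total_words + vocab_size))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def classify_stream(model, messages):
    """Yield (message, predicted label) pairs one at a time, as messages arrive."""
    for message in messages:
        yield message, model.predict(message)


# Hypothetical hand-labelled training tweets (invented for illustration).
TRAIN = [
    ("gran discurso del presidente", "positive"),
    ("excelente propuesta para el empleo", "positive"),
    ("pésima gestión del gobierno", "negative"),
    ("terrible debate, una vergüenza", "negative"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
```

In a distributed setting the generator would be replaced by a streaming source and the classifier by a model trained on a far larger labelled corpus; the training-then-scoring structure stays the same.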

Findings and Argument

In a sector associated with innovation and technology, media enterprises are adapting and transforming their workflow structures into a more automated journalistic value chain, where artificial intelligence software is substituting traditional journalistic “roles”, leading to a scenario where little or no human intervention is required aside from the software-hardware implementation and programming (Carlson, 2015) in some specific niches of news production and redistribution, such as social networks. If I may use a metaphor to illustrate: just as the ongoing implementation of robotics is revolutionizing the automobile industry at a continuous pace, so the development of big data broadly, artificial intelligence processes, machine learning and written news is opening a new scenario where technology providers implement algorithms and artificial intelligence processes to deliver automated news in multiple languages, with ethical challenges arising (Dörr et al., 2016, 2017). This is the true revolution for journalistic media enterprises. Algorithms are being used in new ways to distribute and package news content, both enabling consumers to request more of what they like and less of what they don’t, and also making decisions on consumers’ behalf based on their data preference curve and profile (Groot Kormelink and Costera Meijer, 2014). As indicated above, we provide journalists with all the code for Spark (written in Python and using PySpark) in an iNoteBook (ipynb). No mathematical background is needed to run the machine learning models, but a theoretical understanding of the algorithms will increase the quality of the interpretation for the trained journalist.


Conclusions

The social role of journalism will prevail as a long-standing facilitator and interpreter of what is going on, but the labour process of journalistic roles and authorities will merge into an artificial intelligence role and a human-based value-added provider. The quality of the news, its interpretation and the final accountability will be placed progressively in the hands of artificial neural nets and human neurons simultaneously. As implemented in the automobile industry, the human neurons will supervise and programme the software-hardware-robotic process workflow and contribute where complex reasoning and reprogramming, or complex writing, is needed and is socio-economically and politically viable. There should not be a catastrophic concern about the quality of the news, its transparency and accountability with the data revolution in place. It will remain almost as it is today, improving as more data and analysis become available. It will imply less human intervention where “machine learning processes” prevail given their comparative advantages over the human workforce. The journalistic worker will evolve into a Data Broker Management Journalist (DBMJ) riding his “data-driven surfboard” in an all-digital, revolutionised Journalistic Value Chain (JVC), following data and news labelled via blockchains for better traceability. This renewed journalist will adjust processes and take decisions by supervising the job done by the Artificial Intelligence Big Data News solutions, filtering and contrasting “fake or irrelevant” news, and in some more value-added and quality-orientated media businesses will to some extent complement their products and services by providing human analysis and interpretation where suitable for the consumer and profitable for the journalistic business. The traditional human brand-based journalism which the NYT, the Washington Post, El País and La Nación, among others, represent is merging at a steady pace with the DBMJ and an all-digital and data-based journalistic value chain. The old and new editorial functions are progressively being allocated to new working profiles and roles situated at renewed and specific value chain loci, where the intersection of the human and the machine is being substituted by primarily artificial intelligence processes given their competitive advantages.

Automated, artificial intelligence and machine learning journalism should be obliged to keep in place the norms, ethics and values transcribed into the new “all-digital” software-hardware journalism. The relationship with the audience, whether performed by an all-automatic “robotic news provider” or with the supervision and mediation of human intelligence, should carry the obligation of preserving consumer rights and effectively implementing privacy and data management policies.

In our research we provide the prototype of one of the latest predictive instruments and methods for research on algorithms, machine learning, automation and news: the “distributed journalistic oriented sentiment analysis” tool for real-time tweets (DJOSA-tool), one of many tools for the DBMJ. A new paradigm for contemporary empirical research and rigorous conceptual development on digital journalism and data analysis has to be implemented in the coming years in the new data scenario for news production and distribution. Research orientated towards empirical analysis with rich conceptual discussion will be presented in our case study.

The promised land for journalism which algorithms and automation are building will provide personalisation of content and faster news provision associated with preference curves. Apps are bound to be the instrument for engaging users in a mixed advertising-funded and direct-pay business model. Social networks and content providers will necessarily merge and search for quality content in order to retain consumers within their personalised environments.

In this article we describe and evaluate the application of Predictive Sentiment Analysis (PSA) to a political communication case study through a real-time classifier of political opinions in Spanish tweets using machine learning methods and techniques both on a local computer and using distributed computing for Big Data problems. We present the pilot application and the first results of the designed data experiment and prototype. We describe the associated emerging methodologies and techniques and analyze the threats and opportunities that these innovations represent for political communication and other areas of communication research. This prototype is freely accessible and open source, enabling the communication researcher to interpret the data autonomously given minimal prior training in the techniques and methods. It provides a scientific instrument designed for the understanding of media flows and political thought and opinion.


The data paradigm has arrived as an unquestionable source of information for the study of digital culture and digital media, communication and technology. Algorithms and data are today fundamental for effectively testing hypotheses that could not previously be explored. It may be disruptive at the beginning, but it is certainly a change of paradigm, from obscurity to potentially large quantities of data analysis and machine learning methodologies. The focus does not change: it remains understanding communicational processes. But the instruments and data do change how we understand, study and research in our disciplines. The shift of focus to algorithms and data is positively disruptive for the ways in which we see our research and disciplines. It may even appear to limit the theoretical and methodological tools through which we increasingly try to understand mediation, the formation of identity, social life, politics and the creative industries. There is a need to reformulate the theoretical and empirical perspectives and even paradigms of Communication Research and its relation with data acquisition, curation and interpretation. If we are to augment and diversify our perspectives in Communication Research academia, algorithms, machine learning and artificial intelligence are certainly essential methods and instruments for bringing light to the still scientifically unexplored digital citizen-consumer. This research presents methodological and instrumental approaches to address communicational digital processes, complementing existing scientific methods.

References

Anderson, C. W. (2017) Social survey reportage: Context, narrative, and information visualization in early 20th century American journalism. Journalism 18:1, pages 81-100.
Ausserhofer, J., Robert Gutounig, Michael Oppermann, Sarah Matiasek, Eva Goldgruber. (2017) The datafication of data journalism scholarship: Focal points, methods, and research propositions for the investigation of data-intensive newswork. Journalism 17.
Bollen, J., Mao, H., & Pepe, A. (2011). Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. ICWSM, 11, 450-453.
Carlson, M. (2015). The Robotic Reporter. Automated journalism and the redefinition of labor, compositional forms, and journalistic authority. Digital Journalism, 3:3, pp. 416-431. http://dx.doi.org/10.1080/21670811.2014.976412
Cobb, W. N. W. (2015). Trending now: using big data to examine public opinion of space policy. Space Policy, 32, 11-16.
Dörr, K. N. (2016) Mapping the field of Algorithmic Journalism. Digital Journalism, 4:6, pp. 700-722.
Dörr, K. N., Hollnbuchner, K. (2017) Ethical Challenges of Algorithmic Journalism, Digital Journalism, 5:4, 404-419, DOI: 10.1080/21670811.2016.1167612
Davies, K., Trevor Cullen. (2016) Data Journalism Classes in Australian Universities: Educators Describe Progress to Date. Asia Pacific Media Educator 26:2, pages 132-147.
Farida Vis (2013) Twitter as a reporting tool for breaking news. Digital Journalism, 1:1, 27-47, DOI: 10.1080/21670811.2012.741316
Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 82-89.
Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1, 12.
Groot-Kormelink, T., Costera-Meijer, Irene (2014) Tailor-Made News, meeting the demands of news users on mobile and social media. Journal of Journalism Studies, Volume 15:5: Future of journalism: in an age of digital media and economic uncertainty. Pp. 632-641. http://dx.doi.org/10.1080/1461670X.2014.894367
Jung, J., Haeyeop Song, Youngju Kim, Hyunsuk Im, Sewook Oh. (2017) Intrusion of software robots into journalism: The public's and journalists' perceptions of news written by algorithms and human journalists. Computers in Human Behavior 71, pages 291-298.
Kelleher, J. D., Mac Namee, B., & D'Arcy, A. (2015). Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies. MIT Press.
Leetaru, K. (2012). Data mining methods for the content analyst: An introduction to the computational analysis of content. Routledge.
Mark Coddington (2015) Clarifying Journalism’s Quantitative Turn, Digital Journalism, 3:3, 331-348, DOI: 10.1080/21670811.2014.976400
Marchetti, R., Ceccobelli, D. (2016) Twitter and Television in a Hybrid Media System. Journalism Practice 10:5, pages 626-644.
Momin M. Malik, Jürgen Pfeffer. (2016) A Macroscopic Analysis of News Content in Twitter. Digital Journalism 4:8, pages 955-979.
Nicholas Diakopoulos (2015) Algorithmic Accountability, Digital Journalism, 3:3, 398-415, DOI: 10.1080/21670811.2014.976411
O'Connor, B., Balasubramanyan, R., Routledge, B. R., & Smith, N. A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 11(122-129), 1-2.
Spark Kafka Integration (2016). Spark Streaming + Kafka Integration Guide. Available at: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
Seth C. Lewis (2015) Journalism In An Era Of Big Data, Digital Journalism, 3:3, 321-330, DOI: 10.1080/21670811.2014.976399
Shahin, S. (2016) When Scale Meets Depth: Integrating Natural Language Processing and Textual Analysis for Studying Digital Corpora. Communication Methods and Measures 10:1, pages 28-50.
Sormanen, N., Jukka Rohila, Epp Lauk, Turo Uskali, Jukka Jouhki, Maija Penttinen. (2016) Chances and Challenges of Computational Data Gathering and Analysis. Digital Journalism 4:1, pages 55-74.
Swart, J., Peters, C., Broersma, M. (2016) Navigating Cross-Media News Use. Journalism Studies 0:0, pages 1-20.
Swart, J., Chris Peters, Marcel Broersma. (2017) Repositioning news and public connection in everyday life: a user-oriented perspective on inclusiveness, engagement, relevance, and constructiveness. Media, Culture & Society, pages 016344371667903.
Tuomo Hiippala (2017) The Multimodality of Digital Longform Journalism, Digital Journalism, 5:4, 420-442, DOI: 10.1080/21670811.2016.1169197
Turck, M. & Hao, J. (2016). The Chart of the Big Data Landscape 2016 (Version 3.0). Available at: http://mattturck.com/big-data-landscape-2016-v18-final/
Thurman, N., Dörr, K. N., Kunert, J. (2017) When reporters get hands-on with robo-writing: Professionals Consider Automated Journalism's Capabilities and Consequences, Digital Journalism. http://dx.doi.org/10.1080/21670811.2017.1289819
Thurman, N. J., Schifferes, S., Fletcher, R., Newman, N., Hunt, S. & Schapals, A. K. (2016). Giving computers a nose for news: exploring the limits of story detection and verification. Digital Journalism, 4(7), pp. 838-848. doi: 10.1080/21670811.2016.1149436
van Zoonen, W., & Toni, G. L. A. (2016). Social media research: The application of supervised machine learning in organizational communication research. Computers in Human Behavior, 63, 132-141.
Vinodhini, G., & Chandrasekaran, R. M. (2012). Sentiment analysis and opinion mining: a survey. International Journal of Advanced Research in Computer Science and Software Engineering, 2(6), 282-292.
Wiesenberg, M., Zerfass, A., Moreno A. (2017) Big Data and Automation in Strategic Communication. International Journal of Strategic Communication 11:2, pages 95-114.
Wilson, T., Wiebe, J. & Hoffmann, P. (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Yeon-Lee, N., Kim, Y., Sang, Y. (2017) How do journalists leverage Twitter? Expressive and consumptive use of Twitter. The Social Science Journal 54:2, pages 139-147.


Style, Singularity, and Substance: What Picture Editors Want from A.I.

Martin Schön [email protected]

[email protected]

Abstract: This paper aims to assess the opportunities and challenges inherent in the use of artificial intelligence in journalistic picture editing. To do this we built a tool that suggests images to illustrate news articles using keyword extraction and evaluated it using a qualitative online survey of professional picture editors and a simple, manual “suitability” heuristic. Our results show that the tool is able to return a factually “suitable” image about half the time, performing better on national or international stories than on those with a local or regional focus. However, the survey of picture editors revealed that whether an image matches a story’s topic is not the only criterion used in image selection. Also important is whether the image: works within the space allocated to it on the page, matches the target publication’s house style, has particular aesthetic qualities, and is original, helping the story and its respective publication to stand out from the competition. The development and deployment of artificial intelligence tools in journalistic picture editing will need to consider these contextual and artistic issues, as well as the resistance to automation that some professional picture editors expressed to us.

Keywords: Artificial intelligence, Image Selection, Keyword extraction, Machine learning, Picture editing

Introduction

In August 2018, Getty Images launched ‘Panels’, an “artificial intelligence tool ... that recommends ... visual content to accompany a news article.” The promise was that it could help picture editors create “better stories, more quickly” (Getty 2018). The launch is part of a trend for artificial intelligence to be applied both to editorial functions within news and to the creation of still and moving images more widely. In this paper we describe a system that we have built that automatically selects images for news articles and evaluate that system with the help of professional picture editors in order that we can assess the opportunities and challenges inherent in the use of artificial intelligence in journalistic picture editing.

Methods

We built a system that suggests images to illustrate news articles using keyword extraction. The tool takes the plain text of a news article as input, returning a search string that is used to query an image database. The tool ranks all terms that occur in the article according to three criteria. Firstly, term frequency: the number of times a term occurs. Secondly, first occurrence: the position of the term’s first mention. Thirdly, entity category: a semantic categorisation of the term retrieved from the Thomson Reuters Open Calais tagging service. Examples of categories include Human Protagonist and Location. The ranking is performed by a feedforward neural network. The network is trained to classify good and bad search terms with a set of 100,000 terms generated from our own corpus of 20,000 BBC News articles. The highest ranked terms are compiled into a search query. In addition to this machine learning approach, a second ranking mechanism was developed. This statistical approach calculates the ranking score for each term directly from the term frequency and first occurrence values, without any prior learning involved. A demonstration of the tool using the Getty Images API is available online (Schön 2018).

In order to evaluate the system we used a qualitative online survey of professional picture editors and a simple, manual “suitability” heuristic. The survey used a convenience sample (N=25) and asked picture editors about their work routines, with a focus on identifying tasks that had the potential to be automated and on how they selected images to illustrate articles. The editors were then given an opportunity to use our system by inputting text stories to receive, automatically, suggestions for illustrative images. Following the interactive demonstration, the respondents were asked to discuss the system’s strengths and limitations, and AI’s future potential in their work.
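The statistical ranking mechanism described in the Methods (a score computed directly from term frequency and first occurrence, with no learning) might be sketched as follows. The exact scoring formula is not given in the text, so the product of a normalised frequency and an "earliness" weight used here is purely an assumption, as are the stop-word list and the function names; the entity-category criterion is omitted because it relies on an external tagging service.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "at", "has", "by"}


def rank_terms(article_text, top_n=3):
    """Rank terms by an assumed score: relative frequency times earliness."""
    tokens = [t for t in re.findall(r"[a-z]+", article_text.lower())
              if t not in STOP_WORDS]
    if not tokens:
        return []
    frequency = Counter(tokens)
    first_position = {}
    for index, token in enumerate(tokens):
        first_position.setdefault(token, index)  # keep only the first mention
    max_frequency = max(frequency.values())
    scores = {}
    for term, count in frequency.items():
        frequency_score = count / max_frequency                  # 0..1, higher is more frequent
        earliness_score = 1 - first_position[term] / len(tokens)  # 1.0 for the first token
        scores[term] = frequency_score * earliness_score
    ranked = sorted(scores, key=lambda term: (-scores[term], first_position[term]))
    return ranked[:top_n]


def build_query(article_text, top_n=3):
    """Compile the highest ranked terms into a search string."""
    return " ".join(rank_terms(article_text, top_n))
```

The resulting string would then be passed to an image database's search endpoint; that call is left out because its details depend on the provider.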

Findings

Our first evaluation used a simple, manual heuristic to determine the system’s performance in terms of the general suitability of the images suggested for a given news story. We tested implementations of both ranking mechanisms (four different neural networks and one statistical approach) with 100 articles from the BBC News corpus. The Getty Images API was used as the image database. The resulting images were manually classified as either “suitable” or “unsuitable” with regard to the respective article. An image was deemed “suitable” if it illustrated the main topic of the story, but no judgement was made on the image’s aesthetic qualities. Because all images returned came from the Getty Images database, they met Getty’s minimum standards for sharpness, exposure, composition and so forth.

The evaluation results show that both ranking mechanisms perform similarly well, with the neural networks performing slightly better. The tool worked better on articles without a local or regional focus, such as news about international politics, technology, science or business. On articles with a local focus, the statistical approach outperformed the networks by far. First occurrence proved to be the most powerful criterion for judging the suitability of a term as a search query. On average around half the images returned were classified as “suitable.”

Our second evaluation involved a qualitative online survey of professional picture editors. Most of the editors selected images to illustrate specific articles at least daily, but very few were aware of, or had used, any software that could help automate their routine tasks. Those who had did not come away very impressed. One had used software that could automatically crop images but found it “restrictive”. Another had used the Getty ‘Panels’ product mentioned in the introduction but thought it was deficient, not being “smart or subtle enough.”

The picture editors’ feedback on our system was, on balance, more negative than positive. On the positive side, some acknowledged that it was capable of suggesting suitable illustrative images quickly. One said they might use it “if I had a rush on or was stuck for ideas”. Others suggested it might be useful for teams who did not have a designated, or experienced, picture researcher. On the negative side, a common criticism concerned the lack of relevance of some of the suggested images, a limitation our “suitability” heuristic also highlighted.

The editors’ survey revealed other shortcomings too. Some editors wanted the system to be able to suggest images from a wider variety of sources (not just Getty) and to show the costs of licensing particular images; these are not insurmountable technical challenges. However, some of the editors’ other wishes present more of a challenge. Firstly, several mentioned that images do not only have to illustrate the content of a particular story but also need to be in keeping with the house style of the target publication. Secondly, an image has to be suitable for the space available for it on the page, which might preclude images with certain compositions or aspect ratios. Thirdly, editors emphasized the importance of images’ aesthetics, for example their “beauty” and “visual impact.” Finally, the importance of having “original” or “unique” images was emphasized, a thought encapsulated in this response from one of our respondents: “I want something different to what everyone else has.”
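The first-occurrence criterion reported in the findings can be illustrated with a short sketch. The text does not describe how the statistical approach computes it, so the function name, the whole-word matching and the sample article below are assumptions made purely for illustration; this is not the authors' implementation.

```python
import re


def rank_terms_by_first_occurrence(article, candidates):
    """Order candidate query terms by where they first appear in the article.

    Hypothetical helper: terms that never occur in the text are dropped,
    and earlier first occurrences rank higher.
    """
    text = article.lower()
    positions = {}
    for term in candidates:
        # Whole-word, case-insensitive match (an assumption of this sketch).
        match = re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
        if match:
            positions[term] = match.start()
    return sorted(positions, key=positions.get)


# Invented sample text, not taken from the BBC News corpus.
article = (
    "Parliament votes on budget. "
    "The budget debate lasted hours as ministers argued."
)
print(rank_terms_by_first_occurrence(
    article, ["budget", "ministers", "parliament", "election"]
))
# → ['parliament', 'budget', 'ministers']
```

In a pipeline like the one described, the top-ranked terms would presumably be sent to the image database as search queries; how the ranking is combined with other criteria is not stated in the text.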

Conclusions

In this study we have highlighted some of the work being undertaken to apply artificial intelligence to the task of selecting images to illustrate stories. Our own system was described, and we demonstrated, both quantitatively and qualitatively, that it is capable of returning a factually “suitable” image about half the time. With further development this proportion should improve. However, our survey of professional picture editors revealed that, in the real world, it is not enough for an image to match the topic of a story. For AI image selection tools (like ours and Getty’s ‘Panels’) to be useful they need to do more. Two interesting challenges are, firstly, to make the tools contextually aware, both at the page level (where the image will appear) and at the outlet level (the publication’s house style) and, secondly, to offer selections that are able to fulfil editors’ desire to have “original” images that are “different” to their competitors. Two other requirements present a much greater level of challenge. Firstly, to select images that have the right aesthetic and emotional qualities and, secondly, to build systems that do not make picture editors feel that the “creative” element of picture editing is being “taken away” from them.

References

Getty (2018) “Getty Images Launches AI Tool to Transform Search for Media Publishers”, 2 August. http://press.gettyimages.com/getty-images-launches-ai-tool-to-transform-search-for-media-publishers/

Schön, Martin (2018) Picpic Explorer. http://picpic-explorer.argonn.me/#/. Accessed 24 October 2018.


Can data journalism really stimulate local news? A case study with media in the countryside of Portugal

Ricardo Morais    Pedro Jerónimo

University of Beira Interior / [email protected]

University of Beira Interior / Labcom.IFP & [email protected]

Abstract: This study outlines the development of data journalism at twenty-five Portuguese local media outlets. The results presented are based on a survey of 107 journalists from local newspapers and radio stations. A content analysis of the data journalism published on their websites is also presented and compared with the results obtained through the survey. The main findings suggest that, despite the knowledge journalists have, there is no investment in data journalism because this type of practice is not considered to be a determining factor in attracting new audiences and approaching the most-assessed news. Based on this analysis, and within the scope of the project in which we developed this study, we propose some practices that local media can adopt to help journalists realize the potential of data journalism and, above all, to encourage them to adopt it.

Keywords: Data journalism; local news; Portugal; Remedia.Lab

Introduction

In recent years much has been said about the possibilities of data journalism and how it can improve the journalistic field. But the truth is, as Rogers reveals, “Data journalism is not new”; the first example of data journalism dates back to 1821, “in the very first Guardian”, and is related to a “list of schools in Manchester and Salford, with how many pupils attended each one and average annual spending” (Rogers, 2013, p. 60). This well-known form of journalism has reawakened in recent years because societies have become increasingly digital and the amount of information available on networks keeps growing. It was precisely this increase in the amount of information available that made data journalism decisive at two levels: “1) analysis to bring sense and structure out of the never-ending flow of data and 2) presentation to get what’s important and relevant into the consumer’s head” (Meyer apud Gray, Chambers and Bounegru, 2012, p. 6). At a time when information is everywhere, the most important tasks are no longer searching and gathering, but filtering and verification. The role of journalists is, in this context, particularly significant, since they have the power to make sense of information. As Pilhofer says, data journalism “can include everything from traditional computer-assisted reporting (using data as a “source”) to the most cutting-edge data visualization and news applications”, but the ultimate goal remains the same: “providing information and analysis to help inform us all about important issues of the day” (Pilhofer apud Gray, Chambers and Bounegru, 2012, p. 6). In spite of that, there are some limitations in data journalism quality even in major media companies (Young, Hermida and Fulda, 2018).

Important for the production and dissemination of information in the new digital ecosystem, data journalism “may be the most powerful forum of collective journalistic sensemaking” (Anderson, 2019). This practice assumes particular importance in certain contexts, such as the local one. As Kristen Muller, a chief content officer at KPCC, says, “if local newsrooms are going to achieve digital sustainability, we must try new things. We need to experiment with different approaches to coverage and revenue” (2018). Proximity, which is a characteristic of local and regional media, can therefore find in data journalism a unique opportunity for “finding unique stories (not from newswires), and executing the watchdog function”. Jerry Vermanen believes data journalism is crucial to regional newspapers, “because local newspapers have this direct impact in their neighborhood and sources become digitalized, a journalist must know how to find, analyze and visualize a story from data” (Vermanen apud Gray, Chambers and Bounegru, 2012, p. 7). Stefan Baack (2018) also pointed out that the engagement of journalists and civic technologists can be challenging to public service at a local level.

The question then arises as to whether these local media, where working conditions are often difficult because the number of journalists is low, understand the potential of data journalism and are ready for it. Sometimes they do, and they also have people in the newsrooms with that kind of knowledge (editorial and technical staff) but, unfortunately, that knowledge is not embedded in company or editorial strategies (Jerónimo, 2015). Do local journalists see data journalism as a way to scrutinize the world and hold the powers accountable? Are journalists aware of data journalism techniques? Do they understand that basic skills from traditional journalism just aren’t enough in a digital era? These are some of the questions that we seek to answer in this paper, through an analysis of a set of local media in the central region of Portugal, a significant part of the Portuguese media landscape where the number of media outlets has declined in recent years due to the lack of public support and low audiences.

Methods

In terms of research methods, we opted for the case study strategy, since it seems to us the tool best adapted to the reality that we intend to study. For Yin (1989), a case study is empirical research that consists in the analysis of a particular phenomenon in the real world, through different ways of collecting data. Rossman and Rallis (2003) consider that case studies “seek to understand the larger phenomenon through a close examination of a specific case and therefore focus on the particular” (p. 104). Our case study is, in fact, a multiple case study (Yin, 1989, p. 52), since it was carried out in different local media at the same time. To answer our questions, we collected data through a survey of journalists from twenty-five local newspapers and radio stations, and we also conducted a content analysis in search of data journalism examples published on the websites of the media investigated. We follow the proposals of Yin (1989), who advocates the use of different data sources, i.e., “multiple sources of evidence” (p. 23). The journalistic projects chosen in this study aim at representing the central area of Portugal, one of the areas most affected by the closure of media outlets. These media are also part of the project Remedia.Lab, in which we try “to diagnosis the current situation of local/regional media, promoting experimental tools and strategies to strengthen their business model, increasing their innovation degree and improving their connection with the public”.

Findings and Argument

Results from the survey show that journalists consider it very important to have knowledge of web scraping, to obtain tools for data analysis and collection, and to gain knowledge in the creation of infographics and data presentation. Furthermore, answers collected from the local journalists surprisingly show that these professionals have a good knowledge of a set of skills, such as web scraping, data visualization and presentation. However, findings show that journalists do not consider the use of data journalism to be the most effective approach to attracting new audiences or retaining them. The lack of company and editorial strategies can also help to discourage this way of thinking (Jerónimo, 2015). This is the kind of data that we seek to explore in this work, especially since several studies indicate that data journalism can help to revitalize local journalism in general and, more particularly, small local journalism offices. Can these results help explain the limited number of journalistic works based on data journalism that we found on media websites? If, as the survey shows, knowledge of data journalism techniques exists, is the lack of investment due to insufficient human and material resources? These are some of the results that we will explore in this work, trying to confront this reality with the amount of information that is now available on the network, but also questioning the accessibility of the data for journalists' work.

Conclusions

Although our findings show that local media journalists have the technical knowledge to manage data journalism, those skills are not invested in news production. Such a situation can be explained by the lack of strategies, the presence of small newsrooms and a traditional newsmaking culture, as is especially evident in newspapers. On the other hand, we cannot ignore findings of previous studies that identify “key actors” in the newsrooms: journalists with the ability to innovate in their field, on their own, even without resorting to existing strategies (Jerónimo, 2015). With this work, we would like to point out that the lack of data journalism in Portuguese local media is an opportunity for journalistic projects to recognize that there are opportunities in the journalistic field that must be explored to capture and maintain audiences. Identifying, encouraging and helping the key players in the newsrooms are some of the steps that need to be taken next. These results, together with the evaluation of the type of data that public services and governments provide, constitute important knowledge that can be transmitted in the form of advice to the local media, in order to help them implement data journalism in their newsrooms. These results will also be important in encouraging public and private entities at local and regional level to openly disclose their data.

References

Alexandre, I. A. R. (2014). Jornalismo de Dados: o estado da arte nos jornais generalistas diários em Portugal. Mestrado em Novos Media e Práticas Web. Faculdade de Ciências Sociais e Humanas da Universidade Nova de Lisboa. Available at https://run.unl.pt/handle/10362/13615 [Accessed April 3, 2019].

Anderson, C. W. (2019). Genealogies of Data Journalism. In: J. Gray and L. Bounegru, eds., The Data Journalism Handbook 2: Towards a Critical Data Practice. European Journalism Centre and Google News Initiative. Available at https://datajournalismhandbook.org/handbook/two/situating-data-journalism/genealogies-of-data-journalism [Accessed April 3, 2019].

Gray, J. and Bounegru, L. (Eds.) (2019). The Data Journalism Handbook 2: Towards a Critical Data Practice. European Journalism Centre and Google News Initiative. Available at https://datajournalismhandbook.org/handbook/two#situating-data-journalism [Accessed April 3, 2019].

Baack, S. (2018). Practically Engaged, Digital Journalism, 6(6), 673-692. DOI: 10.1080/21670811.2017.1375382

Gray, J., Bounegru, L. and Chambers, L. (2012). The Data Journalism Handbook. How journalists can use data to improve the news. California: O’Reilly Media. Available at http://datajournalismhandbook.org/ [Accessed April 3, 2019].

Jerónimo, P. (2015). Ciberjornalismo de proximidade: redações, jornalistas e notícias online. Covilhã: LabCom Books.

Martinho, A. I. P. (2013). Jornalismo de dados: contributo para uma caracterização do estado da arte em Portugal. Dissertação de Mestrado. ISCTE-IUL. Available at http://hdl.handle.net/10071/8329 [Accessed April 3, 2019].

Meyer, P. (2002). Precision Journalism: a reporter’s introduction to social science methods. Maryland: Rowman & Littlefield Publishers.

Rogers, S. (2013). Facts are Sacred: The Power of Data. London: Guardian Books.

Rossman, G. and Rallis, S. (2003). Learning in the Field: An introduction to qualitative research. Thousand Oaks (California): Sage Publications.

Yin, R. (1989). Case Study Research – Design and Methods. London: Sage Publications.

Young, M. L., Hermida, A. and Fulda, J. (2018). What Makes for Great Data Journalism?, Journalism Practice, 12(1), 115-135. DOI: 10.1080/17512786.2016.1270171

Young, W. D. (2018). Data Journalism Goes Undercover. Available at https://www.niemanlab.org/2019/01/data-journalism-goes-undercover/ [Accessed April 3, 2019].