Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Frontiers of Computational Journalism, Columbia Journalism School. Week 1: Basics, September 10, 2012

Upload: jonathan-stray

Post on 27-Oct-2014

DESCRIPTION

Course blog at compjournalism.com

TRANSCRIPT

Page 1: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Frontiers of Computational Journalism

Columbia Journalism School

Week 1: Basics
September 10, 2012

Page 2: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Week 1: Basics

• What is computational journalism?
• Data in journalism
• Aims of the course
• Course structure

Page 4: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Computational Journalism: Definitions

“Broadly defined, it can involve changing how stories are discovered, presented, aggregated, monetized, and archived. Computation can advance journalism by drawing on innovations in topic detection, video analysis, personalization, aggregation, visualization, and sensemaking.”

- Cohen, Hamilton, Turner, Computational Journalism

Page 5: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Computational Journalism: Definitions

“Stories will emerge from stacks of financial disclosure forms, court records, legislative hearings, officials' calendars or meeting notes, and regulators' email messages that no one today has time or money to mine. With a suite of reporting tools, a journalist will be able to scan, transcribe, analyze, and visualize the patterns in these documents.”

- Cohen, Hamilton, Turner, Computational Journalism

Page 6: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Cohen et al. model

[Diagram labels: Data, Reporting, User, Computer Science]

Page 7: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 8: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 9: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 10: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 11: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 12: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

CS for presentation / interaction

[Diagram labels: Data, Reporting, User, CS, CS]

Page 13: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 14: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 15: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 16: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Filter many stories for user

[Diagram labels: User; Data, Reporting, CS (three times); Filtering; CS (four more times)]

Page 17: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Examples of filters

• What an editor puts on the front page
• Google News
• Reddit’s comment system
• Twitter
• Facebook news feed
• Techmeme
• …

Page 18: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Memetracker by Leskovec, Backstrom, Kleinberg

Page 19: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Kony  2012  early  network,  by  Gilad  Lotan  /  Socialflow  

Page 20: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Track effects

[Diagram labels: User; Data, Reporting, CS (three times); Filtering; CS (four more times); Effects; CS]

Page 21: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Computational journalism process

[Diagram labels: Reporting, Presentation, Filtering, Tracking]

Page 22: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Computational Journalism: Definitions

“the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government.”

- Jonathan Stray, A Computational Journalism Reading List

Page 23: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Week 1: Basics

• What is computational journalism?
• Data in journalism
• Aims of the course
• Course structure

Page 24: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Definition of data

a collection of similar pieces of information

Page 25: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 26: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 27: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

structured  data  

Page 28: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

unstructured  data  

Page 29: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 30: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Why use data in journalism?

1. Data is where the information is

Page 31: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

More video on YouTube than produced by TV networks during the entire 20th century

Page 32: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

10,000 legally-required reports filed by U.S. public companies every day

Page 33: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

400,000,000  tweets  per  day    

AP  moves  ~15,000  stories  per  day    

390,000  Wikileaks  cables    

500,000  Enron  emails      

…how  many  gov’t    and  corporate  docs?    

Page 34: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

There’s a lot out there

Human data generated in 2010 = 1,000,000,000 terabytes

Library of Congress digital archive = 160 terabytes (only 20 TB for all books!)

All New York Times articles ever = 0.06 terabytes (13 million stories, assuming 5k per story)
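
The New York Times figure is a back-of-the-envelope estimate. A minimal sketch of that arithmetic follows, assuming "5k per story" means 5,000 bytes and that a terabyte is 10^12 bytes. Neither unit convention is stated on the slide, and the names used here are illustrative only.

```python
# Back-of-the-envelope check of the archive-size estimate on the slide.
STORIES = 13_000_000       # "13 million stories" (from the slide)
BYTES_PER_STORY = 5_000    # "5k per story", read as 5,000 bytes (assumption)
BYTES_PER_TB = 10**12      # decimal terabyte (assumption)
LOC_ARCHIVE_TB = 160       # Library of Congress digital archive (from the slide)


def archive_size_tb(stories: int, bytes_per_story: int) -> float:
    """Return the total size of an archive in terabytes."""
    return stories * bytes_per_story / BYTES_PER_TB


nyt_tb = archive_size_tb(STORIES, BYTES_PER_STORY)
loc_ratio = LOC_ARCHIVE_TB / nyt_tb  # how many NYT archives fit in the LoC archive

print(f"All NYT articles: about {nyt_tb:.3f} TB")
print(f"LoC digital archive is about {loc_ratio:,.0f} times larger")
```

Under these assumptions the result is 0.065 TB, consistent with the slide's rounded 0.06. Reading the units as binary (5 KiB per story, 2^40 bytes per terabyte) also gives about 0.06.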

     

Page 35: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Transparency  means  nothing  if  no  one  is  watching.  

Page 36: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Why use data in journalism?

1. Data is where the information is
2. Data can give a more complete picture

Page 37: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 38: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 39: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 40: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1
Page 41: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Phil Meyer, Detroit Riots, 1967

“A reporter, talking to people on the street corner, draws comparisons intuitively, almost unconsciously. When dealing with large numbers of people—437 were interviewed in the Detroit survey—intuition is not enough. It takes a computer to count and sort and analyze the thoughts of that many people, and the input must be consistently structured.”

Page 42: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Phil Meyer, Detroit Riots, 1967

“Education and income were not good predictors of whether a person would riot.”

Page 43: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Week 1: Basics

• What is computational journalism?
• Data in journalism
• Aims of the course
• Course structure

Page 44: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Design

“[Designers] are guided by the ambition to imagine a desirable state of the world, playing through alternative ways in which it might be accomplished, carefully tracing the consequences of contemplated actions.”

- Horst Rittel, The Reasoning of Designers

Page 45: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Design is not objective

“During the industrial age, the idea of planning, in common with the idea of professionalism, was dominated by the pervasive idea of efficiency. We have come to think about the planning task in very different ways in recent years. We have been learning to ask whether what we are doing is the right thing to do. That is to say, we have been learning to ask questions about the outputs of actions and to pose problem statements in valuative frameworks.”

- Horst Rittel, Dilemmas in a General Theory of Planning

Page 46: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Design is political

“No plan has ever been beneficial to everybody. Therefore, many persons with varying, often contradictory interests and ideas are or want to be involved in plan-making. The resulting plans are usually compromises resulting from negotiation and the application of power. The designer is party in these processes; he takes sides.”

- Horst Rittel, The Reasoning of Designers

Page 47: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Different kinds of knowledge

Normative: “what should be”
(political philosophy, sociology, ethics, critical theory…)

Instrumental: “how to get there”
(in our case: journalism and computer science)

This course is about both.

Page 48: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Week 1: Basics

• What is computational journalism?
• Data in journalism
• Aims of the course
• Course structure

Page 49: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Theory

We will learn important guiding principles about
• Filter design
• Visualization
• Social network analysis
• Drawing conclusions from data
• Security modeling

   

     

Page 50: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Techniques

We will discuss a handful of techniques in great depth.
• Distance functions and clustering
• Vector space document model
• Recommender systems
• Proposition extraction
• Knowledge representation as linked data
• Community detection

Any  requests?      
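
To give a flavor of the first two techniques on the list, here is a minimal sketch of a vector space document model with a cosine distance function. It assumes raw term counts and whitespace tokenization, which may differ from the weighting the course actually uses (TF-IDF, for example). The function names and the three headlines are made up for illustration.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Map a document to a sparse term-frequency vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def cosine_distance(a: Counter, b: Counter) -> float:
    """Distance function: 0 for identical direction, 1 for no shared terms."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical toy documents, invented for this example.
docs = [
    "city council votes on budget",
    "council approves city budget",
    "team wins championship game",
]
vectors = [tf_vector(d) for d in docs]

for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        d = cosine_distance(vectors[i], vectors[j])
        print(f"distance(doc {i}, doc {j}) = {d:.3f}")
```

The two council headlines share three words and come out closer to each other than either is to the sports headline, which shares none. A clustering algorithm would group documents using exactly this kind of pairwise distance.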

     

Page 51: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Course structure

• Classes: we’ll review the readings (so please read them)
• By next week: form groups of 2-3.
• Assignments every other week, due in two weeks
• Some will involve coding, all will involve critical analysis.

   

   

Page 52: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Your data

• You are encouraged to pick a data set and stick with it.
• If you want, you can do all assignments, final research report, etc. with this data
• This is a research course… let’s learn something new.

   

   

Page 53: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

What data?

SEC reports, municipal open gov data, Wikileaks, your favorite archive, social media…

Two criteria:
• Journalistically interesting
• Requires advanced techniques

Page 54: Frontiers of Computational Journalism - Columbia Journalism School Fall 2012 - Week 1

Final Report

For 3-point students
• A theoretical discussion (10 pages)

For 6-point students, one of:
• A theoretical discussion (25 pages)
• An implementation of a technique and discussion of results
• Analysis of your chosen data
• A completed story, plus methodology