Groeling, Tim: NewsScape: Preserving TV News

NewsScape: Preserving TV News Tim Groeling ([email protected])

Upload: reynolds-journalism-institute-rji

Post on 14-Apr-2017

Who Are We?

• Prof. Francis Steen: Director, Communication Studies Archive

• Me: Leading the analog digitization effort

• Predecessor & Archive Founder: Prof. Paul Rosenthal (emeritus)

• UCLA Library: Helps support the collection, store files, and host main “public” site (tvnews.library.ucla.edu)

• Other supporters: UCLA Chancellor and Dean of Social Sciences, Arcadia Fund, UCLA Social Sciences Computing, the NSF, the California Endowment, UCLA Office for Instructional Development, and UCLA CCLE.

Collections?

• Oldest collection: UCLA Campus Speakers, from 1950s-1980s (over 500 audio recordings)

• Digitized to coincide with 40th anniversary of my department.

• Originally planned to exhibit & host on our website, targeted at alumni.

• Moved to YouTube: Now most traffic (77%) comes from YouTube search/suggested videos/browsing.

• Issues: commenters & copyright

NewsScape

• Largest collection: TV news and public affairs programs (local & national)

• Started during Watergate (preserve ephemera). Shoestring budget (until recently, only about $10k per year plus volunteer labor & donated equipment)

• 1979: Started trying to record all the local and national TV news viewable in LA.

• 2006: Started daily straight-to-digital recording.

• Since 2006: Added other cities.

Pre-2006 Holdings?

• Recordings spread across three campus organizations (Comm Studies, Library, and TFT) and at least four on- and off-campus storage sites.

• Good records for some portions; very poor records for others.

• Not sure how many tapes are in the collection overall.

• Even where we know what should be on a tape, some problems with tape, VCR, or schedule.

Tapes

• Earliest recordings (1970s): around 500 U-Matic tapes.

• Middle period (1979-early 1990s): about 50k hours on Betamax.

• Late period (1990s-2006): around 160k hours on VHS, plus some redundancy.

Preservation

• VHS tapes are actually the most threatened, despite being the newest tapes.

• Coincided with cable TV expansion of news programming: stretched same budget to cover more news programming.

• 8 hours per consumer VHS tape (Betas and U-matics were higher quality tapes; less recorded on each tape)

• Poor quality consumer-grade VCRs

• Limited spot-checking for quality (failing VCRs or poor signal quality not noticed for long stretches).

• Originals still in hand, but dead.

• Improperly stored (even faculty didn’t have A/C)

Cost to Digitize?

• Got bids from another archive and commercial providers: $1.5 million (just for first 150k hours of VHS).

• Instead, shoestring again.

• $20k (and some donated surplus machines) for hardware, software, and furniture for digitization lab.

• Run by me, a part-time lab manager, and 10 work-study students. Steen handles files.

• [Shifting to Betamax will be costly, though]
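To put the bid in per-hour terms, here is a small arithmetic sketch using only the figures on this slide. Note the assumption stated in the comments: the in-house number covers lab setup only, since the slide gives no figure for staff or student wages, so the two rates are not a like-for-like comparison.

```python
# Figures quoted on the slide.
BID_TOTAL_USD = 1_500_000   # outside bids for the first batch of VHS
BID_HOURS = 150_000         # hours of VHS covered by those bids
LAB_SETUP_USD = 20_000      # hardware, software, and furniture for the lab


def cost_per_hour(total_usd: float, hours: float) -> float:
    """Return the cost in dollars per recorded hour."""
    return total_usd / hours


bid_rate = cost_per_hour(BID_TOTAL_USD, BID_HOURS)

# ASSUMPTION: setup cost only, spread over the same 150k hours.
# Staff time and work-study wages are NOT included (not given on the slide).
lab_setup_rate = cost_per_hour(LAB_SETUP_USD, BID_HOURS)

print(f"Outside bid:          ${bid_rate:.2f} per hour")
print(f"In-house (setup only): ${lab_setup_rate:.2f} per hour")
```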

Lab Details

• 22 digitization stations (VCR, encoder, computer). 3 local RAID file servers and 1 FileMaker Server.

• All computers: surplus or eBay Macs (circa 2008)

• Encoders: EyeTV using Hauppauge 950q or EyeTV Hybrid hardware MPEG-2 encoders (get CC). Export and sync scripted.

• VCRs: After testing, settled on JVC S-VHS VCRs (consumer to pro).

• Use pre-printed barcode stickers for inventory. Custom FileMaker database for tracking digitization attempts and quality control. FileMaker Go Mobile (via cell phones) for asset tracking.

• Files are quality-checked, compressed to H.264, and closed captioning extracted via Hoffman Cluster.
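The tracking idea above (barcoded tapes, repeated digitization attempts, a quality-control result per attempt) can be sketched as a tiny relational schema. This is only an illustration using Python's built-in sqlite3 module, not the lab's actual FileMaker database; the table names, column names, and sample barcode are all hypothetical.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tape (
    barcode TEXT PRIMARY KEY,      -- value printed on the barcode sticker
    format  TEXT NOT NULL          -- e.g. 'VHS', 'Betamax', 'U-Matic'
);
CREATE TABLE attempt (
    id        INTEGER PRIMARY KEY,
    barcode   TEXT NOT NULL REFERENCES tape(barcode),
    station   INTEGER NOT NULL,    -- which digitization station was used
    passed_qc INTEGER NOT NULL     -- 1 if the file passed quality control
);
""")

# HYPOTHETICAL sample data: one tape, a failed attempt, then a good one.
conn.execute("INSERT INTO tape VALUES (?, ?)", ("T000001", "VHS"))
conn.executemany(
    "INSERT INTO attempt (barcode, station, passed_qc) VALUES (?, ?, ?)",
    [("T000001", 3, 0), ("T000001", 7, 1)],
)


def needs_retry(db: sqlite3.Connection, barcode: str) -> bool:
    """True if no attempt for this tape has passed quality control yet."""
    row = db.execute(
        "SELECT COUNT(*) FROM attempt WHERE barcode = ? AND passed_qc = 1",
        (barcode,),
    ).fetchone()
    return row[0] == 0


print(needs_retry(conn, "T000001"))
```

Keeping every attempt as its own row, rather than overwriting a single status field, is what makes it possible to aggregate failures by station later.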

Progress

• Fall 2015: Process design & workstation configuration testing

• Winter 2016: hired students, fixed network issues, and ramped up workstations.

• Summer 2016: FileMaker inventory control; two daily shifts.

• March-Oct 2016: 4.5k tapes encoded (about 36k hours).

• Delaying splitting files into shows.
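The tape and hour counts above are consistent with the 8-hours-per-tape figure from the Preservation slide. The sketch below checks that and makes a rough projection for the remaining VHS hours. The projection rests on assumptions not in the slides: March through October is counted as 8 months, the pace stays constant, and redundancy and repeat encoding attempts are ignored.

```python
TAPES_ENCODED = 4_500
HOURS_PER_TAPE = 8          # consumer VHS, 8 hours per tape
VHS_TOTAL_HOURS = 160_000   # approximate late-period VHS holdings
MONTHS_ELAPSED = 8          # ASSUMPTION: March through October = 8 months

hours_encoded = TAPES_ENCODED * HOURS_PER_TAPE
hours_per_month = hours_encoded / MONTHS_ELAPSED

# ASSUMPTION: constant pace, no re-encodes, redundancy ignored.
remaining_hours = VHS_TOTAL_HOURS - hours_encoded
months_remaining = remaining_hours / hours_per_month

print(f"Hours encoded so far: {hours_encoded:,}")
print(f"Average pace: {hours_per_month:,.0f} hours per month")
print(f"Rough time to finish VHS: {months_remaining:.1f} months")
```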

Problems: Lots

• Recording/playback VCRs out of spec (solution: quality control aggregation helps find a "love connection")

• Varying program names over time (database tracks alternate show names; day/time/channel 30-minute block)

• Buzzing audio? Computer RF interference with VCR audio. (Used dead VCRs as spacers.)

• Lots of other problems, but in most cases this just means another encoding attempt.
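One way to picture the program-name fix: map each name variant to a canonical name, and identify a recording slot by channel, weekday, and 30-minute block. The sketch below is a guess at the general shape, not the lab's actual database logic; the alias table, show names, and channel label are hypothetical.

```python
from datetime import datetime

# HYPOTHETICAL alias table: lowercase variant -> canonical show name.
ALIASES = {
    "evening news at 6": "Evening News",
    "6 o'clock evening news": "Evening News",
}

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def canonical_name(raw_name: str) -> str:
    """Collapse whitespace and case, then look up a known alias."""
    key = " ".join(raw_name.lower().split())
    return ALIASES.get(key, raw_name.strip())


def block_key(channel: str, start: datetime) -> tuple:
    """Identify a slot by channel, weekday, and 30-minute block start."""
    minute = 0 if start.minute < 30 else 30
    return (channel, DAYS[start.weekday()], f"{start.hour:02d}:{minute:02d}")


print(canonical_name("  Evening News   at 6 "))
print(block_key("CH7", datetime(1998, 3, 2, 18, 5)))
```

Keying on the time block rather than the title means a renamed show still lands in the same slot, and the alias table only has to cover display names.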

Post-2006 Straight-to-Digital

• 46 networks (US and beyond)

• 2,525 Series

• Total video files: 383,550

• Duration in hours: 297,596

• Closed caption files: 383,739

• Words in caption files: 2,419,185,351

• OCR files: 371,426

• Words in OCR files: 825,662,597

• Total thumbnail images: 107,134,425

• Storage: 106.93 terabytes

• Limited public access link: tvnews.library.ucla.edu
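A few derived averages help make these totals easier to read. The sketch below uses only the numbers listed above; the one assumption is the unit conversion of 1 TB = 1000 GB, since the slide does not say which convention it uses.

```python
# Totals as listed on the slide.
video_files = 383_550
hours = 297_596
caption_files = 383_739
caption_words = 2_419_185_351
thumbnails = 107_134_425
storage_tb = 106.93

avg_minutes_per_file = hours * 60 / video_files
avg_words_per_caption_file = caption_words / caption_files
seconds_per_thumbnail = 3600 / (thumbnails / hours)
gb_per_hour = storage_tb * 1000 / hours  # ASSUMPTION: 1 TB = 1000 GB

print(f"Average file length: {avg_minutes_per_file:.1f} minutes")
print(f"Average words per caption file: {avg_words_per_caption_file:,.0f}")
print(f"One thumbnail about every {seconds_per_thumbnail:.1f} seconds")
print(f"Storage: about {gb_per_hour:.2f} GB per hour of video")
```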

Unlocking the Content

• Preservation is just first step: Needs to be more than "world’s best DVR"

• Want to provide tools to make the collection more useful and relevant beyond UCLA: Help people

• Understand TV news, and…

• Share what they find

Example: Obama

• Preservation is just first step: Needs to be more than world’s "best VCR"

• Want to provide tools to make the collection more useful and relevant beyond UCLA: Help people

• Understand TV news, and…

• Share what they find

Good to know, but…

Tools to Understand News

• Not just view, but analyze

• Help understand and visualize patterns of news coverage, not just individual stories. Forest, not just trees.

• Tools are already being developed, but are complex

Tools to Understand Text

• Not just view, but analyze

• Help understand and visualize patterns of news coverage, not just individual stories (copyright, too)

• Tools are already being developed, but are complex
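As a toy illustration of looking at patterns rather than individual stories, the sketch below counts mentions of a term per month across caption text. It is a minimal example under stated assumptions, not one of the project's actual tools: the caption records are made up, and real caption files would need parsing first.

```python
import re
from collections import Counter

# HYPOTHETICAL caption records: (ISO air date, caption text).
captions = [
    ("2016-03-01", "The candidate spoke about the economy and jobs."),
    ("2016-03-15", "Economy numbers dominated the debate."),
    ("2016-04-02", "Voters said the economy was their top concern."),
]


def monthly_mentions(records, term):
    """Count whole-word, case-insensitive mentions of `term` per YYYY-MM."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter()
    for air_date, text in records:
        counts[air_date[:7]] += len(pattern.findall(text))
    return dict(counts)


print(monthly_mentions(captions, "economy"))
```

Sharing counts like these, rather than the clips themselves, is also one way to stay within copyright limits.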

Ambitious Goal: Visuals

• Text analysis is fairly mature (more than 2 billion words in NewsScape index)

• Named entities, parts of speech, topic detection are all working now (sentiment is harder)

• Analysis of visuals is challenging.

• Facial detection & analysis tools are becoming more useful and scalable

Automated Analysis of Visuals

• Goal: Be able to understand patterns of visual communication in election news. Hard to study.

• Existing work is mostly hand-coded and focuses on still newspaper or web photos

• Trouble scaling to massive volume of images.

• Subjectivity

• Machine learning and big data as solution

• Presented pilot study at this year’s American Political Science Association conference categorizing presidential candidate faces (smiling or not).

Face Validity

[Figure: items sorted into 12 score bins of width 3, from < -13 to > 17]

Face Validity

[Figure: items sorted into 12 score bins of width 3, from < -13 to > 17]

Weekly topic tracking (filter by outlet) with metadata (who, what, where, how much)

Daily topic trajectory. News topics are detected by clustering every day, and the detected topics are then linked to generate topic-tracking trajectories (Li, Joo, Qi, & Zhu, 2015).
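The linking step can be pictured with a simplified stand-in: treat each day's topics as keyword sets and connect a topic to the most similar topic from the previous day. This is not the method of Li, Joo, Qi, & Zhu (2015); Jaccard similarity, greedy matching, the threshold value, and the sample keywords are all assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_topics(days, threshold=0.3):
    """Greedily link each day's topics to the previous day's topics.

    `days` is a list (one entry per day) of lists of keyword sets.
    Returns trajectories as lists of (day_index, topic_index) pairs.
    """
    trajectories = []
    open_ends = {}  # previous day's topic index -> its trajectory
    for d, topics in enumerate(days):
        new_open, used_prev = {}, set()
        for i, keywords in enumerate(topics):
            best, best_sim = None, threshold
            if d > 0:
                for j, prev_keywords in enumerate(days[d - 1]):
                    if j in used_prev:
                        continue
                    sim = jaccard(keywords, prev_keywords)
                    if sim >= best_sim:
                        best, best_sim = j, sim
            if best is not None:
                trajectory = open_ends[best]  # continue an existing topic
                used_prev.add(best)
            else:
                trajectory = []               # start a new topic
                trajectories.append(trajectory)
            trajectory.append((d, i))
            new_open[i] = trajectory
        open_ends = new_open
    return trajectories


# HYPOTHETICAL daily topics, each a set of keywords.
days = [
    [{"election", "debate", "poll"}, {"storm", "flood", "rain"}],
    [{"storm", "rain", "damage"}, {"election", "poll", "vote"}],
    [{"election", "vote", "turnout"}],
]
print(link_topics(days))
```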

Other Goal: Sharing

• Help people share what they learn (within bounds of copyright)

• Solution #1: Share analysis, rather than raw material

• Solution #2: use familiar, copyright-compliant tools to create and share.

Social Sharing Tools

• Trying to develop two tools:

• Animated GIF generator (short clip; small file; on-screen captioning; easy to play)

• "Supercut" generator: Assemble short examples from archives; share compilation
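For the GIF idea, one plausible building block is an ffmpeg call that cuts a short clip and converts it to a GIF. The sketch below only builds the command list and does not run it. It is an assumption about how such a tool could work, not a description of the project's generator; the file names are hypothetical, and burning in captions is left out.

```python
def gif_command(src, start_s, duration_s, dst, fps=10, width=480):
    """Build an ffmpeg command that turns a short clip into a GIF.

    -ss and -t placed before -i select the start time and duration;
    the filter lowers the frame rate and scales the width to keep
    the file small (height follows the aspect ratio).
    """
    return [
        "ffmpeg",
        "-ss", str(start_s),
        "-t", str(duration_s),
        "-i", src,
        "-vf", f"fps={fps},scale={width}:-1:flags=lanczos",
        dst,
    ]


# HYPOTHETICAL file names; pass the list to subprocess.run() to execute.
cmd = gif_command("broadcast.mp4", 12.5, 4, "clip.gif")
print(" ".join(cmd))
```

A supercut tool could reuse the same clip-cutting step and then join the resulting segments.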

Summing Up

• Preservation as goal, but also as starting point.

• Excited to be able to understand long-term changes in news content & norms.

• Lots of work ahead of us.

• Appreciate any help or advice (or funding) you can offer.