On the Books: Jim Crow and Algorithms of Resistance

Amanda Henley, Head of Digital Research Services, on behalf of the On the Books Project Team


Post on 20-Apr-2022


Page 1: On the Books: Jim Crow and Algorithms of Resistance

Amanda Henley, Head of Digital Research Services, on behalf of the On the Books Project Team

On the Books: Jim Crow and Algorithms of Resistance

Page 2: On the Books: Jim Crow and Algorithms of Resistance

Student Workers: Montana Eck, Julia Long, Ashley Mullikin, Siri Nallaparaju, Tim Oyeleke, and Jenna Patton

Project Team

Neil Byers, Graduate Assistant - Documentation and Content Developer

Lorin Bruckner, Data Visualization Services Librarian - Text Analysis and Visualization Expert

Sarah Carrier, North Carolina Research and Instructional Librarian - Special Collections Expert

Rucha Dalwadi, Research Assistant - Documentation and Content Developer

James Dick, Graduate Assistant (& Attorney) - Law Review and QA/QC

María R. Estorino, AUL for Special Collections & Director of the Wilson Library - Executive Sponsor and Liaison to the Library Leadership Team

Grant Glass, Graduate Assistant - Text Analysis Workflow

Amanda Henley, Head of Digital Research Services - PI and Project Lead

Hannah Jacobs, Graduate Assistant - Content Developer

Matt Jansen, Data Analyst - Text Analysis Expert and Statistician

Steve Segedy, Applications Analyst - Web Developer

William Sturkey, Faculty Member of History - Disciplinary Scholar

Kimber Thomas, African American Studies Scholar

Nathan Kelber, Ithaka - Collaborator (former PI and Project Lead)

Funding

Page 3: On the Books: Jim Crow and Algorithms of Resistance

About On the Books

Project to make North Carolina legal history accessible as a text corpus.

100+ years of North Carolina public, private, and local session laws

Project Goals:

- Create a corpus of NC session laws from 1865/66 to 1967

- Identify discoverable NC segregation statutes during the Jim Crow era using text analysis

Page 4: On the Books: Jim Crow and Algorithms of Resistance

Motivated by a reference question:

Where do I find a list of NC Jim Crow laws?

Page 5: On the Books: Jim Crow and Algorithms of Resistance
Page 6: On the Books: Jim Crow and Algorithms of Resistance

Workflow & Processes

For creating Collection as Data

• Compile Volume List

• Download Images from Internet Archive

• Preprocess Images
  • Identify location of marginalia and paratextual information
  • Rotate as needed
  • Crop image to main text body
  • Add color-matched borders
  • Adjust images to optimize OCR

• OCR over 80,000 Images

Marginalia and paratextual information were removed.
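The crop and border steps in the preprocessing list can be sketched in a few lines. This is a minimal illustration, not the project's pipeline: it assumes a page is a grid of grayscale values, that the text-body coordinates are already known, and that "color-matched" means the median value of the cropped image's outer pixels. The function names `crop`, `edge_color`, and `add_border` are hypothetical.

```python
from statistics import median

Image = list[list[int]]  # grayscale rows; 0 = dark ink, 255 = white paper


def crop(image: Image, top: int, left: int, bottom: int, right: int) -> Image:
    """Keep rows top..bottom-1 and columns left..right-1 (the main text body)."""
    return [row[left:right] for row in image[top:bottom]]


def edge_color(image: Image) -> int:
    """Median of the outermost pixels (assumed definition of 'color-matched')."""
    edge = image[0] + image[-1] + [row[0] for row in image] + [row[-1] for row in image]
    return int(median(edge))


def add_border(image: Image, width: int) -> Image:
    """Pad the image on every side with a border in its edge color."""
    color = edge_color(image)
    new_width = len(image[0]) + 2 * width
    top_rows = [[color] * new_width for _ in range(width)]
    bottom_rows = [[color] * new_width for _ in range(width)]
    body = [[color] * width + row + [color] * width for row in image]
    return top_rows + body + bottom_rows


# Toy page: column 0 holds marginalia (value 90), columns 2-7 hold the text body.
page = [
    [90, 250, 250, 250, 250, 250, 250, 250],
    [90, 250, 250, 20, 30, 250, 250, 250],
    [90, 250, 250, 25, 250, 250, 250, 250],
    [250, 250, 250, 250, 250, 250, 250, 250],
]
text_body = crop(page, top=0, left=2, bottom=4, right=8)
bordered = add_border(text_body, width=2)
```

Cropping first means the marginalia never reaches the OCR step, and the matched border gives the OCR engine some margin without introducing a hard edge.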

Page 7: On the Books: Jim Crow and Algorithms of Resistance

• Unit of analysis is individual laws
• Used pattern matching to split laws
• Extensive post-split cleanup

Results:
• 53,218 chapters
• 297,000 sections

Parse and Annotate Laws
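Pattern-based splitting of OCR text into chapters and sections might look like the sketch below. The heading patterns are assumptions made for illustration (a line reading "CHAPTER n", and sections opening with "SECTION n." or "SEC. n."). The patterns the project actually used are not given here, and real session laws vary enough to need the post-split cleanup noted on this slide.

```python
import re

# Assumed heading formats; real volumes differ by year and by OCR quality.
CHAPTER_RE = re.compile(r"^CHAPTER[ \t]+(\d+)[ \t]*$", re.MULTILINE)
SECTION_RE = re.compile(r"^(?:SECTION|SEC\.)[ \t]+(\d+)\.", re.MULTILINE)


def split_on(pattern: re.Pattern, text: str) -> list[tuple[str, str]]:
    """Return (number, body) pairs, one per heading matched by pattern."""
    matches = list(pattern.finditer(text))
    parts = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts.append((match.group(1), text[match.end():end].strip()))
    return parts


def parse_volume(text: str) -> dict[str, dict[str, str]]:
    """Map chapter number -> {section number -> section text}.

    Text between a chapter heading and its first section (the act's title)
    is dropped in this sketch.
    """
    return {
        chapter: dict(split_on(SECTION_RE, body))
        for chapter, body in split_on(CHAPTER_RE, text)
    }


sample = """CHAPTER 1
An act to incorporate a town.
SECTION 1. The town is hereby incorporated.
SEC. 2. This act shall be in force from its ratification.
CHAPTER 2
An act concerning roads.
SECTION 1. The commissioners shall maintain the roads.
"""
laws = parse_volume(sample)
```

Here `laws["1"]` holds two sections and `laws["2"]` holds one, so each section can be treated as a separate unit of analysis.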

Page 8: On the Books: Jim Crow and Algorithms of Resistance

Text Analysis

Can we determine which laws are Jim Crow?

Page 9: On the Books: Jim Crow and Algorithms of Resistance

Requires a training set to teach the algorithm what is/is not a Jim Crow law.

Laws in the training set identified by experts:

• Pauli Murray
• Richard Paschal
• William Sturkey
• Kimber Thomas

Supervised Classification

Page 10: On the Books: Jim Crow and Algorithms of Resistance

• To identify the best model, 80% of the training set was used to train candidate models, and the remaining 20% was held out to assess precision.

• XGBoost model selected for highest precision.

• Incorporated the type of law (public, private) and the year.

• Output was probability of law being Jim Crow.

• A conservative cutoff was selected: a law is labeled Jim Crow at 90% predicted probability.

Analysis
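The evaluation steps on this slide (80/20 split, precision, probability cutoff) can be sketched without the model itself. The code below does not train XGBoost. The probabilities are hypothetical stand-ins for model output, the helper names are invented for illustration, and treating the cutoff as "at or above 0.90" is an assumption.

```python
import random


def train_test_split(items: list, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle a copy of items and hold out test_fraction of them."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def precision(labels: list[int], predictions: list[int]) -> float:
    """Fraction of predicted positives that are true positives."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    pred_pos = sum(predictions)
    return true_pos / pred_pos if pred_pos else 0.0


def apply_cutoff(probabilities: list[float], cutoff: float = 0.90) -> list[int]:
    """Label a law 1 (Jim Crow) when its probability is at or above the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


# Ten hypothetical labeled laws: 8 go to training, 2 are held out.
labeled = [(f"law-{i}", i % 2) for i in range(10)]
train, test = train_test_split(labeled)

# Hypothetical model probabilities and expert labels for five held-out laws.
probs = [0.97, 0.91, 0.60, 0.95, 0.20]
truth = [1, 1, 1, 0, 0]
preds = apply_cutoff(probs)
score = precision(truth, preds)
```

A high cutoff trades recall for precision: the law scored 0.60 is a true Jim Crow law in this toy data but is left unlabeled, which is the cost of the conservative choice.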

Page 11: On the Books: Jim Crow and Algorithms of Resistance

Identified 905 Jim Crow Laws

141 identified by experts

411 identified by the model only

353 identified by the model and confirmed by an expert
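As a quick arithmetic check, the three categories above add up to the reported total. The snippet also computes each category's share, which the slides do not state.

```python
counts = {
    "identified by experts": 141,
    "identified by the model only": 411,
    "identified by the model and confirmed by an expert": 353,
}
total = sum(counts.values())  # 141 + 411 + 353 = 905
shares = {name: n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {n} ({shares[name]:.1%})")
print("total:", total)
```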

Page 12: On the Books: Jim Crow and Algorithms of Resistance

Version 2 is Forthcoming

• Improved corpus - more accurately split chapters and sections

• Improved text analysis - more advanced workflow

• Identified additional Jim Crow laws

• Training set

Page 13: On the Books: Jim Crow and Algorithms of Resistance

onthebooks.lib.unc.edu