Center for Content Extraction · TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320108...
TRANSCRIPT
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL
Human Language Technology
Center for Content Extraction
Content Extraction Analytics SIGDEV End-to-End Demo
21 May 2009
Derived From: NSA/CSSM 1-52
Dated: 20070108
Declassify On: 20330108
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320108
Introduction to Content Extraction
• New technologies can find Essential Elements of Information in documents
• The Center for Content Extraction provides "one stop shopping" for these technologies at NSA
Extraction can benefit SIGDEV from end to end
• Selection
• Translation & Transliteration
• Analysis
• Interpretation/Enrichment
• Retrieval
• Storage & Distribution
STAIRS Partners
S (Marina, CEA)
T (Cybertrans)
A (SNA/Paintball, Synapse)
I (Nymrod, Thundercloud)
R (Journeyman/CPE)
S (GoldenRetriever, SocioPath)
Implementation: CCE Extraction Architecture (LexHound)
Subscription Based Customers (extracted report/transcript content):
• Marina (comms tracking)
• Synapse/EKS (link analysis)
• Nymrod (Name Matching)
Web Service On Demand Customers (WebServices):
• LexHound Web Demo
• CAMT (translation)
• TKB (target knowledge base)
• SNA (social network analysis)
• GIS (geo mapping)
• NTOC (terror cell tracking)
• Heresyitch (UC collateral)
• GoldenRetriever (record building)
[Diagram: inputs labeled Reports and Transcripts; components labeled Ingester, Dispatcher, Task Manager, Extractor(s), Transformer, Renderer, Sender, Output]
Elaboration: The Central Importance of Storage
□ Each of the STAIRS Steps exploits stored information
  • Selection Dictionaries ("get it")
  • Linguistic Glossaries for Translation
  • Wikis etc. for enrichment ("know it")
□ Manual record-formation is slow, prone to omissions and inconsistencies
  • <200K Person Targets in TKB
  • Growth ≈ 20K/year
□ Automatic extraction accelerates storage
  • >3000K Citation Records in Nymrod Entity DB
  • Growth ≈ 1000K/year
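The gap between the two growth figures on this slide can be made concrete with a little arithmetic. The following is a minimal sketch that takes the slide's bounds at face value (treating "<200K" and ">3000K" as exactly 200K and 3000K is an assumption), and all variable names are illustrative rather than taken from the source.

```python
# Figures as quoted on the slide, in thousands of records.
# Treating the "<200K" and ">3000K" bounds as exact values is an assumption.
tkb_person_targets = 200        # manual: "<200K Person Targets in TKB"
tkb_growth_per_year = 20        # manual: "Growth ≈ 20K/year"
nymrod_citations = 3000         # automatic: ">3000K Citation Records"
nymrod_growth_per_year = 1000   # automatic: "Growth ≈ 1000K/year"

# How many times faster the automatic store grows in absolute terms.
growth_ratio = nymrod_growth_per_year / tkb_growth_per_year

# Growth relative to current size, as a fraction per year.
tkb_relative = tkb_growth_per_year / tkb_person_targets
nymrod_relative = nymrod_growth_per_year / nymrod_citations

print(f"absolute growth ratio: {growth_ratio:.0f}x")
print(f"manual relative growth: {tkb_relative:.0%} per year")
print(f"automatic relative growth: {nymrod_relative:.0%} per year")
```

Under these assumptions the automatic store adds records about 50 times faster per year. The two stores count different things (person targets versus citation records), so the ratio is only a rough indication of scale.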
Machine vs. Manual Chief-of-State Citations
| # | Name | Role | Code | Nymrod (machine-extracted) Cites | Last TKB Manual Update |
|---|---|---|---|---|---|
| 1 | Abdullah Badawi | Malaysian Prime Minister | COS | >100 | 10/15/2007 |
| 2 | Abdullahi Yusuf | Somali President | COS | >300 | N/A |
| 3 | Abu Mazin (Mahmud 'Abbas) | PA President | COS | >200 | 5/20/2009 |
| 4 | Alan Garcia | Peruvian President | COS | >100 | N/A |
| 5 | Aleksandr Lukashenko | Belarusian President | COS | >50 | N/A |
| 6 | Alvaro Colom | Guatemalan President | COS | >200 | N/A |
| 7 | Alvaro Uribe | Colombian President | COS | >700 | N/A |
| 8 | Amadou Toumani Toure | Malian President | COS | >50 | N/A |
| 9 | Angela Merkel | German Chancellor | COS | >300 | N/A |
| 10 | Bashar al-Asad | Syrian President | COS | >800 | N/A |
| ... | ... | ... | ... | ... | ... |
| 122 | Yuliya Tymoshenko | Ukrainian Prime Minister | COS | >200 | N/A |
Human Language Technology