TRANSCRIPT
UNCLASSIFIED
Searching for the Quantifiable, Scalable, Verifiable, and Understandable
Quantitative Methods in Defense of National Security, 25 May 2010
Dewey Murdick, Ph.D.
Program Manager
25 May 2010
Intelligence Advanced Research Projects Activity (IARPA)
Dr. Lisa Porter, Director, IARPA
Dr. Peter Highnam, Office Director, Incisive Analysis
Dr. Pete Haaland, Office Director, Safe & Secure Operations
Dr. Ed Baranoski, Office Director, Smart Collection
Overview
This is about taking real risk.
– This is NOT about “quick wins”, “low-hanging fruit”, “sure things”, etc.
– CAVEAT: HIGH-RISK/HIGH-PAYOFF IS NOT A FREE PASS FOR STUPIDITY.
– Competent failure is acceptable; incompetence is not.
“Best and brightest”.
– World-class PMs.
o IARPA will not start a program without a good idea and an exceptional person to lead its execution.
– Full and open competition to the greatest possible extent.
Cross-community focus.
– Address cross-community challenges
– Leverage agency expertise (both operational and R&D)
– Work transition strategies and plans
IARPA’s mission is to invest in high-risk/high-payoff research programs that have the potential to provide the U.S. with an overwhelming intelligence advantage over our future adversaries.
The “P” in IARPA is very important
Technical and programmatic excellence are required
Each Program will have a clearly defined and measurable end-goal, typically 3-5 years out.
– Intermediate milestones to measure progress are also required
– Every Program has a beginning and an end
– A new program may be started that builds upon what has been accomplished in a previous program, but that new program must compete against all other new programs
This approach, coupled with rotational PM positions, ensures that…
– IARPA does not “institutionalize” programs
– Fresh ideas and perspectives are always coming in
– Status quo is always questioned
– Only the best ideas are pursued, and only the best performers are funded.
The “Heilmeier Questions”
1. What are you trying to do?
2. How does this get done at present? Who does it? What are the limitations of the present approaches?
– Are you aware of the state-of-the-art and have you thoroughly thought through all the options?
3. What is new about your approach? Why do you think you can be successful at this time?
– Given that you’ve provided clear answers to 1 & 2, have you created a compelling option?
– What does first-order analysis of your approach reveal?
4. If you succeed, what difference will it make?
– Why should we care?
5. How long will it take? How much will it cost? What are your mid-term and final exams?
– What is your program plan? How will you measure progress? What are your milestones/metrics? What is your transition strategy?
The Three Strategic Thrusts (Offices)
Smart Collection: dramatically improve the value of collected data
– Innovative modeling and analysis approaches to identify where to look and what to collect.
– Novel approaches to access.
– Innovative methods to ensure the veracity of data collected from a variety of sources.
Incisive Analysis: maximizing insight from the information we collect, in a timely fashion
– Advanced tools and techniques that will enable effective use of large volumes of multiple and disparate sources of information.
– Innovative approaches (e.g., using virtual worlds, shared workspaces) that dramatically enhance insight and productivity.
– Methods that incorporate socio-cultural and linguistic factors into the analytic process.
– Estimation and communication of uncertainty and risk.
Safe and Secure Operations: countering new capabilities of our adversaries that could threaten our ability to operate effectively in a networked world
– Cybersecurity
o Focus on future vulnerabilities
o Approaches to advancing the "science" of cybersecurity, to include the development of fundamental laws and metrics
– Quantum information science & technology
Program Manager Interest Areas by Office
[Figure: program manager interest areas grouped under Smart Collection, Safe and Secure Operations, and Incisive Analysis]
Concluding Thoughts on IARPA
Technical Excellence & Technical Truth
– Scientific Method
– Peer/independent review
– Full and open competition
We are looking for outstanding PMs.
How to find out more about IARPA:
www.iarpa.gov
Conference on Technical Information Discovery, Extraction & Organization
– Mark Heiligman, IARPA PM, Mile-wide, Mile-deep (M2) Exploration
– Held October 28-29, 2008, consisted of talks, breakout sessions, and open discussion
– Attended by 30+ researchers, business intelligence, and government participants
Facilitated an open and active discussion on current methods, challenges, and opportunities in:
– Information Retrieval
– Text Processing
– Knowledge Discovery
– Information Extraction
– Social Network Analysis
– Scientometrics
– Information Visualization, and
– Closely related research domains
Goal: Drive technical innovation and explore novel applications in the area of systematically mining the global technical literature for useful and non-obvious information and insights
This talk is a personal summary of the materials presented and discussed at the conference.
M2 Information Content
Formal Presentations
– Mile-wide, Mile-deep, Mark Heiligman, IARPA
– Information Retrieval, Scientometrics/Text Mining, and Literature-related Discovery and Innovation, Ron Kostoff, MITRE
– From Knowledge Mapping to Innovation Evolution, Hsinchun Chen, University of Arizona
– Machine Learning for Extraction, Integration and Mining of Research Literature, Andrew McCallum, University of Massachusetts Amherst
– Information Retrieval: The Path Ahead, Jamie Callan, Carnegie Mellon University
– Sentiment Analysis from User Forums, Ronen Feldman, Hebrew University
– The Accuracy of a Map of Science: Measurement & Implications, Richard Klavans, SciTech Strategies, Inc
– Document Classification Using Nonnegative Matrix Factorization, Michael W. Berry, University of Tennessee, Knoxville
Breakout Sessions & Open Discussion – richest idea content, and biggest contribution to what follows
MITRE Summary:
– A Two-step Analytic-workshop Process For Identifying Promising Research Opportunities, by Ronald Kostoff et al.
Problems
Too Much Data / Diversity
– Scale
– Textual / Multimedia
– Multilingual
– Multiple Sources
Too Complex
– Motivation (Create / Disseminate)
– Topics / Domains (# / Connectedness)
– Shared Intentionally or Not
Too Fast
– Streaming
Example for Technical Topics: Scientific Literature, Patents, Conference Proceedings, Talks, Technical Blogs, S&T News, Social Media, Experimental Data, Computational Models / Code, Forecasts, Corporate Filings, Government Funding, Policy, Public Opinion, etc.
Weak Signals in Context
Find weak signals
Use weak signals within context for
– Finding connections
– Anomaly detection/rare events
– Cultural meaning / implications
Manage uncertainty
Develop new standards for “ground truth”
Connecting Weak Signals
Automated Connection Making / Knowledge Discovery
Iterative information retrieval (IR), extraction (IE), and linkages identification
Leveraging previous relevancy judgments and feedback
Probabilistic linking of subjective qualities within text
Goal: find high-value, low-signature information in context
[Figure: “Material processing method X may be interesting for property Y”; “Intriguing Rumors, Uncertain Source”; Analyst, Analyst, Analyst w/ Quantitative System]
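One way to picture "leveraging previous relevancy judgments and feedback" is classic Rocchio relevance feedback, sketched below. This is an illustration chosen for this summary, not a method presented at the conference. The term weights, the example documents, and the alpha/beta/gamma values are all assumptions.

```python
from collections import Counter

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated term-weight query using Rocchio relevance feedback.

    query, and each document in relevant/nonrelevant, is a dict of
    term -> weight. The parameter defaults are illustrative assumptions.
    """
    updated = Counter()
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Pull the query toward the centroid of documents judged relevant.
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)
    # Push the query away from the centroid of documents judged non-relevant.
    for doc in nonrelevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(nonrelevant)
    # Terms whose weight ends up non-positive are dropped.
    return {term: w for term, w in updated.items() if w > 0}

# Hypothetical judgments for a query about a coating process.
new_query = rocchio(
    {"coating": 1.0},
    relevant=[{"coating": 1.0, "plasma": 2.0}],
    nonrelevant=[{"coating": 1.0, "paint": 2.0}],
)
```

Each round of analyst feedback produces a revised query, so retrieval, extraction, and linking can be repeated iteratively.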
Enhancing Contextual Awareness
Automatically
– Leverage element characteristics in connection building process
– Focused information augmentation from secondary sources
– Characterize and apply to analogous situations
o Network Behaviors and Features
o Assessments of subjectivity (e.g., theme, sentiment)
Goal: rapidly inform non-experts with context about a given area/issue
[Figure: Analyst asking “Where does this nugget of information fit?”, with context from the www and S&T literature]
Identifying Outliers, Rare Events
Automatically
– Measuring and analyzing low-frequency indicators in group trends
– Systematically identifying anomalies from records of interest and early-stage emerging technologies
– Identifying rare events based on non-technical phrase association patterns
– Extracting technical phrases of interest by targeting non-technical phrases such as sentiment, analysis, stylistics, etc.
– Intelligent clustering techniques
Goal: Identify significant rare events
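As a minimal sketch of flagging anomalies in a series of records, the snippet below uses a modified z-score based on the median and the median absolute deviation (MAD). This is a standard robust-statistics technique chosen here for illustration, not one named in the conference material. The function name, the sample counts, and the 3.5 cutoff are assumptions.

```python
from statistics import median

def rare_events(counts, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Uses the median and MAD so that the outliers themselves do not
    distort the baseline. The 3.5 cutoff is a common convention,
    treated here as an assumption.
    """
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [i for i, c in enumerate(counts)
            if 0.6745 * abs(c - med) / mad > threshold]

# Hypothetical monthly transaction counts with one unusual month.
flagged = rare_events([3, 4, 3, 5, 4, 3, 40, 4])
```

A flagged index only marks a record for analyst attention; whether it is significant still depends on context.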
[Figure: Analyst reviewing bank statements: “Is Jim doing something illegal?”]
Collaboration (Two Different Kinds)
Common playground facilitating:
– Large-scale data sharing
– Data discovery annotation
– Error corrections
– Multi-source integration
– Recall of what has been done in the past
Measure collaboration
– Recognize cultural differences
– Discover key players
– Process changes over time
Multilingual Methods
Need algorithms that can process, filter, and analyze multilingual data
Leverage domain-specific machine translation
Compare and contrast translated and multilingual data for improvements in queries, trends, etc.
Language translation is high cost
Translation is not enough to understand meaning in non-English text
Cultural information helps to understand social landscape, motivation, and production of scientists in S&T
No Black Boxes
No algorithm black boxes
– Shared environment for algorithm development
– Success verifiable through indicator metrics
– Output must be humanly comprehensible
Human comprehension metrics:
o Number of potential associations
o Number of dimensions simultaneously analyzed
o Steps to finding information
o Amount of time to digest information
o Amount of information at a time
o Efficiency of user-driven tuning of level-of-detail
Algorithmic output exportable to interactive tools
User-Friendly Displays for Data Analysis
Interactive and multifaceted views of scientific landscape
– Geo-location
– Entity Networks
– Topical Networks
Environments that provide both contextual awareness and visualizations
– Contextual information (Wikipedia style) provided when user encounters unfamiliar term or concept
Interactive interfaces to pull out information
Metric Validation Processes
User studies and human labeling to verify data in information extraction (IE) and NLP are costly
Use hybrid methods (e.g., boosting)
Leverage automatically processed information from an external source to validate output
Automating identification of trusted sources to help validation process
Validate results with historical studies, knowledge of current state, and forecasts
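To make validation against an external or trusted source concrete, the sketch below scores a set of extracted items against a reference set using precision, recall, and F1. The function name and the sample entities are hypothetical; the reference set stands in for whatever trusted source or gold standard is available.

```python
def precision_recall_f1(extracted, reference):
    """Score extracted items against a trusted reference set."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    # Precision: share of extracted items confirmed by the reference.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of reference items that the extraction recovered.
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical entity names: four extracted, three in the reference set.
scores = precision_recall_f1(
    ["lab_a", "lab_b", "lab_c", "lab_d"],
    ["lab_a", "lab_b", "lab_e"],
)
```

The scores are only as trustworthy as the reference set, which is why automated identification of trusted sources matters for this step.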
Serious Need for Novel Thinking
Things to Remember
Track Uncertainty
– Indicator metrics
– Weak signals
No black boxes
– Human comprehensible output
Provide clear view of evaluation metrics
– Gold standards
– Ground truth