AAAS Meeting Feb 16 2008
Web Mining & Open Source Intelligence
C. H. Best
European Commission
Joint Research Centre
Institute for Protection and Security of the Citizen
http://www.jrc.ec.europa.eu
http://ses.jrc.it
Motivation
• Internet revolution – 1.5 billion users, 2×10^20 bytes.
• New OS Intelligence Applications
– Live Media Monitoring
  – EU: 23 languages
– Situation Monitoring (UN, AU, US, EC)
  – Conflict early warning, crisis response, natural disasters
– Disease Control
  – Early warning of disease outbreaks
– Counter Terrorism, Law Enforcement
  – Propaganda, radicalisation, recruitment, fraud
– Business Intelligence
  – Markets, competitors
Europe Media Monitor Services
• RNS: Rapid News Service
• EMM: European Media Monitor
• News Explorer Entities
• News Tracking – Timelines
• All Europe's news
• Breaking news
• 25,000,000 articles processed since May 2002
• EMM in active use by EC
– 4,000 email alerts/day
– 50,000 active web users
– 500 SMS/day to VIPs
– 10,000 keywords, real-time
– 35,000 articles/day
– 35 languages
– 600 topic alerts, real-time
• Monitors all world countries
• Derives statistical indicators and time trends
• Live news
• Newsletters
• Push SMS alerts
• Press reviews
Long Term News Tracking
Automatic Person News Tracking in multiple languages
Automatic contact networks
A Polish example
http://press.jrc.it/NewsExplorer/entities/pl/20240.html
NewsExplorer – cross-lingual cluster linking
Social Networks - Relation Extraction
• Two entities are related or “linked” through a phrase
• Machine learning of relations
– “contacts” (met, phoned, discussed with, emailed, etc.)
– “support” (backed, applauded, welcomed, concurred, etc.)
– “criticise” (slammed, rejected, criticised, accused, etc.)
– “family” (wife, son, daughter, lover, mistress, etc.)
• Mine 3 years of news reports
• Log dates and related topics
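The relation categories above can be illustrated with a simple trigger-word lookup. This is only a minimal sketch and not the EMM implementation, which the slide describes as machine-learned. The `TRIGGERS` table, the `classify_relation` function, the "generic" fallback for unmatched pairs, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical trigger words per relation type, taken from the slide's examples.
TRIGGERS = {
    "contacts": {"met", "phoned", "discussed", "emailed"},
    "support": {"backed", "applauded", "welcomed", "concurred"},
    "criticise": {"slammed", "rejected", "criticised", "accused"},
    "family": {"wife", "son", "daughter", "lover", "mistress"},
}


def classify_relation(sentence, entity_a, entity_b):
    """Classify the phrase linking two entity mentions in one sentence.

    Returns a relation type, "generic" if both entities occur but no
    trigger word links them, or None if either entity is missing.
    """
    pos_a = sentence.find(entity_a)
    pos_b = sentence.find(entity_b)
    if pos_a == -1 or pos_b == -1:
        return None
    # Take the text between the two mentions, whichever comes first.
    if pos_a < pos_b:
        between = sentence[pos_a + len(entity_a):pos_b]
    else:
        between = sentence[pos_b + len(entity_b):pos_a]
    words = set(re.findall(r"[a-z]+", between.lower()))
    for relation, triggers in TRIGGERS.items():
        if words & triggers:
            return relation
    return "generic"


if __name__ == "__main__":
    print(classify_relation("Alice Smith met Bob Jones in Beirut.",
                            "Alice Smith", "Bob Jones"))
```

A learned model would replace the fixed word lists, but the input and output shape (two entities, a linking phrase, one relation label) would stay the same.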
Social Network of Contacts During Lebanon Conflict
EMM: Mining Social Networks
Key: contact, family, support, criticise, generic
Event Extraction and Knowledge Bases
• Automatically determine “who did what to whom where and when” from unstructured text.
• Machine Learning Technique used and applied to news clusters 2005-2007.
Violent Event Processing Chain
Take the title and the first sentence
Pattern Library
News Cluster Selection
Event Aggregation
News Cluster
Event Description
– Date:
– Place:
– Event type:
– Number killed:
– Number wounded:
– Number kidnapped:
– Perpetrators:
– Description of the victims:
– Weapons:
Keyword Library
Pattern & Keyword Matching
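The pattern and keyword matching step of this chain can be sketched as follows. This is a hedged illustration, not the JRC pattern library: the regular expressions in `PATTERNS`, the word lists in `KEYWORDS`, the `EventDescription` class and the `extract_event` function are hypothetical. Only a few of the template slots are filled, and the aggregation of events across a cluster is not shown.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventDescription:
    """A subset of the event template slots from the slide."""
    event_type: Optional[str] = None
    number_killed: Optional[int] = None
    number_wounded: Optional[int] = None
    number_kidnapped: Optional[int] = None


# Hypothetical pattern library: one regex per numeric slot.
PATTERNS = {
    "number_killed": re.compile(
        r"(\d+)\s+(?:people\s+|civilians\s+|soldiers\s+)?(?:were\s+)?killed", re.I),
    "number_wounded": re.compile(
        r"(\d+)\s+(?:people\s+|civilians\s+|others\s+)?(?:were\s+)?(?:wounded|injured)", re.I),
    "number_kidnapped": re.compile(
        r"(\d+)\s+(?:people\s+|civilians\s+)?(?:were\s+)?(?:kidnapped|abducted)", re.I),
}

# Hypothetical keyword library: substrings that suggest an event type.
KEYWORDS = {
    "bombing": ("bomb", "explosion", "blast"),
    "shooting": ("shot", "gunmen", "shooting"),
    "kidnapping": ("kidnapped", "abducted"),
}


def extract_event(title, first_sentence):
    """Fill the event template from an article's title and first sentence."""
    text = f"{title}. {first_sentence}"
    event = EventDescription()
    # Pattern matching: fill each numeric slot from the first regex hit.
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            setattr(event, slot, int(match.group(1)))
    # Keyword matching: the first event type with a matching keyword wins.
    lowered = text.lower()
    for event_type, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            event.event_type = event_type
            break
    return event


if __name__ == "__main__":
    print(extract_event(
        "Car bomb hits market",
        "At least 12 people were killed and 30 wounded in the blast."))
```

Restricting the input to the title and first sentence, as the slide does, keeps the patterns short because news reports tend to state the key facts up front.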
Real Time Violent Event Detection
Real time clustering every 10 minutes
geocoding
Live updated map display
event extraction
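Two of the steps above, the 10-minute clustering cycle and geocoding, can be sketched in a few lines. This is an assumption-laden illustration rather than the EMM pipeline: the `GAZETTEER` table (with approximate coordinates), `window_start` and `geocode` are hypothetical names. A production geocoder would also need to handle multi-word and ambiguous place names.

```python
import re
from datetime import datetime, timedelta

# Hypothetical gazetteer: lower-cased place name -> approximate (lat, lon).
GAZETTEER = {
    "beirut": (33.89, 35.50),
    "baghdad": (33.31, 44.37),
    "kabul": (34.53, 69.17),
}


def window_start(timestamp, minutes=10):
    """Round a timestamp down to the start of its clustering window."""
    return timestamp - timedelta(
        minutes=timestamp.minute % minutes,
        seconds=timestamp.second,
        microseconds=timestamp.microsecond,
    )


def geocode(text):
    """Return (place, lat, lon) for the first gazetteer name found, else None."""
    for token in re.findall(r"[A-Za-z]+", text):
        coords = GAZETTEER.get(token.lower())
        if coords:
            return (token, coords[0], coords[1])
    return None


if __name__ == "__main__":
    print(window_start(datetime(2008, 2, 16, 14, 37, 25)))
    print(geocode("Explosion in Baghdad kills 5"))
```

Articles whose timestamps share a `window_start` value would be clustered together in one cycle, and each extracted event would carry the coordinates needed for the map display.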
EMM-Labs: Detection and visualisation of violent events
Extraction
Modelling
Browsing
EMM Clusters
Extraction Patterns
Extracted Events
Extracted Events Ontology Instances
Visualize
Querying, Browsing and Visualization of the Knowledgebase
Ontology Modelling and Knowledgebase Population
Violent Event Extraction
Knowledge Extraction
MediSys: Monitoring Health Threats
Normal Google Advanced Search
Extracting full texts from all Web pages
Desktop Web Mining Tool
Summary
• Web Mining applications in operational use for
– Situation/Crisis Monitoring
– Law enforcement and counter-terrorism
– Medical Intelligence
• Future Technical Challenges
– Knowledge extraction
– Audio-Visual Monitoring
– Small Signal Detection
– Multi-linguality
Thank You