Context Knowledge Management for Armament Safety
1
Context Knowledge Management for Armament Safety
Stuart Madnick, Lynn Wu
MIT Sloan School of Management, {smadnick, linwu}@mit.edu
2
Information Integration & Re-Use Projects. Stuart Madnick (smadnick@mit.edu):
Technologies, Applications
Strategy, Policy & Legal Issues
Security
COntext INterchange (COIN) (1)
Financial Services (account aggregation)
Security Analysis
Military Logistics
System Dynamics Modeling of State Stability (4)
Stakeholder Perceptions of Security (2)
Economic model of alternatives to EU Database Directive (3)
RFID IT Infrastructure
Data Quality
Total Data Quality (TDQM) Program (5)
MIT Information Quality (MIT-IQ) Program
Pros and cons of data standards
Others …
Context Knowledge Management Approach to “Armament Safety Management”
3
COntext INterchange (COIN) Project
Sources: Databases, Web Pages
Receivers: Applications, Browsers
INPUT PROCESSING
* Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution
CONTEXT MEDIATION
* Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution
OUTPUT PROCESSING
- ODBC Driver
- Web Publishing
TRUSTED AGENTS
APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.
4
Key COIN Technologies
Web Wrapper
- Extracts selected information from the web (HTML + XML)
- Allows the web to be treated as a large relational SQL database
- Can handle dynamic web sites, cookies, “login”, etc.
- Performs SQL Joins & Unions involving DBs + Web sources
Context Mediator
- Resolves semantic (meaning) differences
- Enables meaningful aggregation & comparison
5
Context: Multiple Perspectives . . . old lady or young lady ?
6
Role of Context
CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)
Data: Databases, Web data, E-mail
Ambiguous without context: currency ($, £, or ¥?); date (05-06-07, 07-06-05, or 06-05-07?)
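The date strings on this slide can be read in more than one way, depending on the context of the source. The minimal Python sketch below parses one of them under three conventions. The context names and their format strings are assumptions made for illustration; they are not part of COIN.

```python
from datetime import datetime

# Hypothetical contexts and the date layout each one assumes.
CONTEXT_DATE_FORMATS = {
    "US (month-day-year)": "%m-%d-%y",
    "UK (day-month-year)": "%d-%m-%y",
    "year-month-day": "%y-%m-%d",
}

def interpret(raw: str) -> dict:
    """Parse the same raw string once per context and return ISO dates."""
    return {
        context: datetime.strptime(raw, fmt).date().isoformat()
        for context, fmt in CONTEXT_DATE_FORMATS.items()
    }

readings = interpret("05-06-07")
for context, iso_date in readings.items():
    print(f"{context}: {iso_date}")
```

The same seven characters yield three different calendar dates, which is why the receiver needs to know the source's context before comparing values.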
7
Types of Context
Type | Example | Temporal
Representational | Currency: $ vs €; Scale factor: 1 vs 1000 | Francs before 2000, € thereafter
Ontological | Revenue: includes vs excludes interest | Revenue: excludes interest before 1994 but includes it thereafter
8
The 1999 Overture
Unit-of-measure mixup tied to loss of $125 million Mars Orbiter
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .
. . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the contractor ).”
Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.
9
Context Knowledge Management for Armament Safety: Motivation
• Context knowledge management is an important challenge.
• Semantic inconsistency is present in databases, even in the military.
– For example, what does "accident rate" really mean?
• Army Ground Accident Rate: # accidents / period of time
1. Per year
2. Per month
3. Per total actual personnel strength
4. Per operational personnel strength
• How do we address such semantic inconsistencies?
– How do we interpret different accident rates?
– Context knowledge management is needed.
10
Motivating Example
Disclaimer: The data on this slide are artificial and are used for demonstration only.
In the military, there are many ways to measure safety.
1. Accident and injury rates can be measured on a per-week, per-month, or per-year basis.
2. Nuclear testing data generally use the U.S. customary measurement system, since most nuclear testing has been done in the US. To conform with international standards, the US government has slowly been trying to convert the units to the metric system. However, even within the metric system, there is confusion between SI units and non-SI units.
Measure | Unit A | Unit B
Weapon | A123 | A123
Accident Rate | 0.01 | 0.52
Injury Rate | 77 | 170
Nuclear Test Safety Exclusion Zone | 2500 | 762
Radioactivity | 1 | 3.7 x 10^10
Semantic heterogeneity (contexts, Unit A ↔ Unit B):
Accident Rate: per week ↔ per year (0.01/week ↔ 0.52/year)
Injury Rate: per month per pro-rated strength ↔ per month per personnel strength (77/week/prs ↔ 170/ps)
Nuclear Test Safety Exclusion Zone: feet ↔ meters (2500 feet ↔ 762 meters)
Radioactivity: curie ↔ Bq (1 curie ↔ 3.7 x 10^10 Bq)
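The unit pairs on this slide can be checked with a few lines of arithmetic. The sketch below converts the Unit A values into the Unit B context. The dictionary keys and function names are illustrative assumptions. The factor of 52 weeks per year is the one implied by the slide. The foot and curie factors are the standard definitions.

```python
WEEKS_PER_YEAR = 52        # factor implied by 0.01/week <-> 0.52/year
METERS_PER_FOOT = 0.3048   # international foot, exact by definition
BQ_PER_CURIE = 3.7e10      # curie, exact by definition

def per_week_to_per_year(rate_per_week: float) -> float:
    return rate_per_week * WEEKS_PER_YEAR

def feet_to_meters(feet: float) -> float:
    return feet * METERS_PER_FOOT

def curie_to_becquerel(curie: float) -> float:
    return curie * BQ_PER_CURIE

# Unit A values from the slide (artificial demonstration data).
unit_a = {"accident_rate": 0.01, "exclusion_zone": 2500, "radioactivity": 1}

# The same facts expressed in the Unit B context.
unit_a_in_b_context = {
    "accident_rate": per_week_to_per_year(unit_a["accident_rate"]),
    "exclusion_zone": feet_to_meters(unit_a["exclusion_zone"]),
    "radioactivity": curie_to_becquerel(unit_a["radioactivity"]),
}
print(unit_a_in_b_context)
```

The injury rate is left out on purpose: converting between pro-rated and personnel strength needs strength figures that the slide does not give.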
11
Source Context Differences
Source | Accident Rate | Injury Rate | Nuclear Test Safety Exclusion Zone (radius) | Radioactivity
Unit A | Army Ground Accident Rate (per week) | Active Army Military Injury Rate (per month) | Meter | Curie
Unit B | Army Ground Accident Rate (per year) | USAR & ARNG Military Injury Rate (per month) | Meter | Becquerel
Unit C | Army Ground Accident Rate (per week) | Active Army Military Injury Rate (per week) | Kilometer | TBq
Unit D | Army Ground Accident Rate (per month) | Army Civilian Employee Injury Rate (per month) | Feet | MBq
12
Scenario
• A general wants to see a composite report on all four units.
– Direct queries on the four units would return data that are not comparable.
– Without mediation, Unit B seems to be doing poorly.
Unit | Accident Rate | Injury Rate | Exposure | Radioactivity
Unit A | 0.01 | 0.037 | 762 | 1
Unit B | 2 | 0.08 | 762 | 37x10^10
Unit C | 0.05 | 0.08 | 0.762 | 0.037
Unit D | 0.028 | 0.01 | 2500 | 37.04
13
Standardization: often not a solution
• Works in small systems.
• Legitimate reasons for diversity (e.g., different needs) lead to multiple standards
– Unit 1 uses accident rate per year
– Unit 2 uses accident rate per month
• Standards are costly to develop
– DoD started data standardization in 1991; by 2000, it had standardized only ~1.2% of 1 million data elements*
• Standards do evolve over time
– Nuclear tests used the US customary measurement standard; they are now moving toward the SI standard
* Rosenthal, A., Seligman, L. and Renner, S. (2004) "From Semantic Integration to Semantics Management: Case Studies and a Way Forward", ACM SIGMOD Record, 33(4), 44-50.
14
The Context Interchange Approach
Components: Context Mediator, Source Context, Receiver Context, Conversion Libraries, Shared Ontologies, Context Management Administrator
Concept: Accident Rate
Source context: Per Week; Receiver context: Per Year; conversion f(): Per Week → Per Year
Receiver query: Select accidentRate From unitA
Mediated query: Select accidentRate x 52 From unitA
Context transformation: accidentRate 0.01 (source) → 0.52 (receiver)
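The rewrite on this slide, where a query for accidentRate becomes a query for accidentRate x 52, can be imitated with a lookup of declared contexts and a small conversion library. The sketch below is a toy illustration only. The dictionaries and the `mediate` function are hypothetical names, and the actual COIN mediator works on Datalog with an abductive reasoning engine rather than on string templates.

```python
# Conversion library: (source modifier value, receiver modifier value) -> factor.
CONVERSION_FACTORS = {
    ("per_week", "per_year"): 52,
    ("per_year", "per_week"): 1 / 52,
}

# Hypothetical context declarations for one source and one receiver.
SOURCE_CONTEXTS = {"unitA": {"accidentRate": "per_week"}}
RECEIVER_CONTEXT = {"accidentRate": "per_year"}

def mediate(column: str, table: str, receiver_context: dict):
    """Return (rewritten SQL text, factor) for a single-column query."""
    source_value = SOURCE_CONTEXTS[table][column]
    target_value = receiver_context[column]
    if source_value == target_value:
        # No semantic conflict detected: the query passes through unchanged.
        return f"SELECT {column} FROM {table}", 1
    factor = CONVERSION_FACTORS[(source_value, target_value)]
    return f"SELECT {column} * {factor} FROM {table}", factor

query, factor = mediate("accidentRate", "unitA", RECEIVER_CONTEXT)
print(query)          # SELECT accidentRate * 52 FROM unitA
print(0.01 * factor)  # the source value 0.01 seen in the receiver context
```

The receiver never states the factor. It only declares its own context, and the conflict is detected by comparing the two declarations.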
15
Aggregated results in receiver context of Unit C
Each measure is shown twice: with mediation, then with no mediation.
Unit | Accident Rate (per week): mediation | no mediation | Injury Rate (per week): mediation | no mediation | Nuclear Test Safety Exclusion Zone (kilometer): mediation | no mediation | Radioactivity (TBq): mediation | no mediation
Unit A | 0.1232 | 0.01 | 0.009 | 0.037 | 0.762 | 762 | 0.037 | 1
Unit B | 0.038 | 2 | 0.02 | 0.08 | 0.762 | 762 | 0.037 | 37x10^10
Unit C | 0.05 | 0.05 | 0.08 | 0.08 | 0.762 | 0.762 | 0.037 | 0.037
Unit D | 0.07 | 0.028 | 0.1234 | 0.01 | 0.762 | 2500 | 0.037 | 37.04
16
Conclusion
Many different contexts are used to evaluate safety measurements within the military.
An aggregator is needed to gather and integrate the various data.
Automatic context mediation plays a critical role.
Context Interchange enables meaningful aggregation.
For more information: http://context2.mit.edu/coin
17
Another Example: Regional Comparison Shoppers
US Sweden France UK
18
COIN Conceptual Model
(Ontology)
19
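Ontology and Conversion Function
Contexts:
context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy.mm.dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy
context_d is_a context_b; scaleFactor: 1e3
context_e is_a context_d; format: yyyy-mm-dd
context_f is_a context_c; kind: base+tax
Ontology diagram labels: basic, monetaryValue, price, temporalEntity, organization, kind, currency, format, scaleFactor, taxRate
Legend: is_a relationship, attribute, modifier
Example source: src_turkey(Product, Vendor, QuoteDate, Price)
Conversion function for the currency modifier:
$$\forall x : \mathit{monetaryValue} \vdash x[\mathit{cvt}(\mathit{currency}, C_2)@C_1, u \rightarrow v] \leftarrow x[\mathit{currency}(C_1) \rightarrow C_f] \wedge x[\mathit{currency}(C_2) \rightarrow C_t] \wedge x[\mathit{tempAttr} \rightarrow T] \wedge \mathit{olsen\_}(A, B, R, D) \wedge C_f \overset{C_2}{=} A \wedge C_t \overset{C_2}{=} B \wedge T \overset{C_2}{=} D \wedge R[\mathit{value}(C_2) \rightarrow r] \wedge v = u * r.$$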
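The context declarations on this slide use is_a inheritance: context_d overrides only scaleFactor, context_e only format, and context_f only kind. The sketch below shows one way to resolve the full modifier set of a context and to list the modifiers on which two contexts differ. The dictionary layout and the function names are assumptions made for illustration, not COIN's own representation.

```python
# Context declarations transcribed from the slide; layout is hypothetical.
CONTEXTS = {
    "context_a": {"modifiers": {"currency": "KRW", "scaleFactor": 1000,
                                "kind": "base", "format": "yyyy.mm.dd"}},
    "context_b": {"modifiers": {"currency": "TRL", "scaleFactor": 1_000_000,
                                "kind": "base+tax", "format": "dd-mm-yyyy"}},
    "context_c": {"modifiers": {"currency": "USD", "scaleFactor": 1,
                                "kind": "base+tax+SH", "format": "mm/dd/yyyy"}},
    "context_d": {"is_a": "context_b", "modifiers": {"scaleFactor": 1000}},
    "context_e": {"is_a": "context_d", "modifiers": {"format": "yyyy-mm-dd"}},
    "context_f": {"is_a": "context_c", "modifiers": {"kind": "base+tax"}},
}

def resolve(name: str) -> dict:
    """Return all modifier values of a context, following is_a links."""
    declaration = CONTEXTS[name]
    parent = declaration.get("is_a")
    inherited = resolve(parent) if parent else {}
    # Values declared locally override inherited ones.
    return {**inherited, **declaration["modifiers"]}

def differing_modifiers(source: str, receiver: str) -> list:
    """List the modifiers whose values differ between two contexts."""
    src, rcv = resolve(source), resolve(receiver)
    return sorted(m for m in src if src[m] != rcv[m])

print(resolve("context_e"))
print(differing_modifiers("context_b", "context_e"))
print(differing_modifiers("context_b", "context_a"))
```

Only the modifiers that differ call for a conversion, so a context that inherits most of its values from a parent needs very little extra declaration.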
20
Demo – Same Context
No semantic differences
Meaningful data returned
21
(a) Select Vendor, Price From src_turkey Where Product=“Samsung SyncMaster 173P”;
Conversion for scale factor
(b) Select Vendor, QuoteDate, Price From src_turkey Where Product=“Samsung SyncMaster 173P”;
Conversion for date format
Conversion for scale factor
Compose only relevant conversions (b → e)
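Query (a) needs only the scale-factor conversion, while query (b), which also selects QuoteDate, needs the date-format conversion as well. The sketch below reproduces that selection for a source in context_b and a receiver in context_e. The mapping from columns to modifiers, the sample row, and all function names are assumptions for illustration. Only the two converters needed for this pair of contexts are defined.

```python
from datetime import datetime

# Resolved modifier values; date layouts are written as strptime patterns.
SOURCE = {"currency": "TRL", "scaleFactor": 1_000_000,
          "kind": "base+tax", "format": "%d-%m-%Y"}      # context_b
RECEIVER = {"currency": "TRL", "scaleFactor": 1_000,
            "kind": "base+tax", "format": "%Y-%m-%d"}    # context_e

# Hypothetical mapping: which modifiers qualify which column.
COLUMN_MODIFIERS = {
    "Vendor": [],
    "QuoteDate": ["format"],
    "Price": ["currency", "scaleFactor", "kind"],
}

def convert_scale(value, src, rcv):
    return value * src / rcv

def convert_format(value, src, rcv):
    return datetime.strptime(value, src).strftime(rcv)

# Currency and kind converters are omitted: b and e agree on both.
CONVERTERS = {"scaleFactor": convert_scale, "format": convert_format}

def plan(columns):
    """For each selected column, keep only modifiers that actually differ."""
    return {col: [m for m in COLUMN_MODIFIERS[col] if SOURCE[m] != RECEIVER[m]]
            for col in columns}

def mediate_row(row):
    out = dict(row)
    for col, modifiers in plan(row.keys()).items():
        for m in modifiers:
            out[col] = CONVERTERS[m](out[col], SOURCE[m], RECEIVER[m])
    return out

plan_a = plan(["Vendor", "Price"])
plan_b = plan(["Vendor", "QuoteDate", "Price"])
# Made-up row: a price of 2 (millions) quoted on 15-03-2005.
mediated = mediate_row({"Vendor": "Vendor X", "QuoteDate": "15-03-2005", "Price": 2})
print(plan_a, plan_b, mediated, sep="\n")
```

Conversions for modifiers that match, or for columns that are not selected, never enter the plan.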
22
Introduced because of context difference in auxiliary source
Auto-reconciliation for auxiliary source (b → f)
23
Detection and Explication (b → a)
24
Date format for receiver
Price definition – remove tax
Scale factor
Date format for auxiliary source olsen
Currency
Mediated Query (b → a)
25
Interoperate: hard-wired approaches
(a) BFS approach: Brute-force between pair-wise sources
(b) BFC approach: Brute-force between contexts
(c) Internal standard approach: Adopting a standard
[Figures: three diagrams, each over six sources numbered 1 to 6; the diagram for (c) includes an "Internal standard" node.]
context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy-mm-dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy
26
Flexibility and Scalability
Approach | General case | In the example
BFS | N(N-1), N := number of sources and receivers | 159,600
BFC | n(n-1), n := number of unique contexts | 72,630
ETL/GS | 2N, N := number of sources and receivers | 800
COIN | 1) Worst case: $\sum_{i=1}^{m} n_i(n_i - 1)$, n_i := number of unique values of the i-th modifier, m := number of modifiers in ontology; 2) $\sum_{i=1}^{m} (n_i - 1)$, when equational relationships exist; 3) m, if all conversions can be parameterized | 1) worst: 108; 2) actual number: 5 (3 general conversions plus 2 for price)
Flexibility:
BFS, BFC, ETL/GS: not flexible. Need to update/add many conversion programs.
• Why can other approaches not fully benefit from general-purpose conversion?
– The decision whether to invoke the conversion is in the conversion program.
COIN: flexible. Update the declarative knowledge base.
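The counts in the table follow directly from the formulas. In the sketch below, N = 400 and n = 270 are not stated on the slide; they are inferred from 2N = 800 and n(n-1) = 72,630. The per-modifier value counts passed to the COIN formulas are a hypothetical combination chosen because it reproduces the worst-case figure of 108, and the function names are illustrative.

```python
def bfs_conversions(num_sources_and_receivers: int) -> int:
    """Brute force between pair-wise sources: N(N-1)."""
    return num_sources_and_receivers * (num_sources_and_receivers - 1)

def bfc_conversions(num_contexts: int) -> int:
    """Brute force between contexts: n(n-1)."""
    return num_contexts * (num_contexts - 1)

def etl_conversions(num_sources_and_receivers: int) -> int:
    """Internal standard (ETL/GS): one conversion in and one out, 2N."""
    return 2 * num_sources_and_receivers

def coin_worst_case(values_per_modifier: list) -> int:
    """COIN worst case: sum over modifiers of n_i(n_i - 1)."""
    return sum(n_i * (n_i - 1) for n_i in values_per_modifier)

def coin_equational(values_per_modifier: list) -> int:
    """COIN with equational relationships: sum over modifiers of (n_i - 1)."""
    return sum(n_i - 1 for n_i in values_per_modifier)

N = 400   # inferred from 2N = 800
n = 270   # inferred from n(n-1) = 72,630
hypothetical_values = [10, 3, 3, 3]   # assumed n_i for four modifiers

print(bfs_conversions(N))                     # 159600
print(bfc_conversions(n))                     # 72630
print(etl_conversions(N))                     # 800
print(coin_worst_case(hypothetical_values))   # 108
print(coin_equational(hypothetical_values))
```

The hard-wired counts grow with the number of sources or contexts, while the COIN counts grow only with the number of modifiers and their distinct values.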
27
How COIN Scales
• Semantic differences cannot be standardized away
• Must be flexible and scalable
• Component conversions are defined for each modifier
• Overall conversions are automatically composed by abductive reasoning engine
• Composition via symbolic equation solver and a shortest path algorithm
• Inheritance enabled
• COIN is a good solution
– Modularization, declarativeness
– Automatic composition of necessary conversions
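To illustrate composing an overall conversion from component conversions, the sketch below runs a breadth-first search over a small graph of unit conversions and multiplies the factors along the shortest chain. It is a simplified stand-in, not COIN's symbolic equation solver or abductive engine. The graph contents and the function names are assumptions.

```python
from collections import deque

# Component conversions, declared once per adjacent pair of units.
COMPONENTS = {
    ("feet", "meter"): 0.3048,
    ("meter", "kilometer"): 0.001,
}

def build_graph(components: dict) -> dict:
    """Make every component conversion usable in both directions."""
    graph = {}
    for (src, dst), factor in components.items():
        graph.setdefault(src, {})[dst] = factor
        graph.setdefault(dst, {})[src] = 1 / factor
    return graph

def compose(graph: dict, src: str, dst: str):
    """Return (path, factor) for the chain with the fewest conversions."""
    queue = deque([(src, [src], 1.0)])
    seen = {src}
    while queue:
        node, path, factor = queue.popleft()
        if node == dst:
            return path, factor
        for neighbor, step in graph.get(node, {}).items():
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [neighbor], factor * step))
    raise ValueError(f"no conversion chain from {src} to {dst}")

graph = build_graph(COMPONENTS)
path, factor = compose(graph, "feet", "kilometer")
print(path)
print(2500 * factor)   # an exclusion zone of 2500 feet, in kilometers
```

No feet-to-kilometer conversion was written by hand; it is derived from the two declared components.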
28
The 1805 Overture
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.
The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind.
The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.
Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.
29
EXTRA SLIDES
30
Yet Another Context Example (Basis for Demo)
[Figure: the sources WorldScope, Disclosure, Datastream, and the OANDA Web Server connect through Wrapper Services and Context Mediation Services to Users & Application Systems.]
Sample records (Company Name, Net Income, Sales):
DAIMLER-BENZ AG | 346,577 | 56,268,168
DAIMLER BENZ CORP | 615,000,000 | 97,737,000,000
DAIMLER-BENZ | 614,995 | 97,736,992 (Datastream)
O&A DEM-USD Exchange Rate: 1.00 German Mark = 0.58 US Dollar as of 12/31/93
31
Some Context Differences
Context Definitions | Disclosure | Worldscope | DataStream
Currency Used | Country of Incorporation | USD | Country of Incorporation
Currency Conversion | Money Amount As_Of_Date | Money Amount As_Of_Date | Money Amount As_Of_Date
Currency Symbols | 3 Letters | 3 Letters | 2 Letters
Scale Factor | 1 | 1000 | 1000
Company Names | Disclosure Names | Worldscope Names | DataStream Names
Date Style | American with ‘/’ as separator | American with ‘/’ as separator | European with ‘-’ as separator
Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with ‘/’ as a separator
32
Domain Model
Diagram labels: number, string, exchangeRate, currencyType, fromCur, toCur, companyFinancials, scaleFactor, date, countryName, curTypeSym, companyName, currency, fyEnding, company, countryIncorp, format, dateFmt, txnDate, officialCurrency
Legend: Inheritance, Attribute, Modifier
Some currency context possibilities:
• Currency is stated explicitly as part of record
• Currency not stated, but the same for all (e.g., US $)
• Currency not stated or constant, but inferred by country
33
COIN System Architecture
Process groups: SERVER PROCESSES, MEDIATOR PROCESSES, CLIENT PROCESSES
Components: HTTPD-Daemon, Web-site, Wrapper, WWW Gateway, COIN Repository, Context Mediator, SQL Compiler, Optimizer, Executioner, Data Store for Intermediate Results, ODBC-compliant Apps (e.g. Microsoft Excel), ODBC-Driver, Web Client (cgi-scripts)
Flow labels: SQL Query, Datalog Query, Mediated Query, Optimized Query Plan, Results
34
System Demonstration
Q6. Scenario: Using Context Interchange, you can look at the Disclosure data using Datastream Context.
Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.
Capabilities Demonstrated:
Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion.
Single Source Queries with Mediation
35
Demonstration @ context2.mit.edu
Context
Source
36
Context Metadata (Partial)
37
Conflict Detection and Mediation
Date convert
Scale factor convert
Name convert
Mediated Query in Datalog
38
Mediated SQL Query & Result
Adjust scale factor
Date format conversion
Name conversion
Final results – from Disclosure but in Datastream context
Mediated SQL Query
39
More Complex Example (4 sources: DB + Web)
select WorldcAF.TOTAL_ASSETS, DiscAF.NET_SALES, DiscAF.NET_INCOME,
       DStreamAF.TOTAL_EXTRAORD_ITEMS_PRE_TAX, quotes.Last
from WorldcAF, DiscAF, DStreamAF, quotes
where WorldcAF.COMPANY_NAME = "DAIMLER-BENZ AG"
  and DStreamAF.AS_OF_DATE = "01/05/94"
  and WorldcAF.COMPANY_NAME = DStreamAF.NAME
  and WorldcAF.COMPANY_NAME = DiscAF.COMPANY_NAME
  and WorldcAF.COMPANY_NAME = quotes.Cname;
Databases Web source
40
Conflict Table (1st part)
41
Conflict Table (2nd part)
42
Generated SQL (1st Part)
select worldcaf.total_assets, discaf.net_sales, ((discaf.net_income*0.001)*olsen.rate), (dstreamaf2.total_extraord_items_pre_tax*olsen2.rate), quotes.Last
from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform,
     (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws,
     (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws,
     (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup2,
     (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes,
     (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen,
     (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf,
43
Generated SQL (Continued - Partial)
     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf,
     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf2,
     (select char3_currency, char2_currency from currency_map where char3_currency <> 'USD') currency_map,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes2,
     (select exchanged, 'USD', rate, '01/05/94' from olsen where expressed='USD' and date='01/05/94') olsen2,
     (select Cname, Last from quotes) quotes
where currencytypes.country = discaf.location_of_incorp
  and currencytypes.currency = olsen.exchanged
  and dstreamaf.currency = dstreamaf2.currency
  and dstreamaf2.currency = currency_map.char2_currency
  and olsen.date = discaf.latest_annual_data
  and currency_map.char3_currency = currencytypes2.currency
  and currencytypes2.currency = olsen2.exchanged
  and name_map_dt_ws.dt_names = dstreamaf2.name
  and name_map_ds_ws.ds_names = discaf.company_name
  and ticker_lookup2.ticker = quotes.Cname
  and datexform.date1 = dstreamaf2.as_of_date
  and currencytypes.currency <> 'USD'
  and currency_map.char3_currency <> 'USD'
union
select worldcaf2.total_assets, discaf2.net_sales, ((discaf2.net_income*0.001)*olsen3.rate), dstreamaf4.total_extraord_items_pre_tax, quotes2.Last
from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform2,
     (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws2,
     (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws2,
     (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup22,
     (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf2,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes3,
     (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen3,
     (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf2,
     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf3,
     (select 'USD', char2_currency from currency_map where char3_currency='USD') currency_map2,
etc
44
Final Result
45
Execution Trace (1st Part - Partials)
. . .
Parallel Execution
Retrieving data from Web source
46
Execution Trace (Continued - Partials)
. . .
Another Web source used (for currency conversion)
Stock price returned from Web source
47
Appendix: Sample Applications
• Airfare, Car Rental and Merged Travel
• Weather
• Global Price Comparison
• Airfare Aggregation
• Disaster Relief
• TASC Financial Example
• Web Services Demo
• Corporate Householding
48
Appendix: COIN Web-Wrapper Technology
SQL side: User or Program (via SQL Query)
Select Edgar.Net_income From Edgar Where Edgar.Ticker=intc and Edgar.Form=10-Q
HTML side: Web Wrapper Generator, driven by a web page spec file*
Data record returned:
Ticker | Net Income
INTC | 1,983
* Spec file contains: schema, navigation rules, and extraction rules.
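A spec file with a schema, navigation rules, and extraction rules can be imitated in a few lines. The sketch below is a toy illustration, not the COIN wrapper or its spec format. The URL template, the regular expression, the sample HTML, and every function name are assumptions, and `fetch` returns a canned page so that nothing is downloaded.

```python
import re

# Hypothetical spec: schema, a navigation rule, and an extraction rule.
SPEC = {
    "relation": "Edgar",
    "schema": ["Ticker", "Form", "Net_income"],
    "navigation": "http://example.com/filings?ticker={Ticker}&form={Form}",
    "extraction": re.compile(
        r"<td>Net Income</td>\s*<td>(?P<Net_income>[\d,]+)</td>"
    ),
}

# Canned page standing in for the real web source.
SAMPLE_PAGE = "<table><tr><td>Net Income</td><td>1,983</td></tr></table>"

def fetch(url: str) -> str:
    """Stand-in for an HTTP request; always returns the canned page."""
    return SAMPLE_PAGE

def select(spec: dict, ticker: str, form: str) -> list:
    """Answer 'Select Net_income From Edgar Where Ticker=... and Form=...'."""
    url = spec["navigation"].format(Ticker=ticker, Form=form)
    match = spec["extraction"].search(fetch(url))
    if match is None:
        return []
    return [{"Ticker": ticker, "Form": form,
             "Net_income": match.group("Net_income")}]

rows = select(SPEC, "INTC", "10-Q")
print(rows)
```

The caller sees rows of a relation, and the page layout is confined to the spec, so a change in the web page means editing the extraction rule rather than the query.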