Mirror Outlier Detection in Foreign Trade Data
Markos Fragkakis
NTTS 2009
Introduction
Foreign Trade (FT) data
Improvement of FT quality is essential
Quality can be assessed along several dimensions (e.g. accuracy, timeliness, clarity)
We focus on accuracy, using outlier detection
Methods for outlier detection (e.g. threshold-based, model-based)
Presentation of the Mirror Outlier Detection (MOD) application
2
Methodology
Univariate detection in time series (value, quantity, supplementary quantity)
Median Absolute Deviation
Robust
◦median, not mean
◦non-parametric
$$T_i = \frac{x_i - M_1}{M_2} = \frac{x_i - M_1}{\mathrm{Median}\left(|x_j - M_1|\right)} > c$$
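The score on this slide can be sketched in a few lines of Python. This is only an illustration, not the MOD code (which the slides say is written in Java). It reads M_1 as the median of the series and M_2 as the median absolute deviation around it. The function names, the cutoff c = 3.0, the comparison on the absolute value of T_i, and the sample series are all assumptions made for the example.

```python
from statistics import median


def mad_scores(series):
    """Return the signed score T_i = (x_i - M1) / M2 for each observation."""
    m1 = median(series)                       # M1: median of the series
    m2 = median(abs(x - m1) for x in series)  # M2: median absolute deviation
    if m2 == 0:
        raise ValueError("MAD is zero; scores are undefined")
    return [(x - m1) / m2 for x in series]


def flag_outliers(series, c=3.0):
    """Return (index, score) pairs whose absolute score exceeds c.

    Comparing |T_i| with c is an assumption: the slide only shows T_i > c.
    """
    return [(i, t) for i, t in enumerate(mad_scores(series)) if abs(t) > c]


# Hypothetical monthly trade values with one obvious spike.
monthly_values = [100, 102, 98, 101, 99, 250, 100, 97]
print(flag_outliers(monthly_values))
```

Because both the centre and the scale are medians, the single spike barely moves them, which is the robustness property the slide refers to.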
3
Mirror Outlier Detection
Characterization of outliers according to the mirror flow.
Possible outlier types:
◦Green: outlier appears in mirror (same sign)
◦Red: outlier does not appear in mirror
◦Violet: outlier appears in mirror (opposite sign)
◦Black: mirror series not present
◦Pink: mirror series not present (confidentiality)
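The colour types above could be assigned along the following lines. This is a hedged Python sketch, not the MOD implementation. It assumes each series has already been reduced to signed scores per period, that a missing mirror series is passed as None, and that a mirror period with no score counts as "outlier does not appear". The function name, parameters, and cutoff c are hypothetical.

```python
def classify_mirror_outlier(score, mirror_scores, period, c=3.0, confidential=False):
    """Return the colour type of an outlier with signed score `score`.

    mirror_scores: dict mapping period -> signed score of the mirror series,
                   or None when the mirror series is not present.
    """
    if mirror_scores is None:
        # No mirror series: pink if suppressed for confidentiality, else black.
        return "pink" if confidential else "black"
    mirror = mirror_scores.get(period)
    if mirror is None or abs(mirror) <= c:
        return "red"  # the mirror flow shows no outlier in this period
    same_sign = (score > 0) == (mirror > 0)
    return "green" if same_sign else "violet"


print(classify_mirror_outlier(5.0, {"2008-03": 4.2}, "2008-03"))
```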
4
Additional functionalities
Outlier classification (error in dimension, not observed values)
◦Swapping of observations between series
◦Copy of observations
◦Time delay (hidden green outlier)
Outlier detection in short series (product code changes)
Reporting for
◦Detected outliers per country (e-mailed)
◦Summary reporting
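The slides name the time-delay case but do not give the rule used to detect it. One plausible reading, sketched here in Python purely as an assumption, is that an outlier classified as red is re-checked against neighbouring periods of the mirror series: a same-sign mirror outlier one period earlier or later would make it a "hidden green" outlier. The function name, the integer period index, max_lag, and the cutoff c are hypothetical.

```python
def find_time_delay(score, period, mirror_scores, c=3.0, max_lag=1):
    """Return the period shift at which the mirror shows a same-sign outlier.

    period is an integer index; mirror_scores maps period index -> signed score.
    Returns None when no shifted match is found within max_lag periods.
    """
    for lag in range(1, max_lag + 1):
        for shift in (-lag, lag):
            mirror = mirror_scores.get(period + shift)
            if mirror is None:
                continue
            if abs(mirror) > c and (mirror > 0) == (score > 0):
                return shift
    return None


# Hypothetical mirror scores: the spike shows up one period later.
mirror = {4: 0.2, 5: 0.1, 6: 6.5}
print(find_time_delay(7.0, 5, mirror))
```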
5
Example of detected outlier
6
Example of error due to swap
7
Error due to time delay
8
Technical Information
MOD-DB has an RDBMS repository for storing outlier data (support for Oracle, MySQL).
Implemented in Java (portability, maintainability)
Command Line Interface
Performance issues
◦Large volumes of data cause a bottleneck in the DB
◦Storage requirements are a concern (several GB per month)
9
Architecture
10
Proposal for new platform
Use a multi-dimensional viewer
Enable OLAP functions (slice, dice, roll-up, drill-down)
Create dynamic charts from data
Estimated variables (indices from raw outlier data)
Data mining could be performed for extracting inferences from data
◦Log-linear models
Pinpointing of poor data involving high values
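To make the OLAP vocabulary concrete, the sketch below shows a slice and a roll-up over a small set of outlier records in plain Python. It is an illustration only: the record layout (reporter, partner, period, type), the sample rows, and the helper names are hypothetical and do not describe the proposed platform.

```python
from collections import Counter

# Hypothetical outlier records: (reporter, partner, period, type).
REPORTER, PARTNER, PERIOD, TYPE = range(4)
records = [
    ("DE", "FR", "2008-01", "red"),
    ("DE", "FR", "2008-02", "green"),
    ("DE", "IT", "2008-01", "red"),
    ("FR", "DE", "2008-01", "violet"),
]


def roll_up(rows, *dims):
    """Count outliers grouped by the chosen dimensions (fewer dims = coarser view)."""
    return Counter(tuple(row[d] for d in dims) for row in rows)


def slice_by(rows, dim, value):
    """Keep only the rows whose dimension `dim` equals `value`."""
    return [row for row in rows if row[dim] == value]


by_reporter = roll_up(records, REPORTER)       # roll-up to reporter level
red_only = slice_by(records, TYPE, "red")      # slice on outlier type
red_by_pair = roll_up(red_only, REPORTER, PARTNER)  # drill down to partner
print(by_reporter, red_by_pair)
```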
11
Conclusions
Use of mirror flow for outlier characterisation
New features
Improving quality
Enables building a new platform for data exploration
Expansion of MOD to other FT data outside the EU, and to other domains.
12
Questions
Thank you for your attention
13