SEBD Tutorial, June 06
Data Mining in Ubiquitous Distributed Environments
Assaf Schuster
Technion
Purpose of this Tutorial
• Convergence of distributed systems and data mining
• Evolving field, no systematic coverage of all aspects
• Will present: issues, challenges, examples of algorithmic approaches, ideas, accuracy vs. overhead tradeoffs
• Will not present: formal treatment, proofs, details, technology, systems, hardware…
Ubiquitous Computing Systems
• Various Systems: Grid, P2P, WSN, MANET
• Several similar technological aspects
  – Scale, aim for at least 10K (10M in P2P)
    • partial failure, heterogeneity, dynamic state / data
  – Multi-user, a 10K system serves >= 1K users
    • resource sharing, caching, consistency
  – Lots of distributed data
    • streams, incremental, anytime, local filtering, locality filtering
  – Cooperation of self-motivated parties
    • trust management, security, privacy, competitive market, self vs. global optimizations
  – Stringently resource limited
    • in-network computing, storage distribution
• Non-similar technological aspects
Ubiquitous Data Mining
• For the community
  – E.g., P2P recommendations based on e-interaction
• For Security
  – E.g., identify and avert DoS attack (Overpeer and P2P poisoning)
• For Administration
  – E.g., misconfiguration detection system (DataMiningGrid demo)
• For Data Cleansing
  – E.g., in-network outlier detection (and removal) in WSN
• DM Using HPC
  – E.g., idle-cycle batch systems for high-complexity analysis tasks (Superlink-Online)
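To make the data-cleansing example concrete, here is a minimal sketch of how a sensor node might flag and drop outlying readings collected from its neighborhood before forwarding them. The median/MAD rule, the function names `flag_outliers` and `clean`, and the threshold `k` are illustrative assumptions, not the method used in the tutorial.

```python
from statistics import median


def flag_outliers(readings, k=3.0):
    """Return indices of readings far from the neighborhood median.

    Assumed rule: a reading is an outlier if it deviates from the median
    by more than k times the median absolute deviation (MAD).
    """
    med = median(readings)
    mad = median(abs(r - med) for r in readings)
    if mad == 0:
        # All typical readings are identical; anything different stands out.
        return [i for i, r in enumerate(readings) if r != med]
    return [i for i, r in enumerate(readings) if abs(r - med) > k * mad]


def clean(readings, k=3.0):
    """Drop flagged readings so only plausible values travel upstream."""
    bad = set(flag_outliers(readings, k))
    return [r for i, r in enumerate(readings) if i not in bad]


# Hypothetical temperature readings gathered from neighboring sensors.
neighborhood = [20.1, 19.8, 20.3, 20.0, 85.0, 19.9]
print(flag_outliers(neighborhood))
print(clean(neighborhood))
```

Filtering at the node rather than at a central sink keeps bad values out of the network, which matters when communication is the scarce resource.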
Technological Challenges: Algorithms
• Scalable and resource-limited distributed DM
  – Algorithms for 10K peers, algorithms limited to two messages per peer per hour, synchronization-less, iteration-less, bag-of-tasks, dynamic divisibility, etc.
• Monitoring
  – Distributed, local filtering
• Success, Correctness, and Consistency
  – Partial failure, message dropping, heterogeneity, etc. can yield all sorts of trouble
• Reusability, incrementality
  – E.g., multi-class classifiers, multi-metric k-means clustering, etc.
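The local filtering and message-budget ideas above can be sketched as follows: a peer reports its local value only when it has drifted from the last reported value by more than a slack, and never exceeds a fixed number of messages per time window. The class name `FilteringPeer`, the `slack` parameter, and the sliding-window accounting are assumptions made for illustration; the default of two messages per hour mirrors the figure mentioned in the list.

```python
class FilteringPeer:
    """A peer that suppresses updates that are small or over budget."""

    def __init__(self, slack, budget=2, window=3600.0):
        self.slack = slack        # minimum drift worth reporting (assumed)
        self.budget = budget      # max messages per window
        self.window = window      # window length in seconds
        self.last_sent = None     # last value the coordinator has seen
        self.sent_times = []      # timestamps of recent transmissions

    def observe(self, value, now):
        """Return the value to transmit, or None if the update is suppressed."""
        # Forget transmissions that have fallen out of the sliding window.
        self.sent_times = [t for t in self.sent_times if now - t < self.window]
        drifted = self.last_sent is None or abs(value - self.last_sent) > self.slack
        if drifted and len(self.sent_times) < self.budget:
            self.last_sent = value
            self.sent_times.append(now)
            return value
        return None


peer = FilteringPeer(slack=1.0)
for t, v in [(0, 10.0), (60, 10.5), (120, 12.0), (180, 20.0), (3700, 20.0)]:
    print(t, v, peer.observe(v, now=t))
```

A larger slack or a tighter budget lowers communication overhead but leaves the coordinator with a staler view, which is the accuracy vs. overhead tradeoff in miniature.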
Technological Challenges: Systems
• Exploitation & HCI
  – Lay user (parameterless) DM, interactive DM
  – DM-based autonomous ubiquitous systems
• Security, Fraud, and Privacy
  – Authorization, public-key infrastructure, trust management, data pollution
• Longevity of DM jobs
  – Resource sharing, non-dedicated resources
• Communication patterns
  – Esp. reliability and addressability. Are these problems best solved by suitable algorithms?