Data Mining in Ubiquitous Distributed Environments

SEBD Tutorial, June 06. Data Mining in Ubiquitous Distributed Environments. Assaf Schuster, Technion.


Page 1: Data Mining in Ubiquitous Distributed Environments

Data Mining in Ubiquitous Distributed Environments

Assaf Schuster

Technion

Page 2: Data Mining in Ubiquitous Distributed Environments

Purpose of this Tutorial

• Convergence of distributed systems and data mining

• Evolving field, no systematic coverage of all aspects

• Will present: issues, challenges, examples of algorithmic approaches, ideas, accuracy vs. overhead tradeoffs

• Will not present: formal treatment, proofs, details, technology, systems, hardware…

Page 3: Data Mining in Ubiquitous Distributed Environments

Ubiquitous Computing Systems

• Various systems: Grid, P2P, WSN, MANET

• Several similar technological aspects

– Scale, aim for at least 10K (10M in P2P)

    • partial failure, heterogeneity, dynamic state / data

– Multi-user, a 10K system serves >= 1K users

    • resource sharing, caching, consistency

– Lots of distributed data

    • streams, incremental, anytime, local filtering, locality filtering

– Cooperation of self-motivated parties

    • trust management, security, privacy, competitive market, self vs. global optimizations

– Stringently resource limited

    • in-network computing, storage distribution

• Non-similar technological aspects
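To make the "streams, incremental, anytime" item concrete, here is a minimal sketch, assuming each node summarizes its own stream with a running mean and variance (Welford's method) and that summaries are merged pairwise instead of shipping raw data. The class name RunningStats and its methods are hypothetical and are not taken from the tutorial.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Incremental mean/variance summary (hypothetical helper, Welford's method)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new stream item into the summary (incremental)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Population variance of the items seen so far (anytime answer)."""
        return self.m2 / self.n if self.n else 0.0

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two node summaries without revisiting any raw data."""
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)
```

Each node only ever stores three numbers, and a query can be answered at any point in the stream, which is the property the "anytime" item refers to.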

Page 4: Data Mining in Ubiquitous Distributed Environments

Ubiquitous Data Mining

• For the community

– E.g., P2P recommendations based on e-interaction

• For Security

– E.g., identify and avert DoS attack (Overpeer and P2P poisoning)

• For Administration

– E.g., misconfiguration detection system (DataMiningGrid demo)

• For Data Cleansing

– E.g., in-network outlier detection (and removal) in WSN

• DM Using HPC

– E.g., idle-cycle batch systems for high-complexity analysis tasks (Superlink-Online)
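As an illustration of the data cleansing example above, the following is a minimal sketch of in-network outlier detection, assuming each sensor can see only its neighbors' latest readings and judges itself against their median. It is not the method used in the tutorial; the function name local_outliers and the parameters k and min_spread are hypothetical.

```python
from statistics import median


def local_outliers(readings, neighbors, k=3.0, min_spread=0.5):
    """Return the ids of sensors whose reading deviates from their neighborhood.

    readings:   dict sensor_id -> latest reading
    neighbors:  dict sensor_id -> list of neighboring sensor ids
    k:          hypothetical sensitivity factor
    min_spread: hypothetical floor on the spread estimate
    """
    flagged = set()
    for node, value in readings.items():
        nearby = [readings[m] for m in neighbors.get(node, []) if m in readings]
        if len(nearby) < 2:
            continue  # too little local evidence to judge this sensor
        centre = median(nearby)
        # Median absolute deviation: a spread estimate that a single
        # faulty neighbor cannot inflate much.
        mad = median(abs(v - centre) for v in nearby)
        spread = max(mad, min_spread)
        if abs(value - centre) > k * spread:
            flagged.add(node)
    return flagged
```

Because every decision uses only neighborhood data, a flagged reading can be dropped before it is forwarded, which is the point of doing the cleansing inside the network.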

Page 5: Data Mining in Ubiquitous Distributed Environments

Technological Challenges: Algorithms

• Scalable and resource-limited distributed DM

– Algorithms for 10K peers, algorithms limited to two messages per peer per hour, synchronization-less, iteration-less, bag-of-tasks, dynamic divisibility, etc.

• Monitoring

– Distributed, local filtering

• Success, Correctness, and Consistency

– Partial failure, message dropping, heterogeneity, etc. can yield all sorts of trouble

• Reusability, incrementality

– E.g., multi-class classifiers, multi-metric k-means clustering, etc.
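The monitoring item above ("distributed, local filtering") can be sketched as follows, assuming the task is to detect when the sum of the peers' values exceeds a global threshold. Each peer gets an equal share of the threshold as a local bound and stays silent while it respects it. The names Peer and monitor are hypothetical, and the equal split is only one simple choice of local constraint.

```python
class Peer:
    """A peer that reports only when its value exceeds its local bound."""

    def __init__(self, local_bound):
        self.local_bound = local_bound
        self.value = 0.0

    def observe(self, value):
        """Record a new local value; return True only if a report is needed."""
        self.value = value
        return value > self.local_bound


def monitor(updates, n_peers, global_threshold):
    """Replay (peer_id, value) updates and return (reports, alarms).

    Safety argument: if every value is <= global_threshold / n_peers,
    the sum is <= global_threshold, so silent peers never hide an alarm.
    """
    peers = [Peer(global_threshold / n_peers) for _ in range(n_peers)]
    reports, alarms = 0, 0
    for peer_id, value in updates:
        if peers[peer_id].observe(value):
            reports += 1  # local violation: this peer sends a message
            # A real coordinator would now poll the other peers; this
            # sketch reads their values directly and does not count that cost.
            if sum(p.value for p in peers) > global_threshold:
                alarms += 1
    return reports, alarms
```

With four peers and a threshold of 100, updates below 25 cost no communication at all, which is how local filtering helps stay within a tight per-peer message budget.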

Page 6: Data Mining in Ubiquitous Distributed Environments

Technological Challenges: Systems

• Exploitation & HCI

– Lay-user (parameterless) DM, interactive DM

– DM-based autonomous ubiquitous systems

• Security, Fraud, and Privacy

– Authorization, public-key infrastructure, trust management, data pollution

• Longevity of DM jobs

– Resource sharing, non-dedicated resources

• Communication patterns

– Esp. reliability and addressability. Are these problems best solved by suitable algorithms?