foundaons)of)large/scale) mul’mediainformaon) managementand)retrieval) lecture)#1...
TRANSCRIPT
Founda'ons of Large-‐Scale Mul'media Informa'on
Management and Retrieval
Lecture #1 Introduc'on
Edward Y. Chang
Founda'ons of LSMM 1 Edward Chang
Foundations of LSMM 2 Edward Chang
Datasets
• YouTube • Facebook Photos • NeIlix Movies • Apple Music • Archive.org
– hMp://www.archive.org/details/audio – 400k recordings
Founda'ons of LSMM 3 Edward Chang
Technical Challenges
• Volume, both too small and too large • Variety, text, video, image, music, social, etc. • Velocity
Founda'ons of LSMM 4 Edward Chang
Founda'ons of LSMM 5 Edward Chang
Founda'ons of LSMM 6 Edward Chang
Key Applica'ons Use Image as an Example
• Content-‐based Retrieval – Given an image, find perceptually similar ones
• Annota'on – Given an image, iden'fy what (objects, events), where (loca'on), when ('me), and who (people)
• Do the above both quickly and accurately
• Applicable to any mul'media data types
Founda'ons of LSMM 7 Edward Chang
Key Subrou'nes
• Feature Extrac'on • Machine Learning
– Small sample pool – Large sample pool
• Similarity • Mul'modal Fusion • Indexing • Scalability
– Parallel algorithms – Online algorithms
Founda'ons of LSMM 8 Edward Chang
Feature Extrac'on
• Representa've Methods – Color, texture, shape – SIFT – Bio-‐mo'vated HMAX – Deep Learning
• Model-‐Based vs. Data-‐Driven
Founda'ons of LSMM 9 Edward Chang
Classifica'on • Query-‐Concept Learning
– Online – Concept-‐dependent learning – Imbalanced data – Large D – D >> N – N-‐ >> N+
• Annota'on – Offline – Feature-‐to-‐seman'cs mapping – Large N
Founda'ons of LSMM 10 Edward Chang
Similarity
• Tradi'onal Distance Func'on • Dynamic Par'al Func'on • Learning Similarity from Data
Founda'ons of LSMM 11 Edward Chang
Mul'modal Fusion
• Feature Fusion (Cross-‐Modality) – Weighted sum – PCA – Dimensionality curse
• Personaliza'on (Cross-‐Domain) – Latent behavior
• Context + Content – Loca'on, 'me, et al.
Founda'ons of LSMM 12 Edward Chang
Storage and Indexing
• Dimensionality Curse • High-‐Dimensional Indexing
Founda'ons of LSMM 13 Edward Chang
Large-‐Scale Learning 3V Problem
• Huge Volume of Data • High Velocity • High Variety
Founda'ons of LSMM 14 Edward Chang
Key Parallel Algorithmic Work • Parallel Spectral Clustering, W.-‐Y. Chen, Y. Song, H. Bai, Chih-‐Jen
Lin, and E. Y. Chang, IEEE Transac'ons on PaMern Analysis and Machine Intelligence (PAMI), 2010.
• PLDA+: Parallel Latent Dirichlet Alloca'on with Data Placement and Pipeline Processing, ACM Transac'ons on Intelligent Systems and Technology, 2011.
• Parallel Support Vector Machines, E. Y. Chang, et al., NIPS 2007. • PFP: Parallel FP-‐Growth for Query Recommenda'on,
H. Li, Y. Wang, D. Zhang, M. Zhang, and E. Y. Chang, ACM Recommenda'on Systems, Lausanne, October 2008
Foundations of LSMM 15 Edward Chang
Founda'ons of LSMM 16 Edward Chang
Founda'ons of LSMM 17 Edward Chang
Founda'ons of LSMM 18 Edward Chang
Founda'ons of LSMM 19 Edward Chang
Compara've Study on Small Dataset
Large Dataset 10,000 instances per class
Compara've Study on Large Dataset
Data Center (Cloud)
Founda'ons of LSMM 23 Edward Chang
Sample Hierarchy • Server
– 16GB DRAM; 160 GB SSD; 5 x 1TB disk
• Rack – 40 servers – 48 port Gigabit Ethernet switch
• Warehouse – 10,000 servers (250 racks) – 2K port Gigabit Ethernet switch
Founda'ons of LSMM 24 Edward Chang
Syllabus
• Feature Extrac'on (Lecture #2) • Machine Learning (Lecture #3)
– Small sample pool – Large sample pool
• Similarity (Lecture #4) • Mul'modal Fusion (Lecture #5) • Indexing (Lecture #6) • Scalability (Lectures #7 & #8)
– Parallel algorithms – Online algorithms
Founda'ons of LSMM 25 Edward Chang
Reading
• Founda'ons of Large-‐Scale Mul'media Informa'on Management and Retrieval, E. Y. Chang, Springer, 2011 – Chapter #1
Edward Chang Founda'ons of LSMM 26