ioannis konstantinou school of ece computing systems laboratory

19
GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content Ioannis Konstantinou School of ECE Computing Systems Laboratory National Technical University of Athens

Upload: newman

Post on 19-Jan-2016

26 views

Category:

Documents


0 download

DESCRIPTION

GridNEWS : A distributed Grid platform for efficient storage, annotating, indexing and searching of large  audiovisual news content. Ioannis Konstantinou School of ECE Computing Systems Laboratory National Technical University of Athens. Concept. Text based search in audiovisual content - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Ioannis Konstantinou School of ECE Computing Systems Laboratory

GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large  audiovisual news content

Ioannis KonstantinouSchool of ECE

Computing Systems LaboratoryNational Technical University of

Athens

Page 2: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Concept

Text based search in audiovisual content Search results: Portions of video files

containing selected keywords Example

User searches for keyword “Acropolis” Video portions containing the spoken word

“Acropolis” are located and presented in the user

YouTube like functionality

Page 3: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Objectives

Keyword extraction from video files using automatic speech recognition algorithms (ASR)

Efficient and scalable distributed storage of large media content

Indexing of extracted metadata for efficient keyword search

YouTube like user interface for video searching/downloading

Contribution to existing Grid Middleware using GGF standardized components

Page 4: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Addressed Issues Execution

Distributed execution of CPU/Data intensive Speech Recognition Algorithms

Storage Server load balancing using

performance metrics Client transfer time optimization using

bittorrent like algorithms Increase data availability Multi-organizational data storage

support using Virtual Organizations (VOs)

Page 5: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Video file keyword extraction procedure

MetadataDB

MetaScheduler

Cluster Cluster

Cluster

Storage

SESE

SE

Application Software

Data Client GET/PUT File Ops

CVSP tool

MetadataFile

Export

MetadataParserImport

Update

Schedule Execution

Page 6: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Distributed Execution Platform Architecture

Central MDS Server

User Interface

GridWay Metascheduler

Globus Toolkit 4

MDS

Local MDS Server Local MDS Server Local MDS Server

Publish aggregateCluster information

SOAP/XML messages

Computing Element

PBS Scheduler

Globus Toolkit 4

MDS + WSGRAM

WN WN WN

Ganglia Stats

WN WN WN WN WN WN

PBS client service

Ganglia Tool

MDS : Managing and Discovery ServicePBS : Portable Batch SchedulerWSGRAM : Grid Resource and

Allocation ManagementWN: Worker Node

Page 7: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Distributed Storage Platform Architecture

DRLS Pastry Peer to Peer

SE

GridFTP Server

Network Weather Service (NWS)

DRLSCom Service

Globus WS Apache Axis Container

SE

SE

Storage Subsystem

Client Machine

GridNews DataClient component

GridFTP Client

Parallel Downloader

GridFTPGet/put

Query Storage Subsystemto obtain candidate storage servers

using SOAP/XML Messages

Get/Put LFN to PFN mappings

SOAP/XML Messages

MDS: Managing and Discovery ServiceGridFTP: Grid File Transfer ProtocolDRLS: Distributed Replica Location ServiceSE: Storage ElementPFN: Physical File NameLFN: Logical File Name

SE

Replication Algorithmusing NWS stats

Page 8: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Distributed Replica Location Service

Contains mappings of LFNs to PFNs DHT used: Pastry P2P Logarithmic routing

In a network with n nodes, a query needs only log(n) messages (hops)

Plaxton’s algorithm minimizes query latencies Redundancy through replication

Eliminates single point of failure situations Inherent load balancing capabilities

Consistent hashing algorithms

Page 9: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Load Balancing

Servers exchange load metrics CPU Bandwidth Free Disk Space

Prediction algorithms (e.g. Linear regression) forecast future metrics from history data

Weighted Normalized Metric WNM : WmX(Mt/Mmax) Total Server Load (TSL): Sum(WNMi)i=1..n Servers maintain numerically sorted TSL list:

[TSL1..TSLn] TSL list periodically refreshed

Page 10: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Replication

Upon a STORE client request: Top k servers are selected from WNM list

k: configurable static replication factor Most suitable server is returned to the client Client initiates a single GridFTP file upload Server replicates the new file according to

WNM list and factor k DRLS is informed about the new LFN->PFN

mappings Client is informed Upon completion

Page 11: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Parallel Downloader

Upon a GET client request: Server contacts DRLS and retrieves replica

locations Client establishes N GridFTP connections Client initiates N parallel (threaded) small data

chunk requests After each successful retrieval, client re-initiates

another request Optimum file transfer time:

The greater file portion is retrieved from the faster storage nodes

To be replaced by GridTorrent

Page 12: Ioannis Konstantinou School of ECE Computing Systems Laboratory

GridTorrent Metadata fields

Current Id File size Piece size Hashes

Distributed RLS instead of Tracker

Partial GridFTP for actual transfer

BitTorrent replica selection and tit-for-tat algorithm.

Compatible with plain GridFTP servers

PFN’s prefix determines protocol (gtp://site.fully.qualified.domain.name/path/to/file)

Bittorrent

Peer3GridFTP Server

New connection

New

connection

Peer setPeer2

GridTorrent client

Peer1GridTorrent client Bittorrent

Stri

ped

Grid

FTP

Striped GridFTP

CurrentID :pfn1 pfn2 pfn3

publish

pfns

Peer that wants todownload a file

size,piece_size,hashes, currentID

DistributedRLS

Striped GridFTP

Bitt

orre

nt

New

con

nect

ion

piece

piece

piece

piece

piece

piece

Page 13: Ioannis Konstantinou School of ECE Computing Systems Laboratory

EXISTING MIDDLEWARE CONTRIBUTION

Design and development of a dynamic replica

selection/placement algorithm

Added support for multiple clusters using Gridway Metascheduler

Replace centralized replica location service with a

scalable distributed peer to peer solution

Enhanced bittorrent like file downloading

from multiple sources

Page 14: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Development Testbed Hardware

4X Dual Core AMD Opteron(tm) Processor 875 2.2GHz – 8 virtual CPU 16Gb Ram

Deployment of 5 Xen virtual machines, 2GB ram Software

Globus Toolkit v4.0 Globus WS Core (Apache AXIS WS Container) Rice Pastry P2P (Java) Network Weather Service Torque (OpenPBS) scheduler GridWay Metascheduler

Page 15: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Virtualization

Xen Hypervisor Paravirtualization tequnique Guest OS use special “xen aware” kernel Direct utilization of special CPU instructions Faster than full virtualization (VmWare)

Use of Xen Hypervisor Easy prototype management/administration Simple control of the node lifecycle Facilitate prototype deployment in many actual

nodes

Page 16: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Currently working

Replace ParallelDownloader with GridTorrent

Deploy prototype in the PlanetLab testbed

Run experiments Fine-tune designed algorithms

Page 17: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Gridnews Portal

Users can: Perform keyword search in the auto-

annotated multimedia content View the video from their browser in a

youTube style Download only a fragment of the video

where this keyword exists

Page 18: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Screenshots

keyword

Video URLs

time position

Page 19: Ioannis Konstantinou School of ECE Computing Systems Laboratory

Questions