
Page 1:

Hands On: Multimedia Methods for Large Scale Video Analysis

Dr. Gerald Friedland, [email protected]

Page 2:

Multimedia on the Internet is Growing

Raphaël Troncy: Linked Media: Weaving non-textual content into the Semantic Web, MozCamp, 03/2009.

Page 3:

Multimedia on the Internet is Growing

• YouTube claims 65k to 100k video uploads per day, or 48 to 72 hours of video uploaded every minute

• Flickr claims 1M image uploads per day

• Twitter: up to 120M messages per day

Page 4:

Why do we care?

• Consumer-produced multimedia allows empirical studies at a never-before-seen scale in various research disciplines such as sociology, medicine, economics, environmental sciences, computer science...

• Recent buzzword: BIGDATA

Page 5:

Problem

How can YOU effectively work on large scale multimedia data (without working at Google)?

Page 6:

What is this class about?

• Introduction to large-scale video processing from a CS point of view (application driven)

• Covers different modalities: Visual, Audio, Tags, Sensor Data

• Covers different side topics, such as Mechanical Turk for annotation

Page 7:

Content of this Class

• Visual methods for video analysis
• Acoustic methods for video analysis
• Meta-data and tag-based methods for video analysis
• Inferring from the social graph and collaborative filtering
• Information fusion and multimodal integration
• Coping with memory and computational issues
• Crowd sourcing for ground truth annotation
• Privacy issues and societal impact of video retrieval

Page 8:

Why you should be interested in the Topic

• Processing of large-scale, consumer-produced videos is a brand-new emerging field driven by:
  – Massive government funding (e.g. IARPA Aladdin, IARPA Finder, NSF BIGDATA)
  – Industry's needs to make consumer videos retrievable and searchable
  – Interesting academic questions

Page 9:

Some Examples of Research at ICSI

• Video Concept Detection
• Forensic User Matching
• Location Estimation
• Fast Approaches in Parlab

Page 10:

Video2GPS: Multimodal Location Estimation of Flickr Videos

Gerald Friedland, International Computer Science Institute, Berkeley, [email protected]

Page 11:

Definition


• Location Estimation = estimating the geo-coordinates of the origin of the content recorded in digital media.
• Here: a regression task
  – Where was the media recorded, in latitude, longitude, and altitude?

G. Friedland, O. Vinyals, and T. Darrell: "Multimodal Location Estimation", pp. 1245-1251, ACM Multimedia, Florence, Italy, October 2010.

Page 12:

Motivation 1


Training data comes for free on a never-before seen scale!

Portal               %      Total
YouTube (estimate)   3.0    3M
Flickr               4.5    180M

Allows tackling of tasks of never-before-seen difficulty?

Page 13:

Motivation 2


Location-based services are awesome!

Geocoordinates + Time = Unique ID

Page 14:

De-Motivation

Geo-Location enables Cybercasing!

G. Friedland and R. Sommer: "Cybercasing the Joint: On the Privacy Implications of Geotagging", Proceedings of the Fifth USENIX Workshop on Hot Topics in Security (HotSec 10), Washington, D.C., August 2010.

Page 15:

Placing Task

Automatically guess the location of a Flickr video:
• i.e., assign geo-coordinates (latitude and longitude)
• Using one or more of:
  – Visual/Audio content
  – Metadata (title, tags, description, etc.)
  – Social information

Page 16:

Consumer-Produced, Unfiltered Videos...

Page 17:

Data Description (2011)

• Training Data
  – 10k Flickr videos/metadata/visual keyframes (+features)/geo-tags
  – 6M Flickr photos/metadata/visual features
• Test Data
  – 5k Flickr videos/metadata/visual keyframes (+features)
  – no geotags
• Test/Training split: by UserID

Page 18:

Your Best Guess???

Page 19:

Our Approach


Tag-based Approach
+ Visual Approach
+ Acoustic Approach
-------------------
Multimodal Approach

Page 20:

Intuition for the Approach

Investigate the 'spatial variance' of a feature (a minimal sketch follows):
– Spatial variance is small: the feature is likely location-indicative
– Spatial variance is high: the feature is likely not indicative
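To make the intuition concrete, here is a minimal, hypothetical sketch (not the actual ICSI implementation) of how a per-tag spatial variance could be computed from geo-tagged training items; the `training_items` structure and the exact variance definition are assumptions for illustration.

```python
from collections import defaultdict
import numpy as np

def tag_spatial_variances(training_items):
    """Compute a simple spatial variance per tag.

    `training_items` is assumed to be an iterable of (tags, lat, lon) tuples
    taken from geo-tagged training videos/photos. The variance definition
    (sum of per-coordinate variances around the tag centroid) is illustrative;
    the lecture's exact definition may differ.
    """
    coords_per_tag = defaultdict(list)
    for tags, lat, lon in training_items:
        for tag in tags:
            coords_per_tag[tag].append((lat, lon))

    variances = {}
    for tag, coords in coords_per_tag.items():
        pts = np.asarray(coords, dtype=float)
        # Small value: occurrences cluster tightly -> likely location-indicative.
        variances[tag] = float(pts.var(axis=0).sum())
    return variances
```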

Page 21:

Example Revisited

Tags: ['pavement', 'ucberkeley', 'berkeley', 'greek', 'greektheatre', 'spitonastranger', 'live', 'video']

Page 22:

Tag-based Approach

• 'greektheatre' would be the correct answer; however, it was not seen in the development dataset
• 'ucberkeley' is the 2nd-best answer: estimation error 0.332 km (a selection sketch follows the table below)

Tag               Matches in training set   Spatial variance
pavement          2                         5.739
ucberkeley        4                         0.132
berkeley          14                        68.138
greek             0                         N/A
greektheatre      0                         N/A
spitonastranger   0                         N/A
live              91                        6453.109
video             2967                      6735.844
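The table suggests a simple decision rule: among the test video's tags that were seen in training, pick the one with the lowest spatial variance and use the centroid of its training occurrences as the estimate. The sketch below is a hypothetical illustration of that rule, reusing the `coords_per_tag`/`variances` structures from the previous sketch.

```python
import numpy as np

def estimate_location_from_tags(test_tags, coords_per_tag, variances):
    """Return (lat, lon) from the matched tag with the lowest spatial variance.

    Tags never seen in training (e.g. 'greektheatre' above) are skipped;
    returns None if no tag matches at all.
    """
    seen = [t for t in test_tags if t in coords_per_tag]
    if not seen:
        return None
    best = min(seen, key=lambda t: variances[t])  # 'ucberkeley' in the example
    return tuple(np.mean(coords_per_tag[best], axis=0))
```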

Page 23:

Visual Approach

• Assumption: similar images imply similar locations
• Based on related work: reduce location estimation to an image retrieval problem
• Use the median frame of the video as keyframe

Page 24:

Visual Features

• Color Histograms
  – L*a*b* (4, 14, 14 bins, respectively)
  – Chi-square distance
• GIST
  – 5x5 spatial resolution, 6 orientations and 4 scales
  – Euclidean distance
(A distance-function sketch follows the reference below.)

See: A. Oliva and A. Torralba: "Building the Gist of a Scene: The Role of Global Image Features in Recognition", Progress in Brain Research, Vol. 155, pp. 23-36, Elsevier, 2006.
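A sketch of the two distances named above and of a 1-NN lookup over a geo-tagged image database, assuming histograms and GIST descriptors are already extracted as NumPy arrays; the weighting parameters stand in for the scaling learned from training data (see the results discussion later in the deck) and are not the lecture's actual values.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two L*a*b* color histograms."""
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def gist_distance(g1, g2):
    """Euclidean distance between two GIST descriptors."""
    return float(np.linalg.norm(g1 - g2))

def visual_1nn(query_hist, query_gist, database, w_hist=1.0, w_gist=1.0):
    """1-NN retrieval over (histogram, gist, (lat, lon)) triples."""
    best = min(database,
               key=lambda item: w_hist * chi_square_distance(query_hist, item[0])
                                + w_gist * gist_distance(query_gist, item[1]))
    return best[2]  # geo-coordinates of the closest training image
```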

Page 25:

(image-only slide)

Page 26:

(image-only slide)

Page 27:

Audio Approach

• Idea: Different places have different acoustic “signatures”

• Due to data sparsity: Initially only used for major cities

• Classification system similar to a GMM/UBM speaker-ID system, using MFCC19 features (a simplified sketch follows).
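A simplified sketch of the idea, assuming MFCC frames are already extracted (e.g. 19-dimensional). It trains one diagonal-covariance GMM per city with scikit-learn and scores a test clip by average log-likelihood; the real system's UBM training and MAP adaptation are omitted, and the component count is an arbitrary placeholder.

```python
from sklearn.mixture import GaussianMixture

def train_city_models(mfccs_per_city, n_components=32):
    """Train one GMM per city on its pooled MFCC frames.

    `mfccs_per_city` maps city name -> array of shape (n_frames, 19).
    A GMM/UBM system would MAP-adapt each city model from a universal
    background model instead of fitting each model from scratch.
    """
    return {
        city: GaussianMixture(n_components=n_components,
                              covariance_type="diag").fit(frames)
        for city, frames in mfccs_per_city.items()
    }

def classify_city(models, test_frames):
    """Return the city whose model gives the highest average log-likelihood."""
    return max(models, key=lambda city: models[city].score(test_frames))
```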

Page 28:

An Experiment

Listen!

• Which city was this recorded in? Pick one of: Amsterdam, Bangkok, Barcelona, Beijing, Berlin, Cairo, Cape Town, Chicago, Dallas, Denver, Duesseldorf, Fukuoka, Houston, London, Los Angeles, Lower Hutt, Melbourne, Moscow, New Delhi, New York, Orlando, Paris, Phoenix, Prague, Puerto Rico, Rio de Janeiro, Rome, San Francisco, Seattle, Seoul, Siem Reap, Sydney, Taipei, Tel Aviv, Tokyo, Washington DC, Zuerich
• Solution: Tokyo, highest confidence score!

Page 29:

Audio-Only Results

[Figure: DET curve for common-user city identification with 540 videos and 9,700 trials; miss probability (%) vs. false alarm probability (%). EER = 22%]

Page 30:

Problem: Bias

Page 31:

Geo-tagging: an estimation-theoretic ...

Observations:
• Images: [four example photos]
• Tags: {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}, i.e. $\{t_k^1\}, \{t_k^2\}, \{t_k^3\}, \{t_k^4\}$

Estimate:
• Geolocations: $x_1, x_2, x_3, x_4$

Page 32:

Interpreting traditional approaches

Locations are random variables: $\{x_1, x_2, \ldots, x_N\}$

Traditional approaches estimate the probability of a location given the tags:

$$p(x_i \mid \{t_k^i\}) \propto \prod_k p(x_i \mid t_k^i)$$

where $p(x_i \mid t_k^i)$ is obtained from the training set.

Example: the distribution for the tag "washington" is depicted here. [map figure]

Location estimate:

$$\int x_i \, p(x_i \mid \{t_k^i\}) \, dx_i$$
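As a toy illustration of the formula above (not the lecture's implementation), one can discretize the world into a grid, multiply the per-tag distributions obtained from training, and take the expectation over the grid; the `grid` and `per_tag_density` inputs are hypothetical.

```python
import numpy as np

def traditional_estimate(grid, per_tag_density, test_tags):
    """Grid-based p(x | {t_k}) ∝ Π_k p(x | t_k), followed by its mean.

    `grid` is an (M, 2) array of candidate (lat, lon) points and
    `per_tag_density` maps tag -> length-M array of probabilities on the grid.
    """
    posterior = np.ones(len(grid))
    for tag in test_tags:
        if tag in per_tag_density:      # unseen tags simply contribute nothing
            posterior *= per_tag_density[tag]
    total = posterior.sum()
    if total == 0:                      # degenerate case: no grid point survives
        return grid.mean(axis=0)
    posterior /= total
    # Location estimate: expectation of the grid points under the posterior.
    return posterior @ grid
```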

Page 33:

Drawbacks

Data sparsity: Not all tags in the test set are available in the training set. Hence the estimate of $p(x_i \mid t_k^i)$ can be bad.

Sub-optimality: The approaches are suboptimal given the data.

What we ideally want:

$$p(x_1, x_2, \ldots, x_N \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\})$$

The mean of the above distribution gives the best estimate of the locations, i.e. for each image we want $p(x_i \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\})$.

Traditional algorithms only give: $p(x_i \mid \{t_k^i\})$

Page 34:

Cooperative geo-tagging

Intuition: Images in the training set having common tags have correlated geo-locations, captured by the joint distribution.

Joint probability modeling:

$$p(x_1, x_2, \ldots, x_N \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\}) \propto \prod_i p(x_i \mid \{t_k^i\}) \prod_{(i,j)} p(x_i, x_j \mid \{t_k^i\} \cap \{t_k^j\})$$

$p(x_i \mid \{t_k^i\})$ is obtained from the training set as before.

The pairwise distribution given at least one common tag, $p(x_i, x_j \mid \{t_k^i\} \cap \{t_k^j\})$, is modeled as an indicator function $I(x_i = x_j)$: if the common tag has low spatial variance or occurs infrequently, e.g. if the common tag is "haas", it is very likely that the locations are the same.

Question: How do we estimate the optimal marginal distribution $p(x_i \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\})$?
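One hypothetical way to realize the pairwise term is to add an edge between two items whenever they share a tag of low spatial variance, matching the indicator-function modeling above; the variance threshold is an illustrative placeholder, not a value from the lecture.

```python
from itertools import combinations

def build_edges(tag_sets, variances, var_threshold=1.0):
    """Return index pairs (i, j) sharing at least one low-variance tag.

    `tag_sets[i]` is the tag set of item i and `variances` maps tag ->
    spatial variance (e.g. from the earlier sketch).
    """
    edges = []
    for i, j in combinations(range(len(tag_sets)), 2):
        common = set(tag_sets[i]) & set(tag_sets[j])
        if any(variances.get(t, float("inf")) < var_threshold for t in common):
            edges.append((i, j))
    return edges
```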

Page 35:

Bayesian graphical framework

[Figure: example images with tag sets {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}, connected as a graph.]

Node: geolocation of an image, with node potential $p(x_i \mid \{t_k^i\})$

Edge: correlated locations (e.g. a common tag)

Edge potential: strength of an edge, e.g. the posterior distribution of locations given common tags, $p(x_i, x_j \mid \{t_k^i\} \cap \{t_k^j\})$

Page 36:

Belief propagation updates

Iterative algorithm to approximate the posterior distribution $p(x_i \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\})$.

Gaussian modeling: $p(x_i \mid \{t_k^i\}) \approx \mathcal{N}(\mu_i, \sigma_i^2)$

At iteration 0, each node calculates $(\mu_i, \sigma_i^2)$.

At iteration $t$, each node updates its location as a weighted mean of its previous location and that of its neighbors:

$$\frac{1}{(\sigma_i^{(t)})^2} = \frac{1}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{1}{(\sigma_k^{(t-1)})^2}$$

$$\mu_i^{(t)} = (\sigma_i^{(t)})^2 \left[ \frac{\mu_i^{(t-1)}}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{\mu_k^{(t-1)}}{(\sigma_k^{(t-1)})^2} \right]$$

The weights reflect the confidence in the measurements, i.e. the higher the spatial variance, the lower the weight. (A minimal code sketch of these updates follows.)
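A minimal sketch of the updates as reconstructed above, using synchronous precision-weighted averaging over a neighbor list; initial means and variances are assumed to come from the tag-based node potentials, and the iteration count is arbitrary.

```python
import numpy as np

def gaussian_bp(mu0, var0, neighbors, n_iters=10):
    """Iterative precision-weighted location updates.

    mu0:       (N, 2) initial per-node estimates (lat, lon)
    var0:      (N,)   initial per-node variances
    neighbors: list where neighbors[i] holds the indices adjacent to node i
    """
    mu = np.asarray(mu0, dtype=float).copy()
    var = np.asarray(var0, dtype=float).copy()
    for _ in range(n_iters):
        prec = 1.0 / var
        new_prec = prec.copy()
        weighted = mu * prec[:, None]
        for i, nbrs in enumerate(neighbors):
            for k in nbrs:
                new_prec[i] += prec[k]          # accumulate neighbor precisions
                weighted[i] += prec[k] * mu[k]  # precision-weighted neighbor means
        var = 1.0 / new_prec
        mu = weighted * var[:, None]
    return mu, var
```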

Page 37:

Belief propagation

[Figure: a small graph whose nodes carry Gaussian beliefs $(\mu_1, \sigma_1^2)$, $(\mu_2, \sigma_2^2)$, $(\mu_3, \sigma_3^2)$.]

Posterior mean and variance assuming Gaussian beliefs.

Audio-visual features are incorporated in modeling the edge and node potentials.

Page 38:

Incorporating Audio-Visual features

• GIST features are extracted for the images.
• MFCC features are extracted for the audio.
• These are now incorporated into the node and edge potentials as exponential distributions:

$$p(x_i, x_j \mid a_i, a_j) \propto \exp\left(-\frac{\|x_i - x_j\|}{\|a_i - a_j\|}\right)$$

where $a_i$ are the audio features associated with image $i$.

The intuition is that the closer the audio features are, the higher the probability that the geo-locations are close. Similarly, this can be included in the node potentials as well as for the visual features. (A sketch of this edge potential follows.)
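A small sketch of this edge potential; the epsilon guard against identical feature vectors is an implementation detail added here, not part of the slide.

```python
import numpy as np

def edge_potential(x_i, x_j, a_i, a_j, eps=1e-6):
    """Unnormalized exponential edge potential: similar features (small
    ||a_i - a_j||) strongly penalize placing the two locations far apart."""
    loc_dist = np.linalg.norm(np.asarray(x_i, float) - np.asarray(x_j, float))
    feat_dist = np.linalg.norm(np.asarray(a_i, float) - np.asarray(a_j, float))
    return float(np.exp(-loc_dist / (feat_dist + eps)))
```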

Page 39:

Multimodal Results: MediaEval 2011 Dataset

[Figure 3: The resulting accuracy of the algorithm as described in Section 5; count (%) of videos vs. distance between estimation and ground truth (10 m, 100 m, 1 km, 5 km, 10 km, 50 km, 100 km).]

[...] used as the final distance between frames. The scaling of the weights was learned by using a small sample of the training set and normalizing the individual distance distributions so that the standard deviation of each of them would be similar. We use 1-NN matching between the test frame and all the images in a 100 km radius around the 1 to 3 coordinates from the previous step. We pick the match with the smallest distance and output its coordinates as the final result.

This multimodal algorithm is less complex than previous algorithms (see Section 3), yet produces more accurate results on less training data. The following section analyzes the accuracy of the algorithm and discusses experiments to support individual design decisions.
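A rough, hypothetical sketch of the refinement step described above: restrict the visual 1-NN search to training images within 100 km of the tag-based candidate coordinates. Feature extraction and the learned distance scaling are abstracted into `distance_fn`, which is a placeholder name.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def refine_with_visual_1nn(candidates, query_features, train_images,
                           distance_fn, radius_km=100.0):
    """`candidates` are the 1 to 3 tag-based guesses; `train_images` is a list
    of (features, (lat, lon)) pairs; returns the refined (lat, lon)."""
    pool = [(feats, loc) for feats, loc in train_images
            if any(haversine_km(loc, c) <= radius_km for c in candidates)]
    if not pool:
        return candidates[0]  # fall back to the best tag-based guess
    best = min(pool, key=lambda item: distance_fn(query_features, item[0]))
    return best[1]
```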

6. RESULTS AND ANALYSIS

The evaluation of our results is performed by applying the same rules and using the same metric as in the MediaEval 2010 evaluation. In MediaEval 2010, participants were to build systems to automatically guess the location of the video, i.e., assign geo-coordinates (latitude and longitude) to videos using one or more of: video metadata (tags, titles), visual content, audio content, and social information. Even though training data was provided (see Section 4), any "use of open resources, such as gazetteers, or geo-tagged articles in Wikipedia was encouraged" [16]. The goal of the task was to come as close as possible to the geo-coordinates of the videos as provided by users or their GPS devices. The systems were evaluated by calculating the geographical distance from the actual geo-location of the video (assigned by a Flickr user, the creator of the video) to the predicted geo-location (assigned by the system). While it was important to minimize the distances over all test videos, runs were compared by finding how many videos were placed within a threshold distance of 1 km, 5 km, 10 km, 50 km and 100 km. For analyzing the algorithm in greater detail, here we also show distances of below 100 m and below 10 m. The lowest distance category is about the accuracy of a typical GPS localization system in a camera or smartphone.

First we discuss the results as generated by the algorithm described in Section 5. The results are visualized in Figure 3. The results shown are superior in accuracy to any system presented in MediaEval 2010.

[Figure 4: The resulting accuracy when comparing tags-only, visual-only, and multimodal location estimation, as discussed in Section 6.1; count (%) of videos vs. distance between estimation and ground truth.]

Also, although we added additional data to the MediaEval training set, which was legal under the rules explained above, we added less data than other systems in the evaluation, e.g. [21]. Compared to any other system, including our own, the system presented here is the least complex.

6.1 About the Visual Modality

Probably one of the most obvious questions is the impact of the visual modality. As a comparison, the image-matching-based location estimation algorithm in [8] started reporting accuracy at a granularity of 200 km. As can be seen in Figure 4, this is consistent with our results: using the location of the 1-best nearest neighbor in the entire database compared to the test frame results in a minimum accuracy of 10 km. In contrast, tag-based localization reaches accuracies of below 10 m. For the tags-only localization we modified the algorithm from Section 5 to output only the 1-best geo-coordinate centroid of the matching tag with the lowest spatial variance and skip the visual matching step. While the tags-only variant of the algorithm already performs well, using visual matching on top of the algorithm decreases the accuracy in the finer-granularity ranges but increases overall accuracy, as in total more videos can be classified below 100 km. Out of the 5091 test videos, using only tags, 3774 videos can be estimated correctly with an accuracy better than 100 km. The multimodal approach estimates 3989 correctly in the range below 100 km.

6.2 The Influence of Non-Disjoint User Sets

Each individual person has their own idiosyncratic method of choosing keywords for certain events and locations when they upload videos to Flickr. Furthermore, the spatial variance of the videos uploaded by one user is low on average. At the same time, a certain number of users upload many videos. Therefore, taking into account which user a video belongs to seems to have a higher chance of finding geographically related videos. For this reason, videos in the MediaEval 2010 test dataset were chosen to have a disjoint set of users from the training dataset. However, the additional training images provided for MediaEval 2010 are not user-disjoint [...]

Page 40:

How is the Class Organized?

• Once a week: lectures giving a fundamental introduction (from me)
• Once a week: project meetings
• Guest lectures by YouTube, Yahoo!, Intel, TU Delft

Page 41:

Hands-On Part

• Do a project on a large-scale data set
• Data accessible through ICSI accounts
• Additional computation: $100 in Amazon EC2 money

Page 42:

Lecture Material

• Background material for lectures: G. Friedland, R. Jain: Introduction to Multimedia Computing, to appear, Cambridge University Press, 2012.

• (Constantly changing) draft available at http://www.mm-creole.org

Page 43:

How do you receive Credit?

• 3 credits (graded or ungraded), based on:
  – 2 quizzes
  – a project
