Slide 1
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam, Larry Zitnick, and Devi Parikh

Slide 2: Automatic Image Description
- A lot of activity
- Good evaluation protocols are important: Middlebury, Berkeley Segmentation Dataset, PASCAL
Slide credit: Devi Parikh

Slide 3: Automatic Image Description
Evaluation has been problematic.
- Automatic metrics:
  - BLEU: does not correlate well with human perception
  - ROUGE: biased towards long sentences
  - Ranking-based: unable to evaluate novel sentences [Hodosh et al.], [Kulkarni et al.], etc.

Slide 4: Automatic Image Description
Evaluation has been problematic.
- Human evaluation:
  - Expensive
  - Low reproducibility
  - Measures different factors; how should they be consolidated?
  - Measuring overall quality: what humans like is often different from what is human-like

Slide 5: Human-like vs. what humans like
- "A man on a black motorcycle looking left." (what humans like)
- "A man on a black motorcycle." (human-like)

Slide 6
[1] A man sits on a motorcycle
[2] A large, older man sits on a motorcycle.
[3] A man is waiting on a motorcycle.
[4] A man riding a motorcycle out of a parking lot
[5] A man is sitting on his motorcycle.
[6] A man on his motorcycle.
[7] A big man sits on a motorcycle.
[8] A person is riding a motorcycle.
[9] A man on a black bike idling in a parking lot.
[10] An overweight man sitting on a Harley motorcycle.
[11] a man siting on his bike looking behind him
[12] A man sits on a motorcycle.
[13] A man sits on a motorcycle
[14] A man stopped sitting on top of a motorcycle.
[15] A biker is getting ready to pull out.
[16] A man takes his motorcycle out on a warm night.
[17] A man on a motorcycle
[18] A man stands stationary on a black motorcycle
[19] A middle aged man is sitting on a black motorcycle.
[20] A man riding his motorcycle in a parking lot.
[21] A man on a motorcycle
[22] A man is riding his motorcycle out of a parking lot.
[23] A person is sitting on a motorcycle.
[24] A man is sitting on a motorcycle.
[25] An older man sits atop his motorcycle.
[26] A man is sitting on a motorcycle
[27] Man sitting on a motorcycle
[28] A man is riding a motorcycle.
[29] A man is sitting on his motorcycle in a parking lot.
[30] A heavy set man with blue jeans on is getting ready to take off on his motor bike
[31] A man is sitting on a motorcycle in the parking lot.
[32] A man sitting on a large black motor cycle.
[33] The guy is ready to go on a ride on his bike
[34] A bearded man is sitting on a black motorcycle
[35] A man sits on his sparkling black motorcycle.
[36] An overweight man on a motorcycle looks to his left in a parking lot.
[37] A large man riding on his motorcycle.
[38] A man is on a bike.
[39] There is a heavyset man with a graying beard sitting on a motorcycle.
[40] A man is sitting on his black motorcycle.
[41] A man sitting on his motorcycle.
[42] A man is sitting on a motorcycle.
[43] A man is sitting on his motorcycle.
[44] A man is sitting on a motorcycle.
[45] There is a man on the black motorcycle.
[46] A large man sitting on a motorcycle.
[47] A man sitting on a motorcycle.
[48] A man sitting on top of a motorcycle.

Slide 7: Taking a step back
Image description is like:
- Machine translation (techniques: BLEU)
- Text summarization (techniques: ROUGE)
- A bit of both and some more?

Slide 8: Taking a step back
The (a?) goal of automatic image description is to produce captions that are human-like.

Slide 9: Proposal
- Directly measure human-likeness.
- How well does a description match how most people tend to describe the image?
- Consensus-based image description

Slide 10: Proposal
- New annotation modality to measure agreement of a description with consensus
- New datasets to capture consensus accurately
- New automatic metric (CIDEr) to measure agreement with consensus

Slide 11: Annotation Modality
score(candidate) = proportion of times the candidate is picked

Slide 12: Datasets

Slide 13: Metric
$$\mathrm{CIDEr}_n(c_i, S_i) = \frac{1}{m} \sum_{j=1}^{m} \frac{g^n(c_i) \cdot g^n(s_{ij})}{\lVert g^n(c_i) \rVert \, \lVert g^n(s_{ij}) \rVert}$$
where $c_i$ is the candidate sentence, $S_i = \{s_{i1}, \dots, s_{im}\}$ is the set of reference sentences, $s_{ij}$ is the j-th reference, and $g^n(\cdot)$ is the TF-IDF vector of n-grams; the fraction is the cosine similarity, and $\frac{1}{m}\sum_j$ averages over the references.
Captures:
- Saliency and importance
- Accuracy (vs. precision or recall)
- Consensus across references (mean)
- Higher-order semantics
- Grammaticality

Slide 14: Experiments
- How well does CIDEr capture human judgment of consensus?
- How well do some of the existing* image captioning approaches capture human consensus?
* as of ~Spring 2014

Slide 15: Metric to evaluate automatic metrics
Given two candidates:
- Compute the human-annotated consensus score for both.
- Compute the automatic metric for both.
- Do the automatic metric and the humans agree on which candidate is better?
Accuracy(automatic) = % of candidate pairs where the automatic metric agrees with humans

Slide 16: Data
Candidate pairs:
- Human-correct vs. human-random (HI)
- Human-correct vs. human-correct (HC)
- Human vs. machine (HM)
- Machine vs. machine (MM)

Slide 17: Data
Five machine approaches:
- Midge [Mitchell 2012]
- Baby talk [Kulkarni 2011]
- Video and Video+ [Rohrbach 2013]
- Story [Farhadi 2010]

Slide 18: Baselines
- BLEU
- ROUGE
- METEOR
Note: evaluation thus far had been with 5 reference sentences.

Slide 19: Results
[Results figure not recovered; slide annotation: 8%]

Slide 20: Results
[Results figure not recovered]

Slide 21: Results
[Results figure not recovered]

Slide 22: Qualitative results
- BLEU: Owl.
- ROUGE: A multicolored owl, with black, white, and camel-colored feathers is looking to the left of the camera.
- CIDEr: An owl is sitting in a tree.

Slide 23
[1] An owl is sitting in a tree.
[2] An owl is sitting on a branch.
[3] An owl is looking towards the camera.
[4] An owl is perched in front of a tree.
[5] An owl is perched and staring into the camera.
[6] An owl is looking for its prey.
[7] An owl is looking in the direction of the camera.
[8] An owl is just sitting there.
[9] An owl is peering into the distance.
[10] a close up of an owl
[11] An owl.
[12] An owl is staring hard at something.
[13] Closeup of an owl
[14] An owl is sitting with its eyes slightly squinted.
[15] A picture of an owl standing still
[16] An owl is puffing itself up
[17] An owl stares into the distance
[18] An owl with black and brown feathers.
[19] An owl sitting still.
[20] An owl perched in a tree.
[21] A close up picture of an owl awake.
[22] An owl sits with eyes open.
[23] An owl with brown, yellow, and white feathers sitting in a tree.
[24] A black and orange owl is there.
[25] A spotted owl is resting.
[26] A beautiful owl is perched looking into the camera
[27] A colorful owl is staring straight ahead.
[28] An owl with puffy eyebrows.
[29] a close up of a owl looking at something
[30] A owl with black, brown, white, and orange feathers.
[31] owl
[32] There is an owl appearing to focus on something.
[33] A white and brown owl with it's eyes open wide.
[34] A owl with some black, brown, orange, and white feathers.
[35] The head of a multicolored owl
[36] A white and brown owl looking straight ahead.
[37] Picture of a perched owl.
[38] A horned owl with brown, white, and black spots.
[39] A gray and black owl stares into space.
[40] A brown and white barn owl looks off into the distance.
[41] A picture of a pretty owl.
[42] Close up shot of a multi colored owl
[43] A multicolored owl glares into the distance.
[44] A speckle breasted owl is roosting.
[45] A multicolored owl, with black, white, and camel-colored feathers is looking to the left of the camera.
[46] A owl with narrowed eyes and brown and cream feathers is featured
[47] The owl is doing absolutely nothing but watching
[48] A stern-looking owl stares intensely into the distance.

Slide 24: Qualitative results
- BLEU: A cat in a tree.
- ROUGE: A cat is sitting in the branches of a very skinny tree.
- CIDEr: A cat stuck in a tree.

Slide 25
[1] A cat stuck in a tree.
[2] A cat stuck in a tree.
[3] There is a cat stuck in a tree
[4] The shot of a cat stuck in a tree
[5] A cat is stuck in a tree.
[6] A cat is stuck in a tree
[7] A cat is standing in a tree.
[8] a cat in a tree
[9] A cat is high up in a tree.
[10] A cat is climbing high up in a tree.
[11] A cat is climbing in a tree.
[12] A cat stuck in a leafless tree.
[13] A tabby cat stuck in a tree in the woods.
[14] The cat is up in a tree.
[15] cat sitting in a tree
[16] A cat is climbing on a tree.
[17] A cat stuck high in a tree
[18] A cat is climbing a tree branch.
[19] A cat climbing in a tree.
[20] A cat is perched in the branches of a tree.
[21] A cat standing on the branches of a tree.
[22] A cat is walking through the branches of a tree.
[23] A cat perched high in a tree.
[24] A brown and black cat standing in a tree.
[25] A picture of a cat and a tree.
[26] A cat is up in the tree branches.
[27] A cat in the tree.
[28] The cat is stuck in the tree.
[29] A grey and white cat sits high in a tree.
[30] A cat has climbed a tree.
[31] A cat is sitting in the branches of a very skinny tree
[32] A cat climbing the bare branches of a tree.
[33] A cat is climbing on a very tall limb
[34] A cat in a leafless tree
[35] A cat is caught in a tree and looking to get down.
[36] A cat perches on two branches high up in a tree that has lost its leaves.
[37] A cat climbing on branches on a tree.
[38] a cat standing up high in the trees
[39] A white and brindle cat is high up in a leafless tree.
[40] A cat clings to the branches of a small tree.
[41] A house cat has climbed up a tree.
[42] A cat has climbed up halfway to the top of a tree and is inbetween two branches.
[43] A cat venturing on a tree.
[44] A cat is trapped among tree branches.
[45] The cat is high in the thin tree.
[46] A white and black colored cat climbing through trees
[47] medium cat on top of tree
[48] Some trees are seen in a grey sky.

Slide 26: Qualitative results
- BLEU: A man.
- ROUGE: A man is sitting with his hands together with two plastic bottles sitting in front of him.
- CIDEr: A bald man with glasses with two empty bottles in front of him.

Slide 27
[1] A bald man with glasses with 2 empty bottles in front of him.
[2] The bald man is sitting down with some bottles in front of him.
[3] A man sitting with drink bottles in front of him.
[4] A man, with two bottles in front of him, speaking
[5] A man is sitting down with beverage bottles in front of him.
[6] a man sitting down looking over at something with two water bottles in front of him
[7] A man with two plastic bottles in front of him is in the middle of saying something.
[8] a man with glasses
[9] A balding man with glasses seated down.
[10] Two plastic bottles sit in front of a man wearing a watch.
[11] A balding man with glasses is staring at the distance.
[12] A man is sitting with his hands together will two plastic bottles sitting in front of him.
[13] The man sits at a table with a couple bottles of water.
[14] A man is talking to someone off camera with two water bottles in the foreground.
[15] Man sitting with a bottle of water.
[16] A MAn with some bottles
[17] A man sitting at a table.
[18] A balding man in a yellow shirt.
[19] A man with glasses gestures behind plastic bottles
[20] A man is talking to someone.
[21] A man with a watch holding bottle of water.
[22] A man holding two bottles of water while talking.
[23] a man
[24] There is a bald man in a beige shirt.
[25] A man in glasses talking with two drinks in front of him.
[26] A man is sitting at a table with drinks in front.
[27] A man with glasses looks off into the distance
[28] The man has a bottle of water while at the meeting.
[29] The bald man is about to speak.
[30] Man standing behind a couple bottles
[31] A bald man is deep in conversation.
[32] The man is ready to drink that water bottle
[33] A man wearing a watch is speaking.
[34] A couple plastic bottles are being held in front of a man.
[35] A blad man is talking behind two water bottles.
[36] A man is sitting beyond a couple beverage bottles.
[37] A person is wearing a watch and has water bottles.
[38] a man sitting with his mouth open
[39] A man is talking with refreshments nearby.
[40] A man is looking at something off camera
[41] Two bottles are in front of a sitting male
[42] A man speaks at a table.
[43] A man is speaking to someone out of frame.
[44] A man holding his hands
[45] A bald-headed man sits there listening intently.
[46] The man is looking to the left.
[47] A man is being offered a drink.
[48] A man in a khaki shirt is caught mid-sentence.

Slide 28: Conclusions
- Directly measure what we care about and can capture: human-likeness.
- A new image description evaluation protocol based on human consensus:
  - New annotation modality
  - New datasets
  - New automatic metric
- More reference sentences help.

Slide 29: Thank you.
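The per-n-gram formula on the Metric slide (Slide 13) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: tokenization is a lowercase regex split; the IDF of an n-gram is computed from how many images' reference sets contain it, with n-grams unseen in every reference set treated as appearing in one image; term frequency is a raw count (per-sentence normalization would not change the cosine similarity); and the combined `cider` score averages n = 1 to 4 with uniform weights, which the slide itself does not specify. The helper names (`ngrams`, `document_frequency`, `tfidf`, `cider_n`, `cider`) are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(sentence, n):
    """Count the n-grams (as word tuples) in a sentence.

    Assumption: lowercase and keep only letters, digits, and apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def document_frequency(corpus_refs, n):
    """For each n-gram, count how many images' reference sets contain it."""
    df = Counter()
    for refs in corpus_refs:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, n))
        df.update(seen)
    return df


def tfidf(counts, df, num_images):
    """g^n(.): raw n-gram counts weighted by log(num_images / df).

    Assumption: n-grams absent from every reference set get df = 1.
    """
    return {g: c * math.log(num_images / max(1, df[g])) for g, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def cider_n(candidate, refs, corpus_refs, n):
    """CIDEr_n(c_i, S_i): mean cosine similarity over the m references."""
    df = document_frequency(corpus_refs, n)
    num_images = len(corpus_refs)
    cand_vec = tfidf(ngrams(candidate, n), df, num_images)
    sims = [cosine(cand_vec, tfidf(ngrams(r, n), df, num_images)) for r in refs]
    return sum(sims) / len(refs)


def cider(candidate, refs, corpus_refs, max_n=4):
    """Assumed combination: uniform average of CIDEr_n for n = 1..max_n."""
    total = sum(cider_n(candidate, refs, corpus_refs, n) for n in range(1, max_n + 1))
    return total / max_n


# Toy corpus: three images, three references each (taken from Slides 25, 23, and 6).
toy_corpus = [
    ["A cat stuck in a tree.", "A cat is stuck in a tree", "a cat in a tree"],
    ["An owl is sitting in a tree.", "An owl is sitting on a branch.", "An owl perched in a tree."],
    ["A man sits on a motorcycle", "A man on a motorcycle", "A man is sitting on a motorcycle."],
]

caption = "A cat stuck in a tree."
print(cider(caption, toy_corpus[0], toy_corpus))  # scored against the cat references
print(cider(caption, toy_corpus[1], toy_corpus))  # scored against the owl references
```

In this toy corpus the cat caption scores well above its score against the owl references, and words shared by every image's references (such as "a") get zero IDF weight, which is how the TF-IDF weighting down-weights n-grams that carry little image-specific information.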
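The measure on Slide 15 (Accuracy(automatic) = % of candidate pairs where the automatic metric agrees with humans) can be sketched as below. The tuple layout `(human_a, human_b, metric_a, metric_b)` and the tie handling are assumptions, since the slide does not say how ties are treated; the function name `pairwise_accuracy` and the example scores are hypothetical.

```python
from typing import Sequence, Tuple

# One candidate pair: (human_a, human_b, metric_a, metric_b).
# human_* could be the human-annotated consensus score; metric_* could be CIDEr.
Pair = Tuple[float, float, float, float]


def pairwise_accuracy(pairs: Sequence[Pair]) -> float:
    """Percentage of pairs where the metric prefers the same candidate as humans.

    Assumptions (not specified on the slide): pairs that humans score as a tie
    are skipped, and a tie in the metric counts as a disagreement.
    """
    agree = 0
    counted = 0
    for human_a, human_b, metric_a, metric_b in pairs:
        if human_a == human_b:
            continue
        counted += 1
        human_prefers_a = human_a > human_b
        if metric_a != metric_b and (metric_a > metric_b) == human_prefers_a:
            agree += 1
    return 100.0 * agree / counted if counted else 0.0


# Hypothetical scores for four candidate pairs.
example_pairs = [
    (0.8, 0.2, 0.6, 0.1),  # humans prefer A, metric prefers A: agree
    (0.3, 0.7, 0.5, 0.4),  # humans prefer B, metric prefers A: disagree
    (0.5, 0.5, 0.9, 0.1),  # humans tie: skipped
    (0.9, 0.1, 0.4, 0.4),  # metric tie: counted as disagreement
]
print(pairwise_accuracy(example_pairs))  # 1 agreement out of 3 counted pairs
```

Comparing metrics by pairwise agreement only needs the ranking within each pair, so metrics on different scales (BLEU, ROUGE, METEOR, CIDEr) can be compared against the same human judgments.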