[deep dish] food image reorganization using cnn...
[Deep Dish] Food Image Reorganization using CNN
Haichen, Abhishek Goswami
BACKGROUND
• When searching for a restaurant on Yelp or TripAdvisor, people may see images of food they are interested in.
• However, they do not know what the dishes are or whether they taste as good as they look.
• Hence, DeepDish uses a CNN to recognize different types of food or dishes, so that people can learn the name, the ingredients, and even the taste.
PROBLEM STATEMENT
• DeepDish starts by recognizing the names of 15 different types of food from America, Italy, Japan, and China from their corresponding images.
• In Phase 2, DeepDish will use unsupervised learning to recognize food in Yelp review images that also contain people or other background objects.
DATASETS
• The images come from ImageNet and Flickr. We select a set of images of relatively good quality, i.e., with good focus, little background, and a typical presentation of the dish.
• Positive examples
• Negative examples:
  a. The image is not focused on the tempura dish.
  b. Too much background.
  c. Not a typical curry lamb.
METHODS/ALGORITHMS/MODELS
• We tried two models besides the baseline. Both are similar to VGG16 but use fewer filters in the conv layers and fewer parameters in the dense layers, due to the hardware limits of our instance.
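To make the "fewer filters, fewer parameters" trade-off concrete, the sketch below counts the parameters of a VGG16-style stack. It is a back-of-the-envelope calculation, not our actual model definition: it assumes a 224x224x3 input, 3x3 convolutions with "same" padding, a 2x2 max-pool after each block, and 15 output classes, and the reduced filter widths and 512-unit dense layers in `REDUCED_BLOCKS` are hypothetical values chosen only for illustration.

```python
"""Rough parameter count for a VGG16-style network, full vs. reduced widths.

Assumptions (not taken from the poster): 224x224x3 input, 3x3 'same' convs,
2x2 max-pool after each block, two hidden dense layers, 15 output classes.
"""

def conv_params(k, c_in, c_out):
    # k*k*c_in weights per filter, plus one bias per filter
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out

def count_params(blocks, dense_units, input_hw=224, input_c=3, n_classes=15):
    """blocks: list of (num_conv_layers, filters); returns the total parameter count."""
    total, channels, hw = 0, input_c, input_hw
    for n_layers, filters in blocks:
        for _ in range(n_layers):
            total += conv_params(3, channels, filters)
            channels = filters
        hw //= 2  # the 2x2 max-pool halves height and width
    n_in = hw * hw * channels  # size of the flattened feature map
    for units in dense_units:
        total += dense_params(n_in, units)
        n_in = units
    total += dense_params(n_in, n_classes)  # softmax output layer
    return total

VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
REDUCED_BLOCKS = [(2, 32), (2, 64), (3, 128), (3, 256), (3, 256)]  # hypothetical widths

full = count_params(VGG16_BLOCKS, dense_units=[4096, 4096])
small = count_params(REDUCED_BLOCKS, dense_units=[512, 512])  # hypothetical dense sizes
print(f"VGG16-style: {full:,} params; reduced: {small:,} params")
```

With 1000 classes, `count_params(VGG16_BLOCKS, [4096, 4096], n_classes=1000)` reproduces the well-known 138,357,544 parameters of VGG16. Most of them sit in the first dense layer, which is why shrinking the dense layers saves far more memory than shrinking the conv filters.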
EXPERIMENTAL EVALUATION
• After running 15 epochs, the basic model without any augmentation achieves an accuracy of 51.2% on the validation set, while it overfits the training set.
• By printing out the mistakes, we found several error patterns. The model cannot differentiate between foods of similar color. Here are several examples.
• Image d has fish and chips, but our model recognizes it as eggs Benedict. It makes this mistake because the color of the fried fish is similar to the sauce on the eggs Benedict, and the napkin looks like the egg.
• Image e is a pepperoni pizza, but the model recognizes it as sashimi. The reason could be that sashimi also has red and white pieces.
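The "print out the mistakes" step can be sketched with a few helpers that compute overall accuracy, per-label accuracy, and the most frequent (true, predicted) confusion pairs, such as fish and chips predicted as eggs Benedict. The helper names, label strings, and toy predictions below are hypothetical and made up for illustration; they are not our experimental results.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy for each true label (what a per-label accuracy chart plots)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def top_confusions(y_true, y_pred, k=3):
    """Most frequent (true, predicted) pairs among the misclassified images."""
    mistakes = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return mistakes.most_common(k)

# Toy labels for illustration only; these are NOT the poster's results.
y_true = ["fish_and_chips", "pizza", "sashimi", "fish_and_chips", "pizza"]
y_pred = ["eggs_benedict", "sashimi", "sashimi", "eggs_benedict", "pizza"]

print(accuracy(y_true, y_pred))            # 0.4
print(per_class_accuracy(y_true, y_pred))  # {'fish_and_chips': 0.0, 'pizza': 0.5, 'sashimi': 1.0}
print(top_confusions(y_true, y_pred))
```

Comparing `accuracy` on the training and validation sets gives a quick overfitting check: a large gap between the two is the pattern described above.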
This is a clear image of a typical fish & chips dish.
CONCLUSIONS & FUTURE WORK
• Tuning the hyperparameters does not significantly increase the validation accuracy, which fluctuates around 65%.
• The simplified VGG16 model has the potential to deal with this problem. However, recognizing food images requires the model to detect differences in the shape, color, and texture of small pieces of food.
• We will try another set of training images of higher quality.
• Next, we will try HOG (histogram of oriented gradients) to highlight the edges of the food, and use it as a feature input to our model.
• Also, if we have time, we want to improve our model so that it can deal with images with complex backgrounds.
• After this, we tried data augmentation to reduce overfitting and the systematic error. We added a gray-out step for a subset of the images, in order to train the model to recognize images by texture or shape instead of color.
• After running for 15 epochs, the model achieves an accuracy of 68.2% on the validation set.
• Then, by printing out the mistakes, we found that some of them make sense, while others do not.
• The first image has Scotch eggs, but the model recognizes it as clam chowder. However, the dip does look like a bowl of chowder.
• The second image has Scotch eggs, but the model recognizes it as cannelloni, which makes no sense.
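As a rough sketch of the HOG idea listed under future work, the code below computes a simplified histogram of oriented gradients for a grayscale image given as a 2D list. It follows the usual HOG recipe of 8x8-pixel cells and 9 unsigned orientation bins over 0 to 180 degrees, but it is deliberately simplified: it skips bin interpolation and the overlapping-block normalization of full HOG, and only L2-normalizes the final vector. The function names are hypothetical; in practice a library implementation such as scikit-image's `skimage.feature.hog` would be the natural choice.

```python
import math

def hog_cells(img, cell=8, bins=9):
    """Simplified HOG: one magnitude-weighted orientation histogram per cell.

    img: 2D list of grayscale intensities (rows of numbers).
    """
    h, w = len(img), len(img[0])
    bin_width = 180.0 / bins
    hists = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # central differences, clamped at the image border
                    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
                    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
                    magnitude = math.hypot(gx, gy)
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
                    hist[int(angle // bin_width) % bins] += magnitude
            hists.append(hist)
    return hists

def hog_feature(img, cell=8, bins=9):
    """Concatenate the cell histograms and L2-normalize the result."""
    vec = [v for hist in hog_cells(img, cell, bins) for v in hist]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on flat images
    return [v / norm for v in vec]

# Toy 8x8 image with a vertical edge: dark left half, bright right half.
edge = [[0] * 4 + [255] * 4 for _ in range(8)]
print(hog_cells(edge))  # all gradient energy lands in the 0-degree bin
```

A vector like this could be fed to the model alongside the CNN features; the edge example shows how a strong boundary shows up as energy concentrated in one orientation bin.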
[Figure: Training Accuracy and Validating Accuracy per label]
• In VGG16, the 4th and 5th conv layers should have 128 filters, but due to the limits of our instance, we reduced the number of filters to 64.
• As noticed, the model overfits the training set. To reduce overfitting, a preprocessing step is added to gray out a subset of the training images.
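A minimal sketch of the gray-out preprocessing, assuming images are stored as rows of (r, g, b) tuples with values 0 to 255. The 30% fraction and the seed are hypothetical defaults, because the share of grayed-out training images is not given here, and the ITU-R BT.601 luma weights are one common grayscale conversion, not necessarily the one used in our experiments.

```python
import random

def to_gray_rgb(img):
    """Gray out one image given as rows of (r, g, b) tuples (values 0-255).

    The luma value is copied into all three channels, so the image keeps
    its shape and can still be fed to a 3-channel model input.
    """
    gray = []
    for row in img:
        new_row = []
        for r, g, b in row:
            y = round(0.299 * r + 0.587 * g + 0.114 * b)  # ITU-R BT.601 luma weights
            new_row.append((y, y, y))
        gray.append(new_row)
    return gray

def gray_out_subset(images, fraction=0.3, seed=0):
    """Gray out a random `fraction` of the images; the others are left untouched.

    `fraction` and `seed` are hypothetical defaults, not values from the poster.
    Returns the augmented list and the set of grayed-out indices.
    """
    rng = random.Random(seed)
    n_gray = int(len(images) * fraction)
    chosen = set(rng.sample(range(len(images)), n_gray))
    augmented = [to_gray_rgb(img) if i in chosen else img
                 for i, img in enumerate(images)]
    return augmented, chosen

print(to_gray_rgb([[(255, 0, 0), (255, 255, 255)]]))  # [[(76, 76, 76), (255, 255, 255)]]
```

Keeping three identical channels, instead of a single gray channel, means color and gray images share one input shape, so the model must rely on texture and shape whenever color information is missing.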