Synchronisation of Senses
From Text to Speech … to Movie
Rémi Ronfard
CVAM / ICCV, Oct 23, 2017
Introduction
• IMAGINE team at INRIA on natural interfaces for designing shapes, motions and stories
• Build interactive narrative environments where the user is the director
– Requires an explicit representation of story goals: character actions, events and their causal relations
– Requires a directable film crew of virtual actors, cameramen, lighting technicians, etc.
Scientific challenges
• Natural language and story understanding for script analysis
• Generative audio-visual models
• Procedural models for 3D scene generation
• Behavior-based 3D animation for directing virtual actors
• Virtual cinematography for placing lights and cameras automatically and editing them together to a single string of film
Outline
– Text-to-movie
– Generative audiovisual prosody model for virtual actors
– Eisenstein’s theory of vertical montage
– Continuity editing for 3D animation
Motivation: Text-to-Movie
Hitchcock’s dream of a machine in which he’d “insert the screenplay at one end and the film would emerge at the other end” (Truffaut/Hitchcock, p. 330)
Script, Storyboard, Stage, Editing Room
Video Game / Live Action / 3D Animation
Xtranormal Text-to-Movie©
• Startup created in 2006 in Montreal
• Mission statement: 3-D animation tools for digital storytelling
• «If you can write, you can make movies»
• Shut down in 2013, re-born in 2015 as «Nawmal»
Create a short movie in four easy steps…
1. Pick template, characters & voices from libraries…
2. Type dialog and insert gestures and effects…
3. View & edit your work…
4. Publish…
…without worrying about cinematography and editing
Text-to-movie: Nawmal Make
Text-to-movie: Nawmal smart cameras
Text-to-speech (TTS)
• A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.
• Allen, Jonathan; Hunnicutt, M. Sharon; Klatt, Dennis (1987). From Text to Speech: The MITalk system. Cambridge University Press.
Parametric text-to-speech (TTS)
• A beginners’ guide to statistical parametric speech synthesis, Simon King, 2010.
In press: IEEE Computer Graphics and Applications, Nov/Dec 2017.
Exercises in style
Emotions and attitudes
• Actors express dramatic attitudes using the coordinated prosody of voice, rhythm, facial expressions and head and gaze motion.
• We propose a method for generating natural speech and animation in various attitudes using neutral speech and animation as input.
Audio prosody
• High-level features: pitch, duration and intensity per syllable
• Low-level features: voice qualities
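A minimal sketch of how the high-level audio features listed above could be gathered per syllable. It is an illustration only, not the method used in the talk: the function name `syllable_features`, the 10 ms frame step, the convention that unvoiced frames carry a pitch of 0, and the sample values are all assumptions.

```python
from statistics import mean


def syllable_features(f0_hz, intensity_db, boundaries, frame_s=0.01):
    """Summarise frame-level pitch and intensity tracks per syllable.

    f0_hz        -- one pitch value per frame, 0 for unvoiced frames (assumed)
    intensity_db -- one intensity value per frame
    boundaries   -- list of (start_frame, end_frame), end exclusive, non-empty
    frame_s      -- frame step in seconds (assumed 10 ms)
    """
    features = []
    for start, end in boundaries:
        # Pitch is only defined on voiced frames.
        voiced = [f for f in f0_hz[start:end] if f > 0]
        features.append({
            "pitch_hz": mean(voiced) if voiced else None,
            "duration_s": (end - start) * frame_s,
            "intensity_db": mean(intensity_db[start:end]),
        })
    return features


# Hypothetical two-syllable utterance, three frames per syllable.
f0 = [0, 200, 220, 0, 180, 180]
energy = [60, 70, 70, 50, 65, 67]
for row in syllable_features(f0, energy, [(0, 3), (3, 6)]):
    print(row)
```

Each syllable ends up as one (pitch, duration, intensity) triple, which is the granularity the slide names for the high-level audio features.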
Visual prosody
• High-level features: shoulder, head and eye movements
• Low-level features: facial expressions
• Visual Prosody: Facial Movements Accompanying Speech, Hans Peter Graf, Eric Cosatto, Volker Strom, Fu Jie Huang, Face and Gesture, 2002.
Exercises in style
Generative Audiovisual Prosodic Model
Dramatic attitude: seductive
Dramatic attitude: scandalized
Dramatic attitude: thinking
Speech-driven animation
• Erika Chuang and Christoph Bregler. 2005. Mood swings: expressive speech animation. ACM Trans. Graph. 2005.
• Stacy Marsella, Yuyu Xu, Margaux Lhommet, Andrew Feng, Stefan Scherer, and Ari Shapiro. 2013. Virtual character performance from speech. Symposium on Computer Animation (SCA '13).
• Tero Karras, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. 2017. Audio-driven facial animation by joint end-to-end learning of pose and emotion. ACM Trans. Graph. 36, 4, July 2017.
Generalized Speech Animation
• Sarah Taylor, Taehwan Kim, Yisong Yue, Moshe Mahler, James Krahe, Anastasio Garcia Rodriguez, Jessica Hodgins, and Iain Matthews. 2017. A deep learning approach for generalized speech animation. ACM Trans. Graph. 36, 4, July 2017.
Text-driven animation
• Irene Albrecht, Jörg Haber, Kolja Kähler, Marc Schröder, and Hans-Peter Seidel. 2002. "May I talk to you? :-)" Facial Animation from Text. Pacific Graphics, 2002.
Expressive conversion
• Joint Gaussian Mixture Models of expression pairs
• Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popovic. 2006. Face transfer with multilinear models. In ACM SIGGRAPH 2006 Courses (SIGGRAPH '06).
Our approach: prosodic contours
• F = voice pitch, H = head motion, G = gaze motion, U = upper-face, L = lower-face, C = rhythm, E = energy
Learning audiovisual prosody
Generating audiovisual prosody
Experimental results
• Thank you for the lovely flowers (thinking, ironic, scandalized)
• You’re welcome (fascinated, doubtful, embarrassed)
Subjective evaluation
• CF = comforting, FA = fascinated, TH = thinking, DO = doubtful, C0 = confronted, EM = embarrassed
Exercises in style results
Eisenstein, synchronization of senses
MONTAGE defined as:
• Piece A, derived from the elements of the theme being developed
• Piece B, derived from the same source
• in juxtaposition give birth to the image in which the thematic matter is most clearly embodied.
Representation A and representation B must be so selected from all the possible features within the theme that their juxtaposition shall evoke in the perception and feelings of the spectator the most complete image of the theme itself.
– Transition from silent montage to sound-picture, or audio-visual montage changes nothing in principle. Our conception of montage encompasses equally the montage of the silent film and of the sound-film.
– However, this does not mean that in working with sound-film, we are not faced with new tasks, new difficulties, and even entirely new methods.
– On the contrary!
– That is why it is so necessary for us to make a thorough analysis of the nature of audio-visual phenomena.
– Our first question is: Where shall we look for a secure foundation of experience with which to begin our analysis?
– Man and the relations between his gestures and the intonations of his voice, which arise from the same emotions, are our models in determining audio-visual structures, which grow in an exactly identical way from the governing image.
– To relate image with sound, we find a natural language common to both - movement.
– Movement will reveal all the substrata of inner synchronization that we wish to establish in due course. Movement will display in a concrete form the significance and method of the fusion process.
– Let us examine a number of different approaches to synchronization in logical order.
– The first is a purely factual synchronization: the sound-filming of natural things (a croaking frog, the mournful chords of a broken harp, the rattle of wagon wheels over cobblestone).
– In the more rudimentary forms of expression both elements (the picture and its sound) will be controlled by an identity of rhythm, according to the content of the scene.
– This is the simplest, easiest and most frequent circumstance of audio-visual montage, consisting of shots cut and edited together to the rhythm of the music on the parallel sound-track.
– We can surely find a shot whose movement harmonizes not only with the movement of the rhythmic pattern, but also with the movement of the melodic line.
– (…)
– Synchronization can be natural, metric, rhythmic, melodic and tonal.
Continuity Editing for 3D Animation
Quentin Galvane, Rémi Ronfard, Christophe Lino, Marc Christie
Twenty-Ninth AAAI Conference, 2015
Objectives
➢ Read actions and dialogues from script
➢ Generate speech and animation
➢ Place cameras and lights, generate rushes
➢ Edit the rushes into a movie
Related work
Idiom based solutions: Virtual cinematographer [Christianson et al. 1996]
Scenario: … Goldie speaks to George, George speaks to Goldie, Goldie speaks to George
Related work
Optimization based approach, dynamic programming [Riedl, M. et al., 2008]
All cameras evaluated over the entire beat
All transitions evaluated at beat changes
Our approach
Evaluate all possible transitions
Rhythm
Outline
➢ Film editing as an optimization problem
▪ Semi-Markov chains
➢ Create an editing graph that evaluates 3 aspects:
▪ Shot quality
▪ Cut quality
▪ Rhythm
Film editing as optimization
➢ Search over semi-Markov chains s = (r_j, d_j) given actions a(t)
➢ Minimize cost function:
▪ Action cost (Shot quality)
▪ Transition cost (Cut quality)
▪ Rhythm cost (Rhythmic quality)
➢ The final editing is given by the shortest path in the editing graph
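The search described on this slide can be sketched as a small dynamic program. This is a toy illustration under stated assumptions, not the authors' implementation: time is discretised into steps, each shot is a pair (rush, duration) as in s = (r_j, d_j), and the three cost terms are supplied by the caller. The names `edit_rushes`, `action_cost`, `transition_cost`, `rhythm_cost` and `max_duration`, and the sample costs, are hypothetical. Because the rhythm cost depends on the whole shot duration, the recursion ranges over durations as well as rushes, which is what makes the chain semi-Markov.

```python
import math
from itertools import accumulate


def edit_rushes(action_cost, transition_cost, rhythm_cost, max_duration):
    """Return (total_cost, shots), where shots is a list of (rush, duration).

    action_cost[r][t]     -- cost of showing rush r at time step t (shot quality)
    transition_cost(q, r) -- cost of cutting from rush q to rush r (cut quality)
    rhythm_cost(d)        -- cost of a shot lasting d time steps (rhythm)
    """
    n_rushes = len(action_cost)
    n_steps = len(action_cost[0])
    # Prefix sums give the action cost of any shot in constant time.
    prefix = [[0.0] + list(accumulate(row)) for row in action_cost]

    # best[end][r]: cheapest edit of steps [0, end) whose last shot uses rush r.
    best = [[math.inf] * n_rushes for _ in range(n_steps + 1)]
    back = [[None] * n_rushes for _ in range(n_steps + 1)]

    for end in range(1, n_steps + 1):
        for r in range(n_rushes):
            for d in range(1, min(end, max_duration) + 1):
                start = end - d
                shot = prefix[r][end] - prefix[r][start] + rhythm_cost(d)
                if start == 0:
                    candidates = [(shot, None)]  # first shot: no cut before it
                else:
                    candidates = [
                        (best[start][q] + transition_cost(q, r) + shot, q)
                        for q in range(n_rushes)
                        if q != r
                    ]
                for cost, q in candidates:
                    if cost < best[end][r]:
                        best[end][r] = cost
                        back[end][r] = (start, q)

    r = min(range(n_rushes), key=lambda k: best[n_steps][k])
    total = best[n_steps][r]
    if math.isinf(total):
        raise ValueError("no edit satisfies max_duration")

    # Walk the back-pointers from the end of the scene to recover the shots.
    shots, end = [], n_steps
    while end > 0:
        start, q = back[end][r]
        shots.append((r, end - start))
        end, r = start, q
    shots.reverse()
    return total, shots


# Hypothetical scene: rush 0 frames the first speaker well, rush 1 the second.
costs = [[0, 0, 5, 5], [5, 5, 0, 0]]
print(edit_rushes(costs, lambda q, r: 1.0, lambda d: 0.5 * abs(d - 2), 4))
```

The table `best` plays the role of the editing graph: each entry is a node, each candidate shot is an edge, and the back-pointer walk reads off the shortest path.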
Shot Selection
➢ Shot quality:
▪ Hitchcock principle: The size of a character on the screen should be proportional to its narrative importance in the story.
• Narrative importance from script
• Visible area V = S – O for each rush
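One way to turn the Hitchcock principle into a number is sketched below. The slides give only the principle and the visible area V = S – O; reading S as a character's projected screen area and O as its occluded part, and measuring the mismatch as a sum of absolute differences between importance shares and visible-area shares, are assumptions made for illustration. The names `visible_area` and `hitchcock_cost` and the sample values are hypothetical.

```python
def visible_area(projected, occluded):
    """V = S - O, clamped at zero for a fully hidden character."""
    return max(projected - occluded, 0.0)


def hitchcock_cost(importance, projected, occluded):
    """Mismatch between narrative importance and on-screen size for one rush.

    All arguments are dicts keyed by character name. Returns 0.0 when each
    character's share of visible area equals its share of importance.
    """
    visible = {
        name: visible_area(projected.get(name, 0.0), occluded.get(name, 0.0))
        for name in importance
    }
    total_importance = sum(importance.values())
    total_visible = sum(visible.values())
    if total_importance == 0:
        return 0.0  # nobody matters in this beat, so any framing is acceptable
    cost = 0.0
    for name in importance:
        share_importance = importance[name] / total_importance
        share_visible = visible[name] / total_visible if total_visible > 0 else 0.0
        cost += abs(share_importance - share_visible)
    return cost


# Hypothetical beat: Goldie is three times as important as George.
importance = {"Goldie": 3.0, "George": 1.0}
projected = {"Goldie": 0.4, "George": 0.2}
occluded = {"Goldie": 0.1, "George": 0.1}
print(hitchcock_cost(importance, projected, occluded))
```

A cost of this kind could be evaluated for every rush at every time step and used as the action cost term of the editing optimization.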
Actors and actions
Continuity editing
Results
Limitations & Future work
Limitations
➢ Audio tracks and cameras must be pre-computed
➢ Cannot handle ellipsis or flashbacks
➢ Cannot handle book-ending
▪ Context free grammar
Future work
➢ Optimize over camera positions and movements
➢ Extend to live action video
➢ Learn other editing styles from real movies [Gandhi et al., 2014] [Galvane et al., 2014]
Computational Video Editing for Dialogue-Driven Scenes
Mackenzie Leake, Abe Davis, Anh Truong, Maneesh Agrawala
What about vertical editing?
• Can we learn statistical models of Eisenstein style of montage?
• Harder than continuity editing
• Semi-Markov model still relevant
• Vertical relations between sound and picture
• Vertical relations between virtual camera shots
• Semi-Markov models can be useful!
– syllable and sentence durations
– shot and scene durations
– action durations
• Multimodal Semi-Markov models needed!
Conclusion
• Generative audiovisual models
– expression of emotions and attitudes with prosody
– expression of narratives with video editing
• Motivated by text to movie conversion
• Also important for movie description
References
• The Prose Storyboard Language: A Tool for Annotating and Directing Movies. Rémi Ronfard, Vineet Gandhi, Laurent Boiron. 2nd Workshop on Intelligent Cinematography and Editing, part of Foundations of Digital Games - FDG 2013.
• Narrative-Driven Camera Control for Cinematic Replay of Computer Games. Quentin Galvane, Rémi Ronfard, Marc Christie, Nicolas Szilas. MIG 2014.
• Beyond Basic Emotions: Expressive Virtual Actors with Social Attitudes. Adela Barbulescu, Rémi Ronfard, Gérard Bailly, Georges Gagneré, Huseyin Cakmak. 7th International ACM SIGGRAPH Conference on Motion in Games 2014.
• Continuity Editing for 3D Animation. Quentin Galvane, Rémi Ronfard, Christophe Lino, Marc Christie. AAAI Conference on Artificial Intelligence, Jan 2015.
• Camera-on-rails: Automated Computation of Constrained Camera Paths. Quentin Galvane, Marc Christie, Christophe Lino, Rémi Ronfard. ACM SIGGRAPH Conference on Motion in Games, Nov 2015.
• Implementing Hitchcock - the Role of Focalization and Viewpoint. Quentin Galvane, Rémi Ronfard. Eurographics Workshop on Intelligent Cinematography and Editing, Apr 2017.
• Five Challenges for Intelligent Cinematography and Editing. Rémi Ronfard. Eurographics Workshop on Intelligent Cinematography and Editing, Apr 2017.
• Which prosodic features contribute to the recognition of dramatic attitudes? Adela Barbulescu, Rémi Ronfard, Gérard Bailly. Speech Communication, Aug. 2017.
• A Generative Audio-Visual Prosodic Model for Virtual Actors. Adela Barbulescu, Rémi Ronfard, Gérard Bailly. IEEE Computer Graphics and Applications, Nov./Dec. 2017.