Progressive Perceptual Audio Rendering of Complex Scenes
Thomas Moeck (1,2) - Nicolas Bonneel (1) - Nicolas Tsingos (1) - George Drettakis (1) - Isabelle Viaud-Delmon (3) - David Alloza (4)
1- REVES/INRIA Sophia-Antipolis
2- Computer Graphics Group, University of Erlangen-Nuremberg
3- CNRS-UPMC UMR 7593
4- Eden Games
Objectives
Efficient audio rendering of very complex scenes with moving sources
Without audible impairment of the quality
Verify results by user tests
Previous Work
Rendering complex auditory scenes
  Clustering [Tsingos et al. 2004]: replace many sources with a representative
  Can still only handle ~200 sound sources (cost of clustering itself)
Scalable audio processing
  Importance-guided processing of few frequency/time bins [Fouad et al. 1997, Wand & Straßer 2004, Gallo et al. 2005, Tsingos 2005]
  Audio processing (e.g., HRTF, spatialization) is expensive
Crossmodal effects
  Neuroscience literature: “Ventriloquism affects 3D audio perception”
  Ventriloquism spatial window can vary from a few degrees up to 15 degrees
  Few papers on ecological experiments
Methodology
Recursive approach to clustering
  Reduce cost of clustering
Scalable perceptual premixing
  Faster premixing without audible loss of quality
Taking perceptual and cross-modal information into account
  Improve audio clustering algorithm
User experiments to detect improvement possibilities
  Improving quality with results of tests
Validation of resulting algorithms
Overview of the algorithms
Masking of inaudible sources (with energy)
Clustering of remaining sources (recursive)
Progressive premixing within each cluster
Spatial audio processing (HRTF)
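As an illustration of the first pipeline stage, here is a minimal sketch of an energy-based masking pass. It is not the authors' masking test: the function name `cull_inaudible`, the `masked_fraction` parameter and the rule itself (stop once the energy of all remaining quieter sources falls below a small fraction of the energy already kept) are assumptions made for this example.

```python
def cull_inaudible(energies, masked_fraction=0.01):
    """Return indices of the sources kept after a crude energy-based masking pass.

    Assumed rule (hypothetical, for illustration): walk the sources from
    loudest to quietest and stop as soon as the summed energy of everything
    not yet kept is below `masked_fraction` of the energy already kept.
    """
    # Visit sources in order of decreasing energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    remaining = sum(energies)   # energy of sources not yet kept
    kept, kept_energy = [], 0.0
    for i in order:
        # The rest is considered masked by what we already have.
        if kept and remaining < masked_fraction * kept_energy:
            break
        kept.append(i)
        kept_energy += energies[i]
        remaining -= energies[i]
    return kept
```

Only the surviving indices would then be passed on to clustering, premixing and spatial processing.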
Our Work
Optimized recursive approach to clustering
  Clustering performance evaluation
Improved scalable perceptual premixing
  Quality evaluation study
Study of cross-modal effects by user experiments
  Using results of cross-modal studies to develop an audio-visual clustering algorithm
Optimized Recursive Clustering
Recursive splitting of clusters
Fixed-budget approach
  Using a fixed number of clusters
Variable-budget approach
  Splitting clusters until break condition is reached
  Break condition: average angle error
  Optimal number of clusters
Variant used by Eden Games
  8-cluster budget
  Local clustering when necessary
Eden Games’ implementation: Test Drive Unlimited
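The fixed-budget and variable-budget variants can be sketched in a few lines. This is only an illustrative reading of the slide, not the authors' implementation: the source representation (unit direction plus energy), the energy-weighted representative, the farthest-point split heuristic and all function names are assumptions.

```python
import math

def angle(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def representative(sources, members):
    """Energy-weighted mean direction of a cluster (assumed representative)."""
    acc = [0.0, 0.0, 0.0]
    for i in members:
        direction, energy = sources[i]
        for k in range(3):
            acc[k] += energy * direction[k]
    norm = math.sqrt(sum(c * c for c in acc))
    if norm == 0.0:                      # degenerate case: directions cancel out
        return sources[members[0]][0]
    return tuple(c / norm for c in acc)

def cluster_error(sources, members):
    """Energy-weighted sum of angular errors to the representative."""
    rep = representative(sources, members)
    return sum(sources[i][1] * angle(sources[i][0], rep) for i in members)

def split(sources, members):
    """Split one cluster in two around two far-apart seed directions."""
    rep = representative(sources, members)
    far = max(members, key=lambda i: angle(sources[i][0], rep))
    seed_a = sources[far][0]
    other = max(members, key=lambda i: angle(sources[i][0], seed_a))
    seed_b = sources[other][0]
    a = [i for i in members
         if angle(sources[i][0], seed_a) <= angle(sources[i][0], seed_b)]
    b = [i for i in members if i not in a]
    return a, b

def recursive_clustering(sources, max_clusters, max_avg_error=None):
    """Fixed budget: split until max_clusters is reached.
    Variable budget: additionally stop once the average angle error
    drops to max_avg_error (radians) or below."""
    clusters = [list(range(len(sources)))]
    total_energy = sum(energy for _, energy in sources)
    while len(clusters) < max_clusters:
        errors = [cluster_error(sources, c) for c in clusters]
        if max_avg_error is not None and sum(errors) / total_energy <= max_avg_error:
            break                        # break condition reached
        worst = max(range(len(clusters)), key=lambda j: errors[j])
        a, b = split(sources, clusters[worst])
        if not b:                        # nothing left to split
            break
        clusters[worst:worst + 1] = [a, b]
    return clusters
```

Calling `recursive_clustering(sources, 8)` corresponds to a fixed 8-cluster budget; passing `max_avg_error` as well lets the loop stop earlier, which is one way to arrive at a scene-dependent number of clusters.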
Clustering Performance Evaluation
The recursive algorithms clearly perform better
Improved progressive scalable perceptual premixing (1)
After clustering: premixing in each cluster
Why?
  Effects can be applied afterwards, at less cost because there are fewer signals
  Only the necessary data is premixed
Assigning frequency bins to sound sources (iterative importance sampling) using a pinnacle value
Improved progressive scalable perceptual premixing (2)
[Figure: pipeline stages labeled clustering and premixing]
Improved progressive scalable perceptual premixing (3)
Iterative importance sampling
  Calculation of importance value from energy, loudness or audio saliency map
  Assignment of frequency bins proportional to importance until pinnacle value is reached
  Reassignment of remaining frequency bins to sounds relative to importance values
Varying budget
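The assignment steps above can be sketched as a simple capped proportional allocation. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `allocate_bins`, the integer rounding, and the reading of the pinnacle value as a per-source cap on the number of bins are all assumptions.

```python
def allocate_bins(importance, budget, pinnacle):
    """Distribute `budget` frequency bins over sources.

    importance: non-negative importance value per source
                (e.g. from energy, loudness or a saliency map)
    pinnacle:   assumed to be the maximum number of bins per source
    """
    alloc = [0] * len(importance)
    remaining = budget
    while remaining > 0:
        # Sources that still want bins and have not hit the cap.
        active = [i for i, w in enumerate(importance)
                  if w > 0 and alloc[i] < pinnacle]
        if not active:
            break
        total = sum(importance[i] for i in active)
        granted = 0
        for i in active:
            # Share proportional to importance, rounded down, capped.
            share = int(remaining * importance[i] / total)
            give = min(share, pinnacle - alloc[i])
            alloc[i] += give
            granted += give
        if granted == 0:
            # Every share rounded down to zero: give one bin to the
            # most important source that still has room.
            best = max(active, key=lambda i: importance[i])
            alloc[best] += 1
            granted = 1
        remaining -= granted             # leftovers are reassigned next pass
    return alloc
```

For example, with importance values 6, 3 and 1, a budget of 20 bins and a cap of 8, the first pass caps the most important source at 8 bins and later passes hand its surplus to the other two.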
Quality Evaluation Study (1)
MUSHRA (“Multiple Stimuli with Hidden Reference and Anchors”) test of perceptual premixing
7 subjects, aged 23 to 40
Ambient, music and speech
Various budgets (2% to 25%)
With and without pinnacle value
Using loudness or saliency as importance value
Quality Evaluation Study (2)
Results:
  Approach is capable of generating high quality using 25% of the original data
  Acceptable results with 10% (2% in case of speech)
Significant effects:
  Budget
  Importance value
  Pinnacle value
Study of Cross-Modal Influences – Questions
Do we need more or fewer clusters in the viewing frustum?
  We move the spatial position of sound sources to the representative of their cluster
  How tolerant are we to this error?
Do visuals influence the perceived quality?
Study of Cross-Modal Influences – Setup (1)
Study of Cross-Modal Influences – Setup (2)
Study of Cross-Modal Effects – Setup (3)
Uniform distribution [1/4]
[2/3] condition
[3/2] condition
[4/1] condition
Study of Cross-Modal Influences – Results
Statistical analysis of the results shows:
  We need more clusters in the viewing frustum
  No significant difference between visuals and no visuals, but a possible cross-modal effect
Modifying the algorithm
Introducing weighting term in clustering:
Increasing number of clusters in the viewing frustum
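The weighting term itself is not reproduced in this transcript. Purely as an illustration of the idea, the sketch below shows one hypothetical way to bias a clustering error metric toward the viewing frustum: the angular error of a source is multiplied by a factor greater than 1 when the source lies within an assumed field-of-view cone. The function name, the cone test, `half_fov` and `frustum_weight` are all assumptions, not the authors' formula.

```python
import math

def _angle(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def weighted_angle_error(source_dir, rep_dir, view_dir,
                         half_fov=math.radians(30.0), frustum_weight=2.0):
    """Angular error of a source relative to its cluster representative,
    scaled up when the source is inside the (assumed conical) view frustum."""
    error = _angle(source_dir, rep_dir)
    in_frustum = _angle(source_dir, view_dir) <= half_fov
    return error * (frustum_weight if in_frustum else 1.0)
```

With such a metric, clusters containing visible sources accumulate error faster and are split first, so more clusters end up inside the viewing frustum.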
Cross-Modal illustration
Video: Putting it all together
Conclusions
Up to nearly 3000 sound sources possible in good quality
  Main limitation is graphics (!)
Better quality because of more clusters in the viewing frustum
Future work
  Experiment with auditory saliency measurements
  Handle procedurally synthesized sounds?
Questions?