June 29, 2016, 10:13 – 3D Sound Labs
We therefore propose a quick overview of the three 3D audio paradigms (channel-based, Object-based and Ambisonics), as well as an analysis of why Ambisonics makes sense for VR audio.
In the channel-based representation, the unit of information is the loudspeaker. Each channel is associated with a loudspeaker, and the sound is reproduced by mixing the various channels across several speakers. The more channels, the greater the spatial sound capabilities. The channel-based representation has been the traditional sound representation for the past 50 years or more. Stereo, 5.1 and 7.1 are channel-based horizontal representations. 3D is obtained by adding elevated speakers, as in the 11.1 format, where 4 ceiling speakers are added to a 7.1 horizontal speaker layout. One of the main drawbacks of the multichannel representation is that it depends on the loudspeaker set-up: one mix is needed for each type of set-up, whereas Object-based and Ambisonics content is independent of the loudspeaker set-up.
In the Object-based representation, the unit of information is the sound source. A scene is made of several sound sources and information about their locations, their directivity patterns and the rendering environment (room size, reverberation parameters…). The 3D audio rendering is made by calculating the combination of all the sources, including the reverberation, at the listener position. This is a great paradigm to interactively create content, but it also uses a lot of CPU resources. The more complex (number of sound sources) and realistic (precision of the reverberation) the scene, the more CPU is needed.
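To illustrate why the rendering cost grows with the number of sources, here is a minimal Python sketch of an object-based renderer. It is not a production algorithm: the names `SoundObject` and `render_at_listener` are hypothetical, the only effect modelled is an assumed 1/distance attenuation, and directivity and reverberation are left out entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundObject:
    """Hypothetical sound source: a position in metres and one audio sample."""
    position: tuple
    sample: float


def render_at_listener(objects, listener=(0.0, 0.0, 0.0), min_distance=1.0):
    """Sum the contribution of every source at the listener position.

    Assumption: simple 1/distance attenuation, clamped at min_distance.
    Directivity and reverberation are deliberately ignored.
    """
    total = 0.0
    for obj in objects:  # one pass per source: cost grows with the scene size
        distance = math.dist(obj.position, listener)
        gain = 1.0 / max(distance, min_distance)
        total += gain * obj.sample
    return total


scene = [
    SoundObject(position=(2.0, 0.0, 0.0), sample=1.0),
    SoundObject(position=(0.0, 0.0, 0.5), sample=0.4),
]
mixed_sample = render_at_listener(scene)
```

Every extra source adds one more iteration of the loop for each audio sample, and a realistic reverberation model would add far more work per source.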
Unlike the two other representations, the Ambisonics format does not rely on the description of individual sound sources (speakers or objects) but instead represents the resulting sound field at the listener's position. The mathematical formalism used to describe the sound field is called spherical harmonics, and the unit of information is the component of this spherical representation; the number of components is determined by the order. The more components (that is, the higher the order), the more precise the spatial representation of the scene. This paradigm is not new: it has been used by a small community of sound professionals for several decades through a concept called the B-Format, which is in fact the first-order case of the Higher Order Ambisonics representation.
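As a rough illustration of the relation between order and components, a full 3D representation of order N has (N + 1)² components, which gives the 4 channels (W, X, Y, Z) of the first-order B-Format. The Python sketch below shows this count and encodes a mono sample into B-Format. The function names are hypothetical, and the encoding assumes the traditional B-Format convention (W scaled by 1/√2, azimuth measured counter-clockwise from the front); other channel orderings and normalisations exist.

```python
import math


def ambisonic_channel_count(order: int) -> int:
    """Number of spherical-harmonic components of a full 3D representation."""
    return (order + 1) ** 2


def encode_b_format(sample: float, azimuth_deg: float, elevation_deg: float):
    """Encode one mono sample into first-order B-Format (W, X, Y, Z).

    Assumption: traditional B-Format convention, with W scaled by 1/sqrt(2)
    and azimuth measured counter-clockwise from the front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                  # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back component
    y = sample * math.sin(az) * math.cos(el)   # left-right component
    z = sample * math.sin(el)                  # up-down component
    return w, x, y, z


first_order_channels = ambisonic_channel_count(1)
third_order_channels = ambisonic_channel_count(3)
front_source = encode_b_format(1.0, azimuth_deg=0.0, elevation_deg=0.0)
```

A source straight ahead ends up only in W and X, while a source to the left or above would move energy into Y or Z.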
Ambisonics is a very attractive solution for VR
Several key players in the VR industry (Google, Sony…) are now embracing the concept of Higher Order Ambisonics and are developing commercial applications based on this paradigm, such as the emblematic YouTube 360 platform, which uses HOA as its default audio format. There are several reasons behind this choice. The most important ones are the following:
At 3D Sound Labs, we believe that multichannel is gradually becoming a legacy format that is not well suited to the needs of VR audio, and that Object-based and Ambisonics are the formats VR requires. We have developed the following vision of the future:
For interactive content rendering, such as VR gaming, the Object-based paradigm makes a lot of sense. However, rendering complex sound scenes made of many sound objects is very computationally intensive and requires resources that are not necessarily available on mass-market platforms. This can be solved by converting all or part of the Object-based representation into Ambisonics and leveraging the scalability of the rendering to adapt to the available CPU resources.
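The conversion described above can be sketched in Python under simplifying assumptions. Any number of objects is mixed into a fixed set of first-order channels, and a helper picks the highest order that fits a channel budget. Both function names are hypothetical, the encoding assumes the traditional B-Format convention (W scaled by 1/√2), and using a channel budget as a stand-in for available CPU is a simplification of ours.

```python
import math


def encode_objects_to_b_format(objects):
    """Mix (sample, azimuth_deg, elevation_deg) objects into one W, X, Y, Z frame.

    Assumption: traditional first-order B-Format convention (W scaled by
    1/sqrt(2)). The output size is fixed whatever the number of objects.
    """
    w = x = y = z = 0.0
    for sample, azimuth_deg, elevation_deg in objects:
        az = math.radians(azimuth_deg)
        el = math.radians(elevation_deg)
        w += sample / math.sqrt(2)
        x += sample * math.cos(az) * math.cos(el)
        y += sample * math.sin(az) * math.cos(el)
        z += sample * math.sin(el)
    return w, x, y, z


def highest_order_within_budget(max_channels: int) -> int:
    """Highest order N such that (N + 1) ** 2 channels fit in the budget."""
    if max_channels < 1:
        raise ValueError("at least one channel is required")
    order = 0
    while (order + 2) ** 2 <= max_channels:
        order += 1
    return order


# Two objects, one in front and one behind, collapse into 4 channels.
frame = encode_objects_to_b_format([(1.0, 0.0, 0.0), (1.0, 180.0, 0.0)])
order_for_small_device = highest_order_within_budget(4)
order_for_large_device = highest_order_within_budget(16)
```

Once the objects are encoded, the downstream rendering cost depends on the number of Ambisonics channels rather than on the number of objects, so a device with fewer resources can work at a lower order.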