There is currently a major push toward interactive and immersive experiences, both for general consumers and for specialized cutting-edge applications, with major investment in virtual and augmented reality spanning content acquisition and display/playback. Widespread adoption of these applications depends critically on the emergence of new capabilities to efficiently encode and stream multimedia content at very low latency. Traditionally, low-latency compression has relied on efficient prediction to exploit temporal redundancies.
However, while prediction is the mainstay of video and speech compression, it has largely been neglected in audio compression, which instead relies heavily on transforms of long blocks that, in turn, incur significant delay. The reason for this recourse to long-frame transforms is that polyphonic audio, i.e., a mixture of multiple periodic components plus noise, is known to be difficult to predict effectively and has eluded standard prediction approaches. Consequently, current state-of-the-art audio coding and networking technologies fall short of reconciling the conflicting objectives of low latency, low complexity, and low bitrate required to provide rich audio to future immersive virtual reality applications.
Our audio coding technology is geared toward next-generation immersive virtual reality applications and satisfies all the relevant delay, complexity, and bitrate constraints. It does so by employing a novel paradigm for efficient and effective prediction, one capable of fully exploiting the redundancies implicit in all periodic components of rich polyphonic signals without incurring delay. Our approach is unique, and it enables widespread, low-cost commercialization of interactive immersive applications, where rich audio is a must. It is based on the concept of cascaded long term prediction (CLTP), which enables joint prediction of all periodic components in the mixture from the immediately preceding segment of samples, i.e., at low delay.
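To make the idea of cascading long term predictors concrete, the following is a minimal sketch, not the coder's actual implementation. It assumes the simplest possible form of each stage, a single-tap filter 1 - g z^(-N) with an integer lag N and gain g, and it uses a synthetic mixture of two perfectly periodic components with no noise. The function names, lags, and gains are hypothetical choices made for illustration; how a practical coder structures its filters and estimates their parameters is not described in this text.

```python
import math

def cascade_coeffs(stages):
    """Multiply out prod_i (1 - g_i * z^-N_i) into one FIR prediction-error filter.

    `stages` is a list of (lag, gain) pairs, one per periodic component.
    """
    a = [1.0]
    for lag, gain in stages:
        nxt = a + [0.0] * lag          # room for the delayed copy
        for k, c in enumerate(a):
            nxt[k + lag] -= gain * c   # subtract gain * (a delayed by lag)
        a = nxt
    return a

def predict(x, a):
    """Predict every x[n] using only strictly earlier samples x[n-1], x[n-2], ..."""
    pred = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(1, len(a)):
            if a[k] != 0.0 and n - k >= 0:
                acc -= a[k] * x[n - k]
        pred.append(acc)
    return pred

def residual_energy(x, pred, start):
    """Energy of the prediction error from sample `start` onward."""
    return sum((x[n] - pred[n]) ** 2 for n in range(start, len(x)))

# Synthetic "polyphonic" signal: two periodic components with periods 50 and 73 samples.
P1, P2 = 50, 73
x = [math.sin(2 * math.pi * n / P1) + 0.5 * math.sin(2 * math.pi * n / P2 + 0.3)
     for n in range(600)]

warmup = P1 + P2  # skip samples where the full history is not yet available

# A single long term predictor can track only one of the two periods.
e_single = residual_energy(x, predict(x, cascade_coeffs([(P1, 1.0)])), warmup)

# The cascade has one stage per period and predicts both components jointly.
e_cascade = residual_energy(x, predict(x, cascade_coeffs([(P1, 1.0), (P2, 1.0)])), warmup)

print("single-stage residual energy:", e_single)
print("cascaded residual energy:   ", e_cascade)
```

In this idealized setting the single-stage predictor leaves the second periodic component in the residual, while the cascade removes both using only past samples. Real audio is neither perfectly periodic nor noise-free, so the lags and gains would have to be estimated and the residual would not vanish.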
The CLTP paradigm enables highly effective prediction of polyphonic audio, thereby offering a transformative means to achieve near-optimal audio compression at low latency, by circumventing the need for long block/frame transformation on which current state-of-the-art audio coders depend to achieve efficient compression.