Multi-Shift Principal Component Analysis based Primary Component Extraction for Spatial Audio Reproduction


Explore Multi-Shift Principal Component Analysis (MSPCA), an approach to primary component extraction for spatial audio reproduction. The goal is a representation of sound scenes in digital media that is both flexible and efficient, addressing the limitations of existing channel-based and object-based representations. The approach performs primary-ambient extraction (PAE) on channel-based audio such as stereo, extending PCA and shifted PCA to handle multiple concurrent sources arriving from different directions.

  • Audio Reproduction
  • Spatial Analysis
  • Principal Component
  • Digital Media
  • Channel-based


Presentation Transcript


  1. Multi-Shift Principal Component Analysis based Primary Component Extraction for Spatial Audio Reproduction Jianjun HE, Woon-Seng Gan jhe007@e.ntu.edu.sg, ewsgan@ntu.edu.sg 22nd April 2015 Digital Signal Processing Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

  2. WHY
     Goal: a new representation of sound scenes in digital media that is both flexible and efficient for spatial audio reproduction on any playback system.
     Existing sound scene representations:
     • Channel-based: conventional, made for a specific playback system; lacks the flexibility to support different playback configurations.
     • Object-based: emerging, suitable for any playback system; lacks efficiency (large storage and high transmission bandwidth).
     Primary-ambient based representation: inspired by the human auditory system; facilitates flexible and efficient rendering.
     [Block diagram: input → PAE → primary components → primary rendering (using spatial attributes), and ambient components → ambient rendering; rendered signals → post-processing → any playback system]
     Primary-ambient extraction (PAE) from channel-based audio (e.g., stereo):
     • Existing approaches mainly handle one dominant source in the primary components;
     • Subband techniques are problematic when the source spectra overlap;
     • PAE with multiple sources from different directions has not been well studied.

  3. Stereo Signal Model
     Assumptions (channel index c ∈ {0, 1}):
     • Signal = primary + ambient: $\mathbf{x}_c = \mathbf{p}_c + \mathbf{a}_c$;
     • Primary components are highly correlated: $\mathbf{p}_1 = k\,\mathbf{p}_0$, where $k$ is the primary panning factor;
     • Ambient components are uncorrelated: $\mathbf{a}_0^T\mathbf{a}_1 = 0$;
     • Primary and ambient components are uncorrelated: $\mathbf{p}_i^T\mathbf{a}_j = 0$ for $i, j \in \{0, 1\}$;
     • Ambient power is balanced: $P_{a_0} = P_{a_1}$.
     J. He, E. L. Tan and W. S. Gan, "Linear estimation based primary-ambient extraction for stereo audio signals," IEEE Trans. Audio, Speech, Lang. Process., vol. 22, no. 2, pp. 505-517, Feb. 2014.
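A minimal sketch of the slide-3 stereo model, assuming zero ICTD and white Gaussian ambience (the ambience type used later in the experiment setup). The function name `make_stereo` and its parameters are hypothetical; it simply builds x0 = p0 + a0 and x1 = k*p0 + a1 with independent, equal-variance ambient noise, so the ambient parts are uncorrelated and power-balanced in expectation.

```python
import random


def make_stereo(p0, k, ambient_std, seed=0):
    """Hypothetical helper: synthesize a stereo pair following the stereo model.

    x0 = p0 + a0 and x1 = k * p0 + a1, where a0 and a1 are independent white
    Gaussian noise with the same standard deviation, so the ambient components
    are uncorrelated and power-balanced in expectation. No ICTD is applied.
    """
    rng = random.Random(seed)
    a0 = [rng.gauss(0.0, ambient_std) for _ in p0]
    a1 = [rng.gauss(0.0, ambient_std) for _ in p0]
    p1 = [k * s for s in p0]  # highly correlated primary: p1 = k * p0
    x0 = [p + a for p, a in zip(p0, a0)]
    x1 = [p + a for p, a in zip(p1, a1)]
    return x0, x1, p1


# Usage: a short primary signal panned with k = 2 plus mild ambience.
x0, x1, p1 = make_stereo([1.0, -0.5, 0.25, 0.8], k=2.0, ambient_std=0.1)
```

Setting `ambient_std` to zero leaves a purely primary pair, which is convenient for checking an extraction method against a known answer.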

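  4. PCA for primary extraction
     Objective: find the primary basis $\mathbf{u}_P$ and the ambient basis $\mathbf{u}_A$ such that
     $\mathbf{u}_P = \arg\max_{\mathbf{u}} \| u_0\mathbf{x}_0 + u_1\mathbf{x}_1 \|_2^2, \quad \mathbf{u}_A = \arg\min_{\mathbf{u}} \| u_0\mathbf{x}_0 + u_1\mathbf{x}_1 \|_2^2, \quad \text{s.t. } \|\mathbf{u}_P\| = \|\mathbf{u}_A\| = 1,\ \mathbf{u}_P \perp \mathbf{u}_A.$
     [Figure: primary basis $\mathbf{u}_P$ (proportional to $[1, k]^T$) and ambient basis $\mathbf{u}_A$ in the $(x_0, x_1)$ plane]
     Projecting onto $\mathbf{u}_P$ gives the extracted primary components
     $\hat{\mathbf{p}}_{\text{PCA},0} = \frac{1}{1+k^2}\left(\mathbf{x}_0 + k\,\mathbf{x}_1\right), \quad \hat{\mathbf{p}}_{\text{PCA},1} = \frac{k}{1+k^2}\left(\mathbf{x}_0 + k\,\mathbf{x}_1\right).$
     M. M. Goodwin and J. M. Jot, "Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement," in Proc. ICASSP, Hawaii, 2007, pp. 9-12.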
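The PCA solution for two channels can be sketched in plain Python. This is a minimal sketch under the slide-3 model: it estimates k as the slope of the principal eigenvector of the 2x2 lag-0 correlation matrix, then applies the projection formulas for the PCA primary components. The name `pca_primary` is hypothetical.

```python
import math


def pca_primary(x0, x1):
    """Hypothetical helper: PCA-based primary extraction for a stereo pair.

    Returns (p0_hat, p1_hat, k_hat). The principal eigenvector of the 2x2
    correlation matrix [[r00, r01], [r01, r11]] is proportional to [1, k].
    """
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    if r01 == 0:
        raise ValueError("zero lag-0 cross-correlation: primary direction undefined")
    # Root of r01*k^2 + (r00 - r11)*k - r01 = 0 that gives the larger eigenvalue
    k = (r11 - r00 + math.sqrt((r11 - r00) ** 2 + 4.0 * r01 ** 2)) / (2.0 * r01)
    g = 1.0 / (1.0 + k * k)
    p0 = [g * (a + k * b) for a, b in zip(x0, x1)]  # (x0 + k x1) / (1 + k^2)
    p1 = [k * s for s in p0]                          # k (x0 + k x1) / (1 + k^2)
    return p0, p1, k
```

For a pair made of a signal and a copy scaled by 3 (no ambience), the sketch recovers k = 3 and returns the original signal as the channel-0 primary component.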

  5. Shifted PCA for primary extraction
     Motivation: to account for the reduced zero-lag correlation of the primary components caused by the inter-channel time difference (ICTD).
     [Diagram: stereo input signal → ICTD estimation → time shifting (shifted signal) → PCA decomposition (shifted primary) → output mapping → primary components]
     With $\tau$ the estimated ICTD in samples (primary model $p_1(n) = k\,p_0(n-\tau)$), the extracted primary components are
     $\hat{p}_{\text{SPCA},0}(n) = \frac{1}{1+k^2}\left[x_0(n) + k\,x_1(n+\tau)\right], \quad \hat{p}_{\text{SPCA},1}(n) = \frac{k}{1+k^2}\left[x_0(n-\tau) + k\,x_1(n)\right].$
     J. He, E. L. Tan, and W. S. Gan, "Time-shifted principal component analysis based cue extraction for stereo audio signals," in Proc. ICASSP, Vancouver, Canada, 2013, pp. 266-270.
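The shifted-PCA pipeline (ICTD estimation, time shifting, PCA, output mapping) can be sketched as follows. Assumptions: the ICTD is estimated as the lag that maximizes the normalized cross-correlation within ±max_lag, shifted samples that fall outside the signal are zero-filled, and the helper names (`shift`, `icc`, `estimate_ictd`, `spca_primary`) are hypothetical. The PCA step repeats the two-channel closed form so that the block runs on its own.

```python
import math


def shift(x, d):
    """Return y with y[n] = x[n + d]; samples outside x are zero-filled."""
    n = len(x)
    return [x[i + d] if 0 <= i + d < n else 0.0 for i in range(n)]


def icc(x0, x1, lag):
    """Normalized cross-correlation between x0(n) and x1(n + lag)."""
    y1 = shift(x1, lag)
    num = sum(a * b for a, b in zip(x0, y1))
    den = math.sqrt(sum(a * a for a in x0) * sum(b * b for b in y1))
    return num / den if den > 0 else 0.0


def estimate_ictd(x0, x1, max_lag):
    """Lag in [-max_lag, max_lag] with the largest ICC (assumes k > 0)."""
    return max(range(-max_lag, max_lag + 1), key=lambda lag: icc(x0, x1, lag))


def pca_primary(x0, x1):
    """Two-channel PCA primary extraction; returns (p0_hat, p1_hat, k_hat)."""
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    k = (r11 - r00 + math.sqrt((r11 - r00) ** 2 + 4.0 * r01 ** 2)) / (2.0 * r01)
    p0 = [(a + k * b) / (1.0 + k * k) for a, b in zip(x0, x1)]
    return p0, [k * s for s in p0], k


def spca_primary(x0, x1, max_lag=50):
    """Hypothetical shifted-PCA sketch; returns (p0_hat, p1_hat, tau)."""
    tau = estimate_ictd(x0, x1, max_lag)
    p0, p1_shifted, _ = pca_primary(x0, shift(x1, tau))  # PCA on the aligned pair
    p1 = shift(p1_shifted, -tau)  # output mapping: undo the time shift
    return p0, p1, tau
```

When channel 1 is a delayed, scaled copy of channel 0, the sketch finds that delay as tau and returns both primary channels exactly, which plain PCA at lag 0 cannot do.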

  6. Multi-Shift PCA for primary extraction
     To account for concurrent directional sound sources (from different directions) in the primary components, we consider a few selected shifts.
     [Diagram, typical structure of MSPCA (MSPCA-T): stereo input signal $\mathbf{X}$ → multiple ICTD estimation → for each $i = 1, \dots, T$: time shifting → $\mathbf{X}_i$ → PCA → $\mathbf{P}_i$ → output mapping → primary components $\mathbf{P}$]
     • $\mathbf{X} = [\mathbf{x}_0, \mathbf{x}_1]$: stereo input signal;
     • $\tau_i$: the $i$-th estimated ICTD, $i = 1, \dots, T$ ($T$: total number of ICTDs);
     • $\mathbf{X}_i$: the $i$-th shifted signal;
     • $\mathbf{P}_i$: the $i$-th extracted shifted primary components;
     • $\mathbf{P} = [\mathbf{p}_0, \mathbf{p}_1]$: final output of the extracted primary components.
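A sketch of the typical (MSPCA-T) structure, under stated assumptions: the T ICTDs are taken as the T largest positive local maxima of the ICC over ±max_lag, each shifted pair goes through the same shifted-PCA path as on slide 5, and the output mapping averages the T branches with equal weights 1/T. The slides do not specify the combination rule, so the equal weighting and all helper names here are hypothetical.

```python
import math


def shift(x, d):
    """Return y with y[n] = x[n + d]; samples outside x are zero-filled."""
    n = len(x)
    return [x[i + d] if 0 <= i + d < n else 0.0 for i in range(n)]


def icc(x0, x1, lag):
    """Normalized cross-correlation between x0(n) and x1(n + lag)."""
    y1 = shift(x1, lag)
    num = sum(a * b for a, b in zip(x0, y1))
    den = math.sqrt(sum(a * a for a in x0) * sum(b * b for b in y1))
    return num / den if den > 0 else 0.0


def pca_primary(x0, x1):
    """Two-channel PCA primary extraction; returns (p0_hat, p1_hat)."""
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    k = (r11 - r00 + math.sqrt((r11 - r00) ** 2 + 4.0 * r01 ** 2)) / (2.0 * r01)
    p0 = [(a + k * b) / (1.0 + k * k) for a, b in zip(x0, x1)]
    return p0, [k * s for s in p0]


def select_ictds(x0, x1, max_lag, T):
    """Hypothetical multiple-ICTD estimator: the T largest positive local ICC peaks."""
    lags = range(-max_lag, max_lag + 1)
    c = {lag: icc(x0, x1, lag) for lag in lags}
    peaks = [lag for lag in lags
             if c[lag] > 0
             and c[lag] >= c.get(lag - 1, -math.inf)
             and c[lag] >= c.get(lag + 1, -math.inf)]
    return sorted(peaks, key=lambda lag: c[lag], reverse=True)[:T]


def mspca_t(x0, x1, max_lag=50, T=2):
    """MSPCA-T sketch: shifted PCA per selected ICTD, equal-weight output mapping."""
    taus = select_ictds(x0, x1, max_lag, T)
    n = len(x0)
    p0, p1 = [0.0] * n, [0.0] * n
    for tau in taus:
        q0, q1_shifted = pca_primary(x0, shift(x1, tau))
        q1 = shift(q1_shifted, -tau)  # map branch i back to the original timing
        for i in range(n):
            p0[i] += q0[i] / len(taus)
            p1[i] += q1[i] / len(taus)
    return p0, p1, taus
```

With T = 1 the sketch reduces to shifted PCA; its weakness, as the conclusions note for MSPCA-T, is that every branch depends on the selected ICTDs being accurate.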

  7. Multi-Shift PCA: consecutive structure
     Shift consecutively, lag by lag, and apply different weights to the different shifted versions. The weights are derived from the inter-channel cross-correlation coefficient (ICC).
     [Diagram: same structure as MSPCA-T, with one time-shifted PCA branch for every lag]
     For lags $l = -L, -L+1, \dots, -1, 0, 1, \dots, L-1, L$:
     $\mathbf{P}(n) = \sum_{l=-L}^{L} w_l\,\mathbf{P}_l(n), \quad \text{where } w_l = \frac{\phi_l^{a}}{\sum_{l'=-L}^{L} \phi_{l'}^{a}},$
     $\phi_l$: the ICC at lag $l$; $a$: the exponent applied to the ICC.
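The consecutive structure with ICC-based weights w_l = phi_l^a / (sum of phi^a over all lags) can be sketched as below. Assumptions: phi_l is computed as the normalized cross-correlation at lag l with zero-filled shifts, each lag's branch is a shifted PCA whose channel-1 output is shifted back, and a is an even integer (as in the slides' a = 2 and a = 10) so every weight is non-negative. Function names are hypothetical.

```python
import math


def shift(x, d):
    """Return y with y[n] = x[n + d]; samples outside x are zero-filled."""
    n = len(x)
    return [x[i + d] if 0 <= i + d < n else 0.0 for i in range(n)]


def icc(x0, x1, lag):
    """Normalized cross-correlation phi between x0(n) and x1(n + lag)."""
    y1 = shift(x1, lag)
    num = sum(a * b for a, b in zip(x0, y1))
    den = math.sqrt(sum(a * a for a in x0) * sum(b * b for b in y1))
    return num / den if den > 0 else 0.0


def pca_primary(x0, x1):
    """Two-channel PCA primary extraction; returns (p0_hat, p1_hat)."""
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    k = (r11 - r00 + math.sqrt((r11 - r00) ** 2 + 4.0 * r01 ** 2)) / (2.0 * r01)
    p0 = [(a + k * b) / (1.0 + k * k) for a, b in zip(x0, x1)]
    return p0, [k * s for s in p0]


def icc_weights(x0, x1, L, a):
    """w_l = phi_l**a / sum over l' of phi_l'**a, for l = -L..L (a assumed even)."""
    phi = {lag: icc(x0, x1, lag) for lag in range(-L, L + 1)}
    total = sum(v ** a for v in phi.values())
    return {lag: v ** a / total for lag, v in phi.items()}


def mspca_consecutive(x0, x1, L=50, a=2):
    """Consecutive MSPCA sketch: P(n) = sum over l of w_l * P_l(n)."""
    w = icc_weights(x0, x1, L, a)
    n = len(x0)
    p0, p1 = [0.0] * n, [0.0] * n
    for lag, wl in w.items():
        if wl == 0.0:
            continue  # zero weight: no contribution (also avoids PCA with r01 == 0)
        q0, q1_shifted = pca_primary(x0, shift(x1, lag))
        q1 = shift(q1_shifted, -lag)  # output mapping back to the original timing
        for i in range(n):
            p0[i] += wl * q0[i]
            p1[i] += wl * q1[i]
    return p0, p1, w
```

Raising a concentrates the weights on the lags with the highest ICC; for large a the sketch approaches a single shifted PCA, in line with the observation on slide 9 that MSPCA with a = 10 behaves like SPCA.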

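  8. Experiment setup
     • Primary components: a speech source and a music source from different directions, each with its own primary panning factor $k$ and an ICTD of magnitude 20 lags (positive for the music);
     • Ambient components: uncorrelated white Gaussian noise;
     • The overall powers of speech, music, and ambience are set equal;
     • Approaches evaluated: PCA, SPCA, MSPCA-T, MSPCA (a = 2), MSPCA (a = 10);
     • ICTD search range: ±50 lags (about ±1.1 ms, i.e., a total span of about 2.3 ms, at $f_s$ = 44.1 kHz).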
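Two small pieces of the setup are easy to make concrete: scaling each signal so that speech, music, and ambience have equal overall power, and converting the ICTD search range from lags to milliseconds. A minimal sketch; the helper names are hypothetical and the target power of 1.0 is an arbitrary choice.

```python
import math


def scale_to_power(x, target_power=1.0):
    """Scale x so that its mean power, mean(x[n]**2), equals target_power."""
    power = sum(v * v for v in x) / len(x)
    gain = math.sqrt(target_power / power)
    return [gain * v for v in x]


def lag_to_ms(lag, fs_hz):
    """Convert a lag in samples to milliseconds at sampling rate fs_hz."""
    return 1000.0 * lag / fs_hz


# Usage: the +/-50-lag search range at 44.1 kHz.
half_range_ms = lag_to_ms(50, 44100)   # about 1.13 ms on each side
total_span_ms = lag_to_ms(100, 44100)  # about 2.27 ms from -50 to +50
```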

  9. Comparison of weighting methods
     [Figure: an example of the weighting methods; weight $w_l$ versus lag $l$ (−50 to 50) for (a) PCA, (b) SPCA, (c) MSPCA-T, (d) MSPCA (a = 2), (e) MSPCA (a = 10)]
     • PCA and SPCA: only one nonzero weight each, at different lags;
     • MSPCA-T: nonzero weights at two lags, though the estimated positive ICTD for the music is not as accurate;
     • Consecutive MSPCA: nonzero weights at all lags, with higher weights given to the lags closer to the directions of the primary components;
     • As the exponent a increases, the differences among the weights become more pronounced;
     • When a is high (e.g., a = 10), the weighting in consecutive MSPCA becomes similar to SPCA.
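The five weight profiles compared on this slide can be reproduced qualitatively by writing every method as a set of weights w_l over lags. This is a sketch under assumptions: PCA puts weight 1 at lag 0, SPCA puts weight 1 at its single estimated ICTD, MSPCA-T splits the weight equally over its selected ICTDs (the equal split is an assumption), and consecutive MSPCA uses the ICC-exponent rule from slide 7. The toy ICC curve is hypothetical, not the data behind the figure.

```python
import math


def weight_profiles(phi, tau_spca, taus_t, exponents=(2, 10)):
    """Return {method: {lag: w_l}} for an ICC curve phi = {lag: value}."""
    lags = sorted(phi)
    profiles = {
        "PCA": {lag: 1.0 if lag == 0 else 0.0 for lag in lags},
        "SPCA": {lag: 1.0 if lag == tau_spca else 0.0 for lag in lags},
        "MSPCA-T": {lag: 1.0 / len(taus_t) if lag in taus_t else 0.0 for lag in lags},
    }
    for a in exponents:  # even exponents keep all weights non-negative
        total = sum(phi[lag] ** a for lag in lags)
        profiles[f"MSPCA(a={a})"] = {lag: phi[lag] ** a / total for lag in lags}
    return profiles


# Hypothetical ICC curve with two peaks (lags -20 and +20) over the +/-50 search range.
toy_phi = {lag: 0.1 + 0.9 * math.exp(-((lag + 20) / 4.0) ** 2)
           + 0.6 * math.exp(-((lag - 20) / 4.0) ** 2)
           for lag in range(-50, 51)}
profiles = weight_profiles(toy_phi, tau_spca=-20, taus_t=[-20, 20])
for name, w in profiles.items():
    peak = max(w, key=w.get)
    print(f"{name:12s} peak lag {peak:+d}, max weight {w[peak]:.3f}")
```

Every profile sums to 1, and the maximum MSPCA weight grows with a, which is the trend visible between panels (d) and (e).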

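  10. Objective performance: extraction accuracy
     Error-to-signal ratio (ESR), lower is better:
     $\text{ESR (dB)} = 10\log_{10}\left[\frac{1}{2}\left(\frac{\|\hat{\mathbf{p}}_0-\mathbf{p}_0\|_2^2}{\|\mathbf{p}_0\|_2^2} + \frac{\|\hat{\mathbf{p}}_1-\mathbf{p}_1\|_2^2}{\|\mathbf{p}_1\|_2^2}\right)\right]$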
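The ESR is straightforward to compute; the sketch below averages the per-channel normalized squared errors and converts the result to dB, following the formula on this slide. The function name `esr_db` is hypothetical.

```python
import math


def esr_db(p_hat, p_true):
    """ESR (dB) = 10*log10(mean over channels of ||p_hat_c - p_c||^2 / ||p_c||^2)."""
    ratios = []
    for est, ref in zip(p_hat, p_true):
        err = sum((e - r) ** 2 for e, r in zip(est, ref))
        ratios.append(err / sum(r * r for r in ref))
    return 10.0 * math.log10(sum(ratios) / len(ratios))


# Usage: a 0.1 error on one sample of channel 0, channel 1 exact (about -30 dB).
print(esr_db(([1.1, 2.0], [3.0, 4.0]), ([1.0, 2.0], [3.0, 4.0])))
```

A perfect extraction makes the error zero, so the logarithm is undefined; a practical evaluation would need to guard against that case.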

  11. Subjective performance: localization accuracy
     12 participants; scores from 0 to 10:
     • 0-2: the two directions are almost reversed;
     • 2-4: neither direction is close;
     • 4-6: the directions are neither close nor too far;
     • 6-8: at least one direction is close;
     • 8-10: both directions are close to the reference.
     [Figure: subjective scores (0-10) on localization accuracy for PCA, SPCA, MSPCA-T, MSPCA (a = 2), MSPCA (a = 10)]

  12. Conclusions
     1. Investigated primary extraction from stereo signals whose primary components contain multiple concurrent sources from distinct directions.
     2. Proposed multi-shift PCA to handle multiple directions:
        a) MSPCA with the typical structure uses a limited number of selected shifts, but its performance degrades when the ICTD estimation is inaccurate;
        b) MSPCA with the consecutive structure is more robust, as it applies weights to every shifted version;
        c) The weighting method for the different shifts is critical;
        d) In general, applying a proper exponent to the ICC yields good objective and subjective performance.
     3. Future work: determine the best exponent value for ICC-based weighting, explore other weighting methods, and relate multi-shifting to optimal filtering in PAE.

  13. References
     [1] M. M. Goodwin and J. M. Jot, "Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement," in Proc. ICASSP, Hawaii, 2007, pp. 9-12.
     [7] C. Faller and F. Baumgarte, "Binaural cue coding-Part II: Schemes and applications," IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 520-531, Nov. 2003.
     [8] M. M. Goodwin and J. M. Jot, "Binaural 3-D audio rendering based on spatial audio scene coding," in Proc. 123rd Audio Eng. Soc. Conv., New York, 2007.
     [12] K. Sunder, J. He, E. L. Tan, and W. S. Gan, "Natural sound rendering for headphones," IEEE Signal Process. Mag., vol. 32, no. 2, pp. 100-113, Mar. 2015.
     [13] C. Avendano and J. M. Jot, "A frequency-domain approach to multichannel upmix," J. Audio Eng. Soc., vol. 52, no. 7/8, pp. 740-749, Jul./Aug. 2004.
     [14] C. Faller, "Multiple-loudspeaker playback of stereo signals," J. Audio Eng. Soc., vol. 54, no. 11, pp. 1051-1064, Nov. 2006.
     [17] J. He, W. S. Gan, and E. L. Tan, "Primary-ambient extraction using ambient phase estimation with a sparsity constraint," IEEE Signal Process. Lett., vol. 22, no. 8, pp. 1127-1131, Aug. 2015.
     [18] J. Merimaa, M. M. Goodwin, and J. M. Jot, "Correlation-based ambience extraction from stereo recordings," in Proc. 123rd Audio Eng. Soc. Conv., New York, 2007.
     [21] J. He, E. L. Tan, and W. S. Gan, "Time-shifted principal component analysis based cue extraction for stereo audio signals," in Proc. ICASSP, Vancouver, Canada, 2013, pp. 266-270.
     [22] J. He, E. L. Tan, and W. S. Gan, "Linear estimation based primary-ambient extraction for stereo audio signals," IEEE Trans. Audio, Speech, Lang. Process., vol. 22, no. 2, pp. 505-517, Feb. 2014.
     [24] J. He, W. S. Gan, and E. L. Tan, "A study on the frequency-domain primary-ambient extraction for stereo audio signals," in Proc. ICASSP, Florence, Italy, 2014, pp. 2868-2872.

  14. Acknowledgement
     This work is supported by the Singapore Ministry of Education Academic Research Fund Tier-2, under research grant MOE2010-T2-2-040.

  15. Thank you!
