Sparse Nonnegative Matrix Based on αβ-Divergence for Single Channel Separation in Cochleagram



International Journal of Mathematics and Computer Applications Research (IJMCAR)
ISSN 2249-6955, Vol. 2, Issue 4, Dec 2012, 11-24
TJPRC Pvt. Ltd.

SPARSE NONNEGATIVE MATRIX BASED ON αβ-DIVERGENCE FOR SINGLE CHANNEL SEPARATION IN COCHLEAGRAM

    M. E. ABD EL AZIZ & WAEL KIDER

    Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt

    ABSTRACT

In this paper, a novel family of αβ-divergence based two-dimensional nonnegative matrix factorization methods for solving SCBSS is proposed. The cochleagram-based separation system and the family of αβ-divergence based factorization algorithms have been developed in a principled manner, coupled with theoretical support for audio signal separability. The proposed method enjoys at least two significant advantages. Firstly, the cochleagram rendered by the gammatone filterbank has non-uniform time-frequency resolution, which makes the mixed signal more separable and improves the efficiency of source tracking. Secondly, the divergence has the desirable property of scale invariance, which lets low-energy components in the cochleagram bear the same relative importance as the high-energy ones. We compare our system with the Factorial SC and SNMF2D models, and the proposed algorithm shows superior performance in terms of signal-to-interference ratio. Finally, the low computational requirements of the algorithm allow close to real-time applications.

KEYWORDS: Blind Signal Separation (BSS), Nonnegative Matrix Factorization (NMF), αβ-Divergence, αβ-NMF, Single Channel Source Separation (SCSS)

    INTRODUCTION

Single channel source separation (SCSS) aims to extract several source signals from a single mixture recording. Because at least two sources interfere and may overlap in time, standard source separation methods such as ICA (Hyvarinen et al 2001) cannot be applied, and the standard NMF or SNMF models (Schmidt et al 2006) are satisfactory for source separation only when the spectral frequencies do not change over time. The recent SNMF2D model (Gao et al 2011) addresses this limitation of SNMF by optimizing the spectral dictionary and the temporal code with the Kullback-Leibler divergence, exploiting the observation that sources rarely interfere in a time-frequency representation. This observation has also been used in computational auditory scene analysis (Wang et al 2006, Brown 1994), inspired by the human ability to organize the perceived time-frequency representation according to likely sources. However, SNMF2D lacks a generalized criterion for controlling sparsity. Roweis (Roweis 2003) introduced the refiltering framework, which uses so-called spectrogram masks to attenuate spectrogram parts that do not belong to the desired source. To estimate these masks, he proposed the factorial-max vector quantizer (VQ) model, which assumes that the log-magnitude source spectrograms are generated by vector quantizers plus a noise term. To train speaker-specific code-books and to estimate the noise variances, he applied k-means to source-specific spectrograms. Hence, max-VQ explicitly models the sources in a training stage. The factorial-max VQ model can be extended by replacing the vector quantizers with sparse coders (Peharz 2010). A sparse coder can be seen as a generalization of a vector quantizer, since it represents data with a linear combination of a limited number of so-called atoms (the number being a parameter to choose), while a vector quantizer uses a single, non-scalable code-word.


To train speaker-specific dictionaries, it uses a non-negative matrix factorization algorithm with sparseness constraints on the coefficient matrix. The sparse coder model suffers from some drawbacks: it is affected by outliers and noise, since it uses the Euclidean distance, and it relies on the STFT, which produces errors especially when complicated transient phenomena, such as the mixing of speech and music, occur in the analysed signal.

The aim of this work is to remedy these drawbacks. We formulate a single channel NMF model that accounts for convolutive mixing and can be seen as a generalization of (Peharz 2010), in which the αβ-NMF algorithm is used because it is robust with respect to noise and/or outliers in the single channel convolutive setting. The source cochleagrams are modeled through NMF, and the mixing filters serve to identify the elementary components pertaining to each source.

The remainder of this paper is organized as follows. The single channel NMF model is introduced in Section 2. Section 3 is devoted to the Factorial Sparse Coder algorithm. Section 4 gives the definition of the αβ-divergence. Section 5 presents the estimation of the spectral basis and temporal code. Section 6 presents the results of our algorithm on source separation in various settings. Conclusions are drawn in Section 7.

    SINGLE CHANNEL NMF MODEL

We consider a sampled signal $x(n)$ generated as a convolutive noisy mixture of two point source signals $s_j(n)$, $j = 1, 2$, such that

$$x(n) = \sum_{j} \sum_{\tau} a_j(\tau)\, s_j(n - \tau) + e(n) \qquad (1)$$

where $e(n)$ is additive noise. The time-domain mixing given by (1) can be approximated in the short-time Fourier transform (STFT) domain as

$$X(f, t) \approx \sum_{j} A_j(f)\, S_j(f, t) + E(f, t) \qquad (2)$$

where $X(f,t)$, $S_j(f,t)$ and $E(f,t)$ are the complex-valued STFTs of the corresponding time signals, $f = 1, \dots, F$ is a frequency bin index and $t = 1, \dots, T$ is a time frame index. Equation (2) can be rewritten in matrix form as

$$\mathbf{X} \approx \sum_{j} \mathrm{diag}(\mathbf{A}_j)\, \mathbf{S}_j + \mathbf{E} \qquad (3)$$

We use NMF to model the power spectrogram $\mathbf{Y}_j = |\mathbf{S}_j|^{.2}$ of source $j$ as a product of two nonnegative matrices $\mathbf{W}_j$ and $\mathbf{H}_j$, such that

$$\mathbf{Y}_j \approx \mathbf{W}_j \mathbf{H}_j \qquad (4)$$

The 3D representation of the matrices $\mathbf{W}_j$ and $\mathbf{H}_j$ is presented in Figure 1.


Figure 1: (A) Frontal Slice 3D Representation (B) Vertical and Horizontal Slice 3D Representation
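As a concrete illustration of the factorization model in (4), the following MATLAB sketch factorizes a given nonnegative power spectrogram V into the product W*H using the standard Lee-Seung multiplicative updates (Lee et al 2001); V, the rank K and the iteration count are placeholder choices for illustration, not the values used in the paper.

% Minimal NMF of a power spectrogram V (F x T), model V ~ W*H (cf. equation (4))
K = 20;  maxIter = 200;  epsSafe = 1e-9;   % illustrative settings
[F, T] = size(V);
W = rand(F, K);                            % random nonnegative initialization
H = rand(K, T);
for it = 1:maxIter
    % Lee-Seung multiplicative updates for the least-squares cost
    H = H .* (W' * V) ./ (W' * (W * H) + epsSafe);
    W = W .* (V * H') ./ (W * (H * H') + epsSafe);
end
Vhat = W * H;                              % nonnegative approximation of the spectrogram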

    FACTORIAL SPARSE CODER ALGORITHM

In this section we describe the Factorial Sparse Coder model (Factorial SC) (Peharz 2010). It uses a method similar to the K-SVD algorithm for training dictionaries for sparse coders, which consists of two stages. For the sparse coding stage it proposes non-negative matching pursuit (NMP), a non-negative variant of OMP. In the dictionary update step it uses several iterations of the nonnegative matrix factorization (NMF) algorithm proposed by Lee and Seung (Lee et al 2001). The Factorial Sparse Coder model reformulates equation (4) frame by frame as

$$\mathbf{y} \approx \sum_{j} \mathbf{W}_j[:, \mathbf{u}_j]\, \mathbf{h}_j$$

where $\mathbf{W}_j$ is a source-specific dictionary, $\mathbf{h}_j$ is the corresponding coefficient vector and $\mathbf{u}_j$ is an index vector indicating the selected atoms. A summary of the Factorial SC algorithm can be found in Algorithm 1.

A solution is defined as a triplet $(\mathbf{u}, \mathbf{h}, \mathbf{r})$, where $\mathbf{u}$ contains the indices of the selected atoms, $\mathbf{h}$ holds the corresponding coefficients and $\mathbf{r}$ is the residual. The set of all solutions is denoted $\mathcal{S}$. Starting with a single trivial solution $(\emptyset, \emptyset, \mathbf{y})$, in every iteration each solution is extended with up to $B$ atoms, selected by the function selectBestAtoms. In selectBestAtoms, the correlations $\mathbf{c} = \mathbf{W}^{T}\mathbf{r}$ of the atoms with the residual are calculated. Atoms with negative values in $\mathbf{c}$, and atoms which would make the prior probability zero, are discarded, where the prior probability of a selection (5) is calculated according to the original dictionaries as a product of a marginal and a conditional factor, both of which can be estimated from the coefficient matrix returned by NMF (Peharz 2010). If $B'$ is the number of remaining atoms, the $\min(B, B')$ atoms with the largest values in $\mathbf{c}$ are selected. The inner products and the indices of the selected atoms are returned in the vectors $\mathbf{c}$ and $\mathbf{u}$. In lines 10-12 of Algorithm 1, NMF updates are performed for the coefficient vector $\mathbf{h}$, which approximate equation (6):

$$\mathbf{h} = \arg\min_{\mathbf{h}} \left\| \mathbf{y} - \mathbf{W}[:, \mathbf{u}]\, \mathbf{h} \right\|^{2} \quad \text{s.t.} \quad \mathbf{h} \ge 0 \qquad (6)$$
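As an illustration of the non-negative least-squares step in (6), the MATLAB sketch below estimates the coefficients for a fixed set of selected atoms; Wu (the sub-dictionary of selected atoms) and y (the current frame) are placeholder variables, and the multiplicative variant is only one possible approximation.

% Coefficients for the selected atoms Wu (F x |u|) and frame y (F x 1), cf. (6):
%   h = argmin_h || y - Wu*h ||^2   subject to  h >= 0
h = lsqnonneg(Wu, y);                      % exact NNLS solver shipped with MATLAB

% Alternatively, a few multiplicative updates give an approximate nonnegative solution:
h = rand(size(Wu, 2), 1);
for it = 1:50
    h = h .* (Wu' * y) ./ (Wu' * (Wu * h) + 1e-9);
end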


Continuing in this manner, the solution set can grow in every iteration. After a fixed number of iterations, the algorithm starts to prune the solution set to the best solutions in every iteration, i.e. it selects the solutions with the highest posterior (7), where the probabilities are evaluated according to the original dictionaries:

$$P(\mathbf{u}, \mathbf{h} \,|\, \mathbf{y}) \propto P(\mathbf{y} \,|\, \mathbf{u}, \mathbf{h})\, P(\mathbf{u}) \qquad (7)$$

The Laplacian form factors of the likelihood can be estimated from the residual error in the training stage. When the algorithm has stopped, it selects the solution with maximal posterior out of the final solution set and builds the coefficient matrix $\mathbf{H}$, which is split according to the original dictionaries into $\mathbf{H}_1$ and $\mathbf{H}_2$. The approximations of the source spectrograms are then given as $\hat{\mathbf{S}}_j = \mathbf{W}_j \mathbf{H}_j$. A mask $\mathbf{M}_j$ is calculated for each source from these approximations, $j = 1, 2$. Finally, approximations of the source signals are given by the inverse short-time Fourier transform (ISTFT) of the masked mixture, $\hat{s}_j = \mathrm{ISTFT}(\mathbf{M}_j \odot \mathbf{X})$, where $\mathbf{X}$ is the original complex mixture spectrogram (a MATLAB sketch of this resynthesis step is given after Algorithm 1).

Algorithm 1: Factorial SC

1.  initialize the solution set with the trivial solution (∅, ∅, y)
2.  for l = 1 : L
3.      …
4.      for each solution in the solution set
5.          select up to B atoms with selectBestAtoms
6.          …
7.          for b = 1 : |selected atoms|
8.              …
9.              …
10.             for j = 1 : J
11.                 NMF update of the coefficient vector (cf. equation (6))
12.             endfor
13.             …
14.             …
15.         endfor
16.     endfor
17.     …
18.     if l > … then
19.         Prune to the best solutions
20.     endif
21. endfor
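The resynthesis step described just before Algorithm 1 can be sketched in MATLAB as follows. The ratio-mask form is an illustrative assumption, and istft (Signal Processing Toolbox, R2019a or later) stands in for whatever inverse transform is paired with the forward STFT; the window and overlap match the experimental setup reported later.

% Shat1, Shat2: estimated source spectrograms; X: complex STFT of the mixture (same size),
% assumed to be produced by stft with the same window and overlap used below
M1 = Shat1 ./ (Shat1 + Shat2 + 1e-9);      % soft mask for source 1 (illustrative choice)
M2 = 1 - M1;                               % soft mask for source 2
win = hamming(1024, 'periodic');           % same analysis window as the forward STFT
s1hat = istft(M1 .* X, 'Window', win, 'OverlapLength', 512);
s2hat = istft(M2 .* X, 'Window', win, 'OverlapLength', 512);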

Since this algorithm works on the STFT, it inherits some drawbacks. The classical spectrogram computed by the STFT has equally spaced frequency channels of equal bandwidth. Speech signals are highly non-stationary and non-periodic, whereas music changes continuously; therefore, application of the Fourier transform produces errors, especially when complicated transient phenomena, such as the mixing of speech and music, occur in the analysed signal. Unlike the spectrogram, the log-frequency spectrogram possesses non-uniform TF resolution. However, it does not exactly match the nonlinear resolution of the cochlea, since its centre frequencies are distributed logarithmically along the frequency axis and all filters have a constant Q factor (Brown 1991).

On the other hand, the gammatone filters used in the cochlear model are approximately logarithmically spaced with constant Q for centre frequencies from $f_s/10$ to $f_s/2$ and approximately linearly spaced for frequencies below $f_s/10$, where $f_s$ is the sampling frequency. Hence, this characteristic results in a selective, non-uniform resolution in the TF representation of the analysed audio signal.


The gammatone filterbank was previously proposed in (Hu et al 2007, Jin et al 2009) as a model of cochlear filtering which decomposes the time-domain input into the frequency domain. The impulse response of a gammatone filter centered at frequency $f_c$ is given by

$$g_{f_c}(t) = \begin{cases} t^{N-1}\, e^{-2\pi b t}\, \cos(2\pi f_c t), & t \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

where $N$ denotes the order of the filter and $b$ represents the equivalent rectangular bandwidth, which increases as the center frequency $f_c$ increases. With regard to a particular filter channel $c$ with center frequency $f_c$, the filter output response $x(c, t)$ can be expressed as

$$x(c, t) = x(t) * g_{f_c}(t) \qquad (9)$$

where $*$ represents convolution. The response is shifted backwards in time to compensate for the filter delay. The output of each filter channel is divided into time frames with 50% overlap between consecutive frames (Hu et al 2005). The resulting outputs form the time-frequency spectra which are then combined to form the cochleagram. The use of the gammatone filter is consistent with the neurobiological modeling perspective. Figure 2 shows an example of the frequency responses of different types of transforms.

By working in the cochleagram, we avoid the problems of the STFT; in the next section the αβ-divergence is introduced to address the problem of outliers/noise arising from the use of the Euclidean distance.

Figure 2: Different Types of Transforms: (A) Original Source (B) Cochleagram (C) Spectrum (D) Log-Spectrum
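A minimal MATLAB sketch of the gammatone impulse response in (8) and the channel filtering in (9); the ERB-based bandwidth formula, the sampling rate and the centre frequency are common illustrative choices rather than the exact values used in the paper, and x is a placeholder time-domain signal.

% Gammatone impulse response g(t) = t^(N-1) exp(-2*pi*b*t) cos(2*pi*fc*t), t >= 0 (cf. (8))
fs = 16000;                                % sampling rate in Hz (illustrative)
fc = 1000;                                 % centre frequency of one channel in Hz
N  = 4;                                    % filter order, as used for the 128-channel filterbank
b  = 1.019 * (24.7 + 0.108 * fc);          % ERB-based bandwidth (a common choice)
t  = (0:round(0.05 * fs) - 1)' / fs;       % 50 ms of impulse response
g  = t.^(N - 1) .* exp(-2 * pi * b * t) .* cos(2 * pi * fc * t);

% Response of this channel to a signal x (cf. (9)); repeating over 128 channels
% and framing the outputs (20 ms, 50% overlap) yields the cochleagram.
xc = conv(x, g, 'same');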

αβ-DIVERGENCE

The αβ-divergence (Cichocki et al 2011) can be defined as

$$D_{AB}^{(\alpha,\beta)}(\mathbf{P} \,\|\, \mathbf{Q}) = -\frac{1}{\alpha\beta} \sum_{f,t} \left( p_{ft}^{\alpha}\, q_{ft}^{\beta} - \frac{\alpha}{\alpha+\beta}\, p_{ft}^{\alpha+\beta} - \frac{\beta}{\alpha+\beta}\, q_{ft}^{\alpha+\beta} \right), \qquad \alpha, \beta, \alpha+\beta \neq 0 \qquad (10)$$

By a suitable choice of the $(\alpha, \beta)$ parameters, this divergence simplifies into several existing divergences, including the well-known Alpha- and Beta-divergences. For example, when $\alpha + \beta = 1$ the αβ-divergence reduces to the Alpha-divergence (Cichocki et al 2009):


$$D_{A}^{(\alpha)}(\mathbf{P} \,\|\, \mathbf{Q}) = \frac{1}{\alpha(\alpha-1)} \sum_{f,t} \left( p_{ft}^{\alpha}\, q_{ft}^{1-\alpha} - \alpha\, p_{ft} + (\alpha - 1)\, q_{ft} \right), \qquad \alpha \neq 0, 1 \qquad (11)$$

On the other hand, when $\alpha = 1$, it reduces to the Beta-divergence (Cichocki et al 2010):

$$D_{B}^{(\beta)}(\mathbf{P} \,\|\, \mathbf{Q}) = -\frac{1}{\beta} \sum_{f,t} \left( p_{ft}\, q_{ft}^{\beta} - \frac{1}{1+\beta}\, p_{ft}^{1+\beta} - \frac{\beta}{1+\beta}\, q_{ft}^{1+\beta} \right), \qquad \beta \neq 0, -1 \qquad (12)$$

The αβ-divergence also reduces to the standard Itakura-Saito divergence for $\alpha = 1$ and $\beta = -1$ (Lee et al 2001):

$$D_{IS}(\mathbf{P} \,\|\, \mathbf{Q}) = \sum_{f,t} \left( \ln\frac{q_{ft}}{p_{ft}} + \frac{p_{ft}}{q_{ft}} - 1 \right) \qquad (13)$$
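The generic case of the αβ-divergence (10) can be evaluated with the following MATLAB sketch (the limiting cases (11)-(13) would require the usual special-case branches); the commented example numerically checks the scaling behaviour stated in (14) below. Names and test values are illustrative.

function d = ab_divergence(P, Q, alpha, beta)
% Alpha-beta divergence of equation (10) for alpha, beta, alpha+beta ~= 0.
    s = alpha + beta;
    d = -1 / (alpha * beta) * sum( P(:).^alpha .* Q(:).^beta ...
                                   - (alpha / s) * P(:).^s ...
                                   - (beta  / s) * Q(:).^s );
end
% Example check of the scaling property, D(cP||cQ) = c^(alpha+beta) * D(P||Q):
%   P = rand(5); Q = rand(5); c = 3; a = 0.5; b = 0.7;
%   abs(ab_divergence(c*P, c*Q, a, b) - c^(a+b) * ab_divergence(P, Q, a, b))   % close to 0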

We use the αβ-divergence for several reasons discussed in (Cichocki et al 2011), which illustrates the role of the hyper-parameters $\alpha$ and $\beta$ in the robustness of the αβ-divergence with respect to errors and noise, and compares the behavior of the αβ-divergence with the standard Kullback-Leibler divergence. Moreover, scaling the arguments of the αβ-divergence by a positive factor $c > 0$ yields the relation

$$D_{AB}^{(\alpha,\beta)}(c\mathbf{P} \,\|\, c\mathbf{Q}) = c^{\alpha+\beta}\, D_{AB}^{(\alpha,\beta)}(\mathbf{P} \,\|\, \mathbf{Q}) \qquad (14)$$

This basic property implies that, whenever $\alpha \neq 0$, we can rewrite the αβ-divergence in terms of a Beta-divergence of order $\beta/\alpha$ combined with an $\alpha$-zoom of its arguments:

$$D_{AB}^{(\alpha,\beta)}(\mathbf{P} \,\|\, \mathbf{Q}) = \frac{1}{\alpha^{2}}\, D_{AB}^{(1,\, \beta/\alpha)}\!\left( \mathbf{P}^{.\alpha} \,\|\, \mathbf{Q}^{.\alpha} \right) \qquad (15)$$

Estimation of the Spectral Basis and Temporal Code

In order to use the αβ-divergence, our objective function is

$$D_{AB}^{(\alpha,\beta)}\!\left( \mathbf{Y} \,\|\, \hat{\mathbf{Y}} \right) \qquad (16)$$

where $\hat{\mathbf{Y}}$ is the structure defined by the model (4),

$$\hat{\mathbf{Y}} = \mathbf{W}\mathbf{H} \qquad (17)$$

Let $\theta$ be a scalar parameter of the set $\{\mathbf{W}, \mathbf{H}\}$. The derivative of (16) w.r.t. $\theta$ is

$$\frac{\partial D_{AB}^{(\alpha,\beta)}(\mathbf{Y} \,\|\, \hat{\mathbf{Y}})}{\partial \theta} = \sum_{f,t} \frac{\partial D_{AB}^{(\alpha,\beta)}(y_{ft} \,\|\, \hat{y}_{ft})}{\partial \hat{y}_{ft}}\, \frac{\partial \hat{y}_{ft}}{\partial \theta} \qquad (18)$$

where the derivative of $D_{AB}^{(\alpha,\beta)}(y_{ft} \,\|\, \hat{y}_{ft})$ w.r.t. $\hat{y}_{ft}$ is given by

$$\frac{\partial D_{AB}^{(\alpha,\beta)}(y_{ft} \,\|\, \hat{y}_{ft})}{\partial \hat{y}_{ft}} = -\frac{1}{\alpha}\, \hat{y}_{ft}^{\,\alpha+\beta-1} \left( \left( \frac{y_{ft}}{\hat{y}_{ft}} \right)^{\alpha} - 1 \right) \qquad (19)$$


The gradient of the αβ-divergence can be expressed in a compact form (for any $\alpha$, $\beta$) in terms of the $1-\alpha$ deformed logarithm,

$$\ln_{1-\alpha}(z) = \begin{cases} \dfrac{z^{\alpha} - 1}{\alpha}, & \alpha \neq 0 \\ \ln z, & \alpha = 0 \end{cases}$$

By using (18), we obtain the following derivatives:

$$\frac{\partial D_{AB}^{(\alpha,\beta)}}{\partial w_{fk}} = -\sum_{t} \hat{y}_{ft}^{\,\alpha+\beta-1}\, \ln_{1-\alpha}\!\left( \frac{y_{ft}}{\hat{y}_{ft}} \right) h_{kt} \qquad (20)$$

$$\frac{\partial D_{AB}^{(\alpha,\beta)}}{\partial h_{kt}} = -\sum_{f} w_{fk}\, \hat{y}_{ft}^{\,\alpha+\beta-1}\, \ln_{1-\alpha}\!\left( \frac{y_{ft}}{\hat{y}_{ft}} \right) \qquad (21)$$

The previous equations can be written in the following matrix form:

$$\nabla_{\mathbf{W}} D_{AB}^{(\alpha,\beta)} = -\left( \hat{\mathbf{Y}}^{.(\alpha+\beta-1)} \odot \ln_{1-\alpha}\!\left( \mathbf{Y} \oslash \hat{\mathbf{Y}} \right) \right) \mathbf{H}^{T} \qquad (22)$$

$$\nabla_{\mathbf{H}} D_{AB}^{(\alpha,\beta)} = -\mathbf{W}^{T} \left( \hat{\mathbf{Y}}^{.(\alpha+\beta-1)} \odot \ln_{1-\alpha}\!\left( \mathbf{Y} \oslash \hat{\mathbf{Y}} \right) \right) \qquad (23)$$

So the multiplicative update rules for $\mathbf{W}$ and $\mathbf{H}$ in matrix form are

$$\mathbf{W} \leftarrow \mathbf{W} \odot \left( \frac{ \left( \hat{\mathbf{Y}}^{.(\alpha+\beta-1)} \odot \left( \mathbf{Y} \oslash \hat{\mathbf{Y}} \right)^{.\alpha} \right) \mathbf{H}^{T} }{ \hat{\mathbf{Y}}^{.(\alpha+\beta-1)}\, \mathbf{H}^{T} } \right)^{.1/\alpha} \qquad (24)$$

$$\mathbf{H} \leftarrow \mathbf{H} \odot \left( \frac{ \mathbf{W}^{T} \left( \hat{\mathbf{Y}}^{.(\alpha+\beta-1)} \odot \left( \mathbf{Y} \oslash \hat{\mathbf{Y}} \right)^{.\alpha} \right) }{ \mathbf{W}^{T}\, \hat{\mathbf{Y}}^{.(\alpha+\beta-1)} } \right)^{.1/\alpha} \qquad (25)$$

where $\odot$ and $\oslash$ denote element-wise multiplication and division, the fraction is element-wise, and $(\cdot)^{.x}$ denotes element-wise exponentiation.
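A compact MATLAB sketch of the multiplicative updates (24)-(25), written directly from the matrix forms above; Y is the nonnegative cochleagram (or spectrogram) being factorized, and the settings and the small safeguarding constant are illustrative. This is a sketch of the generic updates under the stated assumptions, not the authors' exact implementation.

% Multiplicative alpha-beta NMF updates for Y ~ W*H (cf. (24)-(25)); assumes alpha ~= 0
alpha = 0.8;  beta = 0.4;  K = 20;  maxIter = 300;  epsSafe = 1e-12;   % illustrative
[F, T] = size(Y);
W = rand(F, K);  H = rand(K, T);
for it = 1:maxIter
    Yhat = W * H + epsSafe;
    Z = (Yhat.^(alpha + beta - 1)) .* ((Y ./ Yhat).^alpha);            % numerator term of (24)
    W = W .* ( (Z * H') ./ (Yhat.^(alpha + beta - 1) * H' + epsSafe) ).^(1/alpha);
    Yhat = W * H + epsSafe;                                            % refresh the model
    Z = (Yhat.^(alpha + beta - 1)) .* ((Y ./ Yhat).^alpha);            % numerator term of (25)
    H = H .* ( (W' * Z) ./ (W' * Yhat.^(alpha + beta - 1) + epsSafe) ).^(1/alpha);
end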

Finally, we summarize our algorithm, which we call αβ-FSC, as follows.

Algorithm 2: αβ-FSC
Input: mixture signal
Output: estimated sources, j = 1, 2
1. Compute the cochleagram.
2. [W, H] ← αβ-NMF (Algorithm 3).
3. Estimate the prior factors from the coefficient matrix H.
4. Run Factorial SC (Algorithm 1) with the cochleagram, the dictionary W and the estimated priors, replacing line 11 by the αβ multiplicative update (25).


Algorithm 3: αβ-NMF
1: Initialize the dictionary randomly
2: for i = 1 : I do
3:     sparsely code the data with the current dictionary using αβ-NMP
4:     for k = 1 : K do
5:         αβ multiplicative update of the k-th atom (cf. (24))
6:         normalize the atom: w_k ← w_k / ||w_k||, k = 1, …, K
7:         αβ multiplicative update of the corresponding coefficients (cf. (25))
8:     end for
9: end for

Algorithm 4: αβ-NMP
1: initialize the residual with the input frame
2: initialize the index set of selected atoms as empty
3: initialize the coefficient vector
4: for k = 1 : K do
5:     compute the correlations of the atoms with the residual
6:     …
7:     …
8:     if the largest correlation ≤ 0 then
9:         Terminate
10:    end if
11:    append the selected atom index
12:    append the corresponding coefficient
13:    for j = 1 : J do
14:        αβ multiplicative update of the coefficients (cf. (25))
15:    end for
16:    update the residual
17: end for
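Because the listing of Algorithm 4 is only partially legible above, the following MATLAB sketch gives one plausible reading of a non-negative matching pursuit pass: atoms are greedily selected by their correlation with the residual, and the coefficients of the selected atoms are refined with a few multiplicative updates. The stopping rule, the initialization of new coefficients and the number of refinement iterations are assumptions.

% Non-negative matching pursuit for one frame y (F x 1) with dictionary W (F x D)
Kmax = 10;                                 % maximum number of atoms (illustrative)
u = [];  h = [];  r = y;                   % selected indices, coefficients, residual
for k = 1:Kmax
    c = W' * r;                            % correlations of all atoms with the residual
    c(u) = -inf;                           % never reselect an already chosen atom
    [cmax, idx] = max(c);
    if cmax <= 0, break; end               % no positively correlated atom left: terminate
    u = [u, idx];                          % grow the support
    Wu = W(:, u);
    h = [h; cmax];                         % crude initialization of the new coefficient
    for j = 1:20                           % refine all coefficients with multiplicative updates
        h = h .* (Wu' * y) ./ (Wu' * (Wu * h) + 1e-9);
    end
    r = y - Wu * h;                        % update the residual
end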

    RESULTS AND ANALYSIS

    Experiment Setup

The proposed method is tested by separating music and speech sources. Several experimental simulations under different conditions have been designed to investigate the efficacy of the proposed method. MATLAB is used as the programming platform. For mixture generation, two speakers, one male and one female, were selected from the TIMIT speech database (www.ldc.upenn.edu/Catalog/LDC93S1.html) and the music signals were selected from the RWC database (http://staff.aist.go.jp). Some mixtures are sampled at a 16 kHz sampling rate and others at 8 kHz. We compare our algorithm αβ-FSC with the MMSS (Li et al 2009), SNMF2D and Factorial SC algorithms. The TF representation for Factorial SC and MMSS is computed by normalizing the time-domain signal to unit power and computing the STFT using a 1024-point Hamming window FFT with 50% overlap. For SNMF2D, the frequency axis of the obtained spectrogram is logarithmically scaled and grouped into 175 frequency bins in the range of 50 Hz to 8 kHz with 24 bins per octave. For αβ-FSC, the cochleagram is based on a gammatone filterbank of 128 channels (filter order of 4) and the output is divided into 20-ms time frames with 50% overlap between consecutive frames. In all cases, the sources are mixed with equal average power over the duration of the signals. Two types of mixtures are used: a mixture of music and speech, and a mixture of different kinds of music.
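For reproducibility, the STFT front-end described above can be obtained in MATLAB roughly as follows; x and fs are placeholders for a mixture signal and its sampling rate, and spectrogram is the Signal Processing Toolbox routine.

% 1024-point Hamming-window STFT with 50% overlap, after normalizing to unit power
x = x / sqrt(mean(x.^2));                              % unit-power normalization
win = hamming(1024, 'periodic');
[X, f, t] = spectrogram(x, win, 512, 1024, fs);        % complex STFT (F x T)
V = abs(X).^2;                                         % power spectrogram used for factorization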


    Measure of Performance

We have evaluated our separation performance in terms of the signal-to-distortion ratio (SDR), which is one form of perceptual measure. This is a global measure that unifies the source-to-interference ratio (SIR), source-to-artifacts ratio

    (SAR), and source-to-noise ratio (SNR). MATLAB routines for computing these criteria are obtained from the SiSEC08

    webpage (Vincent et al 2008, Vincent et al 2005).
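With the BSS Eval MATLAB routines from the SiSEC page on the path, the criteria can be computed as sketched below; shat and s are placeholder matrices holding the estimated and reference sources, one source per row.

% shat: J x n estimated sources, s: J x n true sources (BSS Eval toolbox, Vincent et al 2005)
[SDR, SIR, SAR, perm] = bss_eval_sources(shat, s);
% perm gives the permutation that best aligns the estimates with the references.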

    Analysis of Results

Figure 3 shows the time-domain waveforms of the original male and female speech and the mixture of the two sources; Figure 4 shows the cochleagrams of the two sources and their mixture. Figure 5 further shows the separation results in the cochleagram. The plots clearly show that the spectral energy of the two audio sources is clustered at different frequencies in the cochleagram due to their different fundamental frequencies. These prominent features have been separated using the proposed αβ-FSC algorithm. Figure 6 shows the final recovered time-domain sources.

To further analyse the performance of all the above matrix factorization methods in separating the mixed signal and capturing the TF patterns of the sources, the cochleagram of each recovered source has been plotted in Figure 5. In Figure 5, panels (a)-(b), (c)-(d), (e)-(f) and (g)-(h) denote the recovered cochleagrams of the female and male speech obtained by the Factorial SC, MMSS, SNMF2D and αβ-FSC algorithms, respectively. In particular, panels (c)-(d) show that the MMSS algorithm cannot obtain a good reconstruction of the sources. SNMF2D gives a better estimation than MMSS. On the other hand, both the Factorial SC and αβ-FSC algorithms exhibit good reconstruction of the female as well as the male speech. However, the Factorial SC algorithm fails to identify several missing components, as indicated in the red boxed area of panels (a)-(b). Hence, less accuracy is obtained in the estimation of the male speech compared with the αβ-FSC algorithm, which has successfully estimated both sources with high accuracy.

Table 1 shows the comparison of the proposed algorithm (αβ-FSC), based on the cochleagram, with the MMSS, SNMF2D and Factorial SC algorithms. It is noted that MMSS gives poor results and SNMF2D is better than MMSS but worse than the other algorithms. Both the Factorial SC and αβ-FSC algorithms exhibit a good reconstruction in terms of SDR, SIR and SAR. However, the resulting factorizations are not equivalent.

The major reason for the discrepancy between them is that the spectrogram fails to infer the dominating source. This leads to a high degree of ambiguity in the TF domain and causes a lack of uniqueness in extracting the spectral-temporal features of the sources. The cochleagram makes the mixed signal more separable and thereby reduces the mixing ambiguity between the source spectrograms |S1| and |S2|. This explains why the performance of separating the mixture of music and a female utterance is the highest among all the mixtures: both sources have very distinguishable TF patterns in the cochleagram.

In summary, all the results in Table 1 and Figure 5 unanimously show the importance of using the αβ-FSC factorization algorithm in order to correctly estimate the spectral and temporal features of each source.


Figure 3: (A) Original Female Speech (B) Original Male Speech (C) Mixture of Sources

Figure 4: (A) Cochleagram of Original Female Speech (B) Cochleagram of Original Male Speech (C) Cochleagram of Mixture

Table 1: Comparison between αβ-FSC, Factorial SC, SNMF2D and MMSS (values in dB)

Mixture                            Algorithm      SDR S1     SDR S2     SAR S1     SAR S2     SIR S1     SIR S2
Female1 speech and male speech     αβ-FSC         12.7711    12.4117    13.9310    13.9303    19.2441    17.8848
                                   Factorial SC   12.6270    12.2991    13.8221    13.8214    18.9913    17.7675
                                   SNMF2D         -17.444    16.783     -17.335    42.0723    16.0476    16.7971
                                   MMSS            3.7309     7.5410     4.9557     8.0659    11.0300    17.6080
Female1 speech and female speech   αβ-FSC         13.7165    13.6159    14.7072    14.7966    21.8654    19.9908
                                   Factorial SC   12.8072    11.9564    13.4305    13.8704    20.7392    16.6114
                                   SNMF2D         -19.962    11.9852     9.2016    12.0191    -19.464    33.3438
                                   MMSS            5.1450     5.3637     6.3621     6.7431    13.1413    11.0951
Music and music                    αβ-FSC         17.4555    17.9902    18.3730    18.9706    24.7368    23.6153
                                   Factorial SC   16.4991    17.3420    17.3343    17.6024    23.1495    21.3892
                                   SNMF2D         -24.925    18.2986    15.5392    18.3553    -24.805    37.2287
                                   MMSS           10.7776    -8.1304    -4.8884    11.0298    23.5920     0.7689
Music and female speech            αβ-FSC         14.1972    15.3901    15.1863    15.9172    21.2375    19.8083
                                   Factorial SC   14.3782    13.6053    14.4900    14.1660    20.9607    18.9371
                                   SNMF2D         -17.5836    8.8261     9.7609     8.8630   -17.1394    30.0781
                                   MMSS            6.5059     9.3063     7.3610     9.3634    14.7163    28.6164


Figure 5: Separation Results: (a)-(b), (c)-(d), (e)-(f) and (g)-(h) Denote the Recovered Female and Male Speech in the Cochleagram Obtained by the Factorial SC, MMSS, SNMF2D and αβ-FSC Algorithms, Respectively

Figure 6: Time-Domain Separation of the Sources: (a)-(b) Factorial SC (c)-(d) MMSS (e)-(f) SNMF2D (g)-(h) αβ-FSC

    CONCLUSIONS

In this paper we proposed a separation framework using the gammatone filterbank, which produces a non-uniform TF domain, termed the cochleagram, in which each TF unit has a different resolution, unlike the classical spectrogram, which has only uniform resolution. It is shown that the mixed signal is significantly more separable in the cochleagram than in the classic spectrogram and the log-frequency spectrogram (constant-Q transform).

A family of novel αβ-divergence based two-dimensional nonnegative matrix factorization algorithms has also been developed to extract the spectral and temporal features of the sources. The proposed factorizations are scale invariant, whereby the lower-energy components in the cochleagram are treated with the same importance as the higher-energy components. Within the context of SCBSS, this property is highly desirable, as it enables the spectral-temporal features of the sources, which are usually characterized by a large dynamic range of energy, to be estimated with significantly higher accuracy. This is to be contrasted with matrix factorization based on the LS distance or the KL divergence, where both methods favor the high-energy components but neglect the low-energy components.

In the comparison of the αβ-FSC and SNMF2D algorithms, the proposed αβ-FSC obtains the best separation performance. The impetus behind this work is that the sparseness achieved by the conventional NMF, SNMF, NMF2D and SNMF2D is not efficient enough; in source separation it is necessary to have explicit control over the degree of sparseness for each temporal code.

    REFERENCES

1. Brown, J. C. (1991). Calculation of a constant Q spectral transform, J. Acoust. Soc. Am., vol. 89, no. 1, pp. 425-434.
2. Brown, G. J. and Cooke, M. (1994). Computational auditory scene analysis, Computer Speech and Language, vol. 8, pp. 297-336.
3. Cichocki, A., Zdunek, R. and Phan, A. H. (2009). Nonnegative Matrix and Tensor Factorizations, John Wiley & Sons Ltd.: Chichester, UK.
4. Cichocki, A., Cruces, S. and Amari, S. (2011). Generalized Alpha-Beta divergences and their application to robust nonnegative matrix factorization, Entropy, 13, pp. 134-170.
5. Cichocki, A. and Amari, S. (2010). Families of Alpha- Beta- and Gamma-divergences: Flexible and robust measures of similarities, Entropy, 12, pp. 1532-1568.
6. Gao, B., Woo, W. L. and Dlay, S. S. (2011). Single channel source separation using EMD-subband variable regularized sparse features, IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 4, pp. 961-976.
7. http://staff.aist.go.jp
8. Hu, G. and Wang, D. L. (2007). Auditory segmentation based on onset and offset analysis, IEEE Trans. Audio, Speech and Language Processing, vol. 15, no. 2, pp. 396-405.
9. Hu, G. and Wang, D. L. (2004). Monaural speech segregation based on pitch tracking and amplitude modulation, IEEE Trans. Neural Networks, vol. 15, no. 5, pp. 1135-1150.
10. Hyvarinen, A., Karhunen, J. and Oja, E. (2001). Independent Component Analysis, John Wiley & Sons.
11. Jin, Z. and Wang, D. L. (2009). A supervised learning approach to monaural segregation of reverberant speech, IEEE Trans. on Audio, Speech and Language Processing, vol. 17, pp. 625-638.
12. Lee, D. D. and Seung, H. S. (2001). Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems, vol. 13, pp. 556-562.
13. Li, Y., Woodruff, J. and Wang, D. L. (2009). Monaural musical sound separation based on pitch and common amplitude modulation, IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, pp. 1361-1371.
14. Peharz, R. (2010). Single channel source separation using dictionary design methods for sparse coders, Master's thesis, Graz University of Technology.
15. Roweis, S. (2003). Factorial models and refiltering for speech separation and denoising, in EUROSPEECH, pp. 1009-1012.


16. Schmidt, M. N. and Morup, M. (2006). Nonnegative matrix factor 2-D deconvolution for blind single channel source separation, in Proc. Int. Conf. Ind. Compon. Anal. Blind Signal Separat. (ICABSS'06), Charleston, SC, vol. 3889, pp. 700-707.
17. Vincent, E. and Araki, S. (2008). Signal Separation Evaluation Campaign (SiSEC 2008). [Online]. Available: http://sisec.wiki.irisa.fr.
18. Vincent, E., Gribonval, R. and Fevotte, C. (2005). Performance measurement in blind audio source separation, IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462-1469.
19. www.ldc.upenn.edu/Catalog/LDC93S1.html
20. Wang, D. and Brown, G. J. (2006). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, IEEE Press / John Wiley and Sons Ltd.
