Refiltering


Investigator: Yu Song       Supervisor: Dr Nasir Rajpoot



Situation

Given a recording from ONLY ONE microphone (i.e. one input) in which two people are speaking, can we "separate" each person's speech?



The "Impossible Mission" (again)

Given ONLY ONE equation,

Y = X1 + X2

and we only know Y, can we calculate X1 and X2?
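
To see why this looks impossible, note that the single equation leaves one unknown completely free. The short derivation below is only an illustration; the numbers in the example are made up.

```latex
Y = X_1 + X_2
\quad\Longrightarrow\quad
X_2 = Y - X_1 \quad \text{for every choice of } X_1 .
% Example: Y = 5 is satisfied by (X_1, X_2) = (1, 4),\ (2, 3),\ (-7, 12),\ \dots
```

Without extra knowledge about what X1 and X2 usually look like, all of these answers are equally valid.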


Secrets in spectrograms

If we compare the log spectrogram of the "mixture" (i.e. the recording from the ONLY ONE microphone) with the element-wise maximum of the log spectrograms of the "sources" (i.e. the speech of each person),

Figure 1      The Log-Max Approximation

From top to bottom: log spectrogram of the mixture, element-wise maximum of the log spectrograms of the sources, and the difference between the two plots.


these spectrograms are almost the same!
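
The agreement between the two plots can also be checked numerically. The following is a minimal Python sketch, not the code used to produce Figure 1. It assumes made-up time-frequency cells whose log magnitudes are spread uniformly over a wide range with uniformly random phases; the helper names `random_cell` and `log_max_error` are hypothetical.

```python
import cmath
import math
import random


def log_max_error(x1: complex, x2: complex) -> float:
    """Return log|x1 + x2| - max(log|x1|, log|x2|) for one time-frequency cell."""
    log_mixture = math.log(abs(x1 + x2))
    log_max = max(math.log(abs(x1)), math.log(abs(x2)))
    return log_mixture - log_max


def random_cell(rng: random.Random) -> complex:
    """Hypothetical cell: log magnitude uniform in [-6, 0], phase uniform."""
    magnitude = math.exp(rng.uniform(-6.0, 0.0))
    phase = rng.uniform(-math.pi, math.pi)
    return cmath.rect(magnitude, phase)


rng = random.Random(0)  # fixed seed so the run is repeatable
errors = [log_max_error(random_cell(rng), random_cell(rng)) for _ in range(10_000)]

# Largest amount by which the mixture exceeds the element-wise maximum.
worst_overestimate = max(errors)
# Fraction of cells where the two log values agree to within 0.5 nats.
close_fraction = sum(abs(e) < 0.5 for e in errors) / len(errors)

print(f"worst overestimate: {worst_overestimate:.3f} nats")
print(f"cells within 0.5 nats: {close_fraction:.1%}")
```

By the triangle inequality the mixture can never exceed the maximum by more than log 2 (about 0.69 nats). Large underestimates only occur when both sources have similar magnitude and nearly opposite phase in the same cell, which is the minority case under the assumptions above.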

The idea of Refiltering is to use "masking signals" to isolate the individual sources within the mixture, and then to reconstruct each source's signal given some "prior information" (obtained by "Training").
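
The masking step can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the project's implementation: the magnitudes are made up, the mixture is formed by simply adding magnitudes (phases are ignored), and the mask is an "oracle" mask computed from the clean sources. In real refiltering the clean sources are unknown, so the mask has to be inferred from the mixture using the trained prior. The function names `oracle_mask` and `apply_mask` are hypothetical.

```python
import math

# Toy magnitude spectrograms (rows = time frames, columns = frequency bins).
# All values are made up for illustration.
source_1 = [[4.0, 0.1, 0.2], [3.0, 0.1, 2.0]]
source_2 = [[0.1, 5.0, 0.1], [0.2, 6.0, 0.1]]

# Simplifying assumption: magnitudes add in the mixture (phases ignored).
mixture = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(source_1, source_2)]


def oracle_mask(target, interferer):
    """Binary mask: 1.0 where the target has the larger log magnitude, else 0.0."""
    return [
        [1.0 if math.log(t) >= math.log(i) else 0.0 for t, i in zip(row_t, row_i)]
        for row_t, row_i in zip(target, interferer)
    ]


def apply_mask(spectrogram, mask):
    """Keep the mixture cells selected by the mask and zero out the rest."""
    return [
        [m * y for m, y in zip(row_m, row_y)]
        for row_m, row_y in zip(mask, spectrogram)
    ]


mask_1 = oracle_mask(source_1, source_2)
mask_2 = [[1.0 - m for m in row] for row in mask_1]  # complementary mask

estimate_1 = apply_mask(mixture, mask_1)
estimate_2 = apply_mask(mixture, mask_2)

for name, est in (("estimate 1", estimate_1), ("estimate 2", estimate_2)):
    print(name, est)
```

Each mixture cell is assigned to exactly one estimate, so the two estimates add back up to the mixture. Turning a masked spectrogram back into a waveform would additionally require the mixture's phase and an inverse transform, which this sketch leaves out.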



Training

Given some prior information, it is possible to predict the elements of a log spectrogram. The mathematical principles involved are very similar to those used in many speech recognition systems. If you want to know more about the mathematics and probability, please see the papers in the References section below.


Demo

Due to a limited disk quota (the department's system administrator has already kindly increased mine once), sound files in WAV format are not available at the moment. They will be put online when I finish writing up my report.

For now, please see the following log spectrogram plots.



1. Log spectrogram of the mixture
Figure 2      Mixture


2. Log spectrogram of estimation 1
Figure 3      Estimation 1


3. Log spectrogram of estimation 2
Figure 4      Estimation 2


For comparison, here are the clean recordings of Source 1 and Source 2.


To save you the time of doing "image analysis" on the sources and estimations yourself, here are the differences between them:

Differences between source 1 and estimation 1

Differences between source 2 and estimation 2



References

M.J. Reyes-Gomez, D. Ellis, N. Jojic (2004), Multiband Audio Modeling for Single Channel Acoustic Source Separation, Proceedings of ICASSP-04, Montreal, May 2004

S.T. Roweis (2003), Factorial Models and Refiltering for Speech Separation and Denoising, Proceedings of Eurospeech 2003, Geneva, pp. 1009-1012

S.T. Roweis (2002), One Microphone Source Separation, Neural Information Processing Systems 13, pp. 793-799

More references are available in Selected Literature




Last modified by Yu Song :   Thu Sep 29 21:31:27 BST 2005


Copyright © Yu Song, 2004+
All Rights Reserved