| University of Warwick | Yu Song's Home |
Situation

Given ONLY ONE microphone (i.e. one input) containing speech recordings from two people, could we "separate" each person's speech?

The "Impossible Mission" (again)

Given ONLY ONE equation,

Secrets in spectrograms

If we look at the log spectrogram of the "mixture" (i.e. the recording from the ONLY ONE microphone) and the element-wise maximum of the log spectrograms of each "source" (i.e. the speech from each person), these spectrograms are almost the same!

Figure 1: The Log-Max Approximation. From top to bottom: log spectrogram of the mixture, element-wise maximum of the log spectrograms of the sources, and the difference between the two plots.

The idea of Refiltering is to use "masking signals" to isolate single sources from the mixture and reconstruct the signals given some "prior information" (or "Training").

Training

Given some prior information, it is possible to predict elements in log spectrograms. This is very similar to the mathematical principles used in many speech recognition systems. If you want to know more about the mathematics and probability involved, please see the papers in the References section below.

Demo

Due to a limited disk quota (the department system admin has already kindly increased mine once), sound files in wave format are not available at the moment. They will be online when I finish writing up my report. For now, please see the following log spectrogram plots.

1. Log spectrogram of the mixture (Figure 2: Mixture)
2. Log spectrogram of estimation 1 (Figure 3: Estimation 1)
3. Log spectrogram of estimation 2 (Figure 4: Estimation 2)

For comparison, here are clean recordings from Source 1 and Source 2. To save you some time doing "image analysis" on sources and estimations, here are the differences between them:

References

M.J. Reyes-Gomez, D. Ellis, N. Jojic (2004), Multiband Audio Modeling for Single Channel Acoustic Source Separation, Proceedings of ICASSP-04, Montreal, May 2004

S.T. Roweis (2003), Factorial Models and Refiltering for Speech Separation and Denoising, Proceedings of Eurospeech 2003, Geneva, pp.1009-1012

S.T. Roweis (2002), One Microphone Source Separation, Neural Information Processing Systems 13, pp.793-799

More references are available in Selected Literature

Last modified by Yu Song : Thu Sep 29 21:31:27 BST 2005
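
To make the log-max observation from "Secrets in spectrograms" concrete, here is a minimal numerical sketch. It is not the code used for this project: the two "speakers" are synthetic sine bursts standing in for real speech, the helper `log_spectrogram` and all parameter values (sample rate, frame length, hop size) are assumptions chosen for illustration, and the mask is an "oracle" computed from the known sources rather than predicted from training as Refiltering requires.

```python
import numpy as np

def log_spectrogram(x, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a Hann-windowed short-time FFT.

    Hypothetical helper for this sketch; frame_len and hop are arbitrary.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-10)  # small floor avoids log(0)

# Two synthetic "speakers" (assumption: sine bursts instead of real speech).
fs = 8000                        # sample rate in Hz
t = np.arange(fs) / fs           # one second of samples
source1 = np.sin(2 * np.pi * 300 * t) * (t < 0.6)
source2 = 0.8 * np.sin(2 * np.pi * 1200 * t) * (t > 0.4)
mixture = source1 + source2      # what the ONLY ONE microphone records

log_mix = log_spectrogram(mixture)
log_s1 = log_spectrogram(source1)
log_s2 = log_spectrogram(source2)

# Log-max approximation: log|mixture| is close to max(log|s1|, log|s2|).
log_max = np.maximum(log_s1, log_s2)
error = np.abs(log_mix - log_max)
median_error = float(np.median(error))

# Oracle binary mask (assumption: uses the clean sources, which a real
# system would not have). True where source 1 is the louder one.
mask1 = log_s1 > log_s2

# "Refilter" the mixture: keep the cells assigned to source 1 and push
# the rest down to the quietest level found in the mixture.
floor = log_mix.min()
estimate1 = np.where(mask1, log_mix, floor)
estimate2 = np.where(~mask1, log_mix, floor)

print("median |log_mix - log_max| =", median_error)
```

The mixture can never exceed the louder source by more than log 2 in any time-frequency cell, because the mixture magnitude is at most the sum of the two source magnitudes. The approximation only breaks down noticeably where both sources have similar energy in the same cell.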