# An Adaptive Method for Audio Denoising Issue

###### tags: `11th-joint-workshop`

* [Book Mode](https://hackmd.io/@J_Dcp_llQXeepngHevDw3A/HyXK15RWI/https%3A%2F%2Fhackmd.mcl.math.ncu.edu.tw%2Fs%2FNqHMpR1Z4)

## Overview

> ### Optimization problems in signal and audio denoising
> Total variation based filtering, first derived by Rudin, Osher, and Fatemi in 1992, has since been widely applied to signal denoising and, further, to image denoising problems such as signal smoothing, fingerprint enhancement, and so on. This filter (in one dimension, for instance) can be implemented through several recursive (or non-recursive) approaches that reduce the objective function, which is given by ...
>
> However, audio, a special kind of signal, does not yield good results under the above approach when solving the denoising problem. At the beginning of this talk we will focus on the main idea of total variation filtering, then attempt to revise the model by taking frequency into consideration, and finally address the particular signal denoising problem as our future work.
> [name=10th Joint Workshop] [time=March, 2019]

> ### Weighted Instantaneous Phase Corrected Total Variation with Application to Audio Denoising Issue
> *"Spectrum-Based Total Variation Denoising Approach"*
>
> Total variation based filtering, first derived by Rudin, Osher, and Fatemi in 1992 [5], has since been widely applied to signal denoising and, further, to image denoising problems [6, 3] such as signal smoothing, fingerprint enhancement, and so on. However, audio, a special kind of signal, does not yield good results under the above approach when solving the denoising problem.
>
> Over the past ten years, several methods have been proposed that map the original signal to another feature space in order to measure the level of noise interference. In particular, two of these approaches are based on the total variation denoising model [1, 7].
> We will discuss them in the following sections and propose an additional weighting concept on top of them, with further conditions.
> [name=Poster - NCU X HU Symposium] [time=June, 2019]

### Key points

* Concept of instantaneous phase corrected total variation.
* In-depth study of the iterative algorithm (primal-dual splitting method).
* Applying YAMNet with a corresponding weighting factor to iPCTV.
* Study of criteria for quality judgment.

### Abstract

Phase corrected total variation and its improvement [1, 2] have been proposed in the past few years to reduce noise in general audio signals. In this talk we review the concept of instantaneous phase corrected total variation (iPCTV) [2], including an in-depth study of the primal-dual splitting algorithm [3]. Using YAMNet [4, 5], a deep neural network classification model combining spectrogram-based feature extraction and depthwise separable convolution [6], an adaptive method is developed to detect the classes present in an audio signal; the noise is then reduced by a revised iPCTV model with a corresponding weighting factor.

### References

1. I. Bayram and M. E. Kamasak, “[A simple prior for audio signals](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6457419),” IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 6, pp. 1190–1200, 2013.
2. K. Yatabe and Y. Oikawa, “[Phase corrected total variation for audio signals](http://150.162.46.34:8080/icassp2018/ICASSP18_USB/pdfs/0000656.pdf),” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2018-April, pp. 656–660, 2018.
3. L. Condat, “[A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms](https://hal.archives-ouvertes.fr/hal-00609728v5/document),” Journal of Optimization Theory and Applications, vol. 158, Aug. 2013.
4. M. Plakal and D. Ellis, “YAMNet.” https://github.com/tensorflow/models/tree/master/research/audioset/yamnet, 2019.
5. J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, “[Audio Set: An ontology and human-labeled dataset for audio events](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45857.pdf),” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776–780, March 2017.
6. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/pdf/1704.04861.pdf),” arXiv e-prints, p. arXiv:1704.04861, Apr. 2017.
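### Appendix: a 1-D total variation denoising sketch

The classical 1-D ROF objective referenced in the overview is $\min_x \tfrac{1}{2}\lVert x - y\rVert_2^2 + \lambda \sum_n |x_{n+1} - x_n|$. As a minimal illustration of how such a filter can be implemented iteratively (this is the standard ROF model only, *not* the iPCTV model of the talk), here is a sketch that solves it by projected gradient on the dual problem, Chambolle-style; the function name and step-size choice are our own:

```python
import numpy as np

def tv_denoise_1d(y, lam, n_iter=1000):
    """1-D total variation (ROF) denoising via projected gradient on the dual.

    Minimizes 0.5 * ||x - y||^2 + lam * sum(|x[n+1] - x[n]|).
    """
    y = np.asarray(y, dtype=float)
    p = np.zeros(len(y) - 1)          # dual variable, constrained to [-1, 1]
    step = 1.0 / (4.0 * lam)          # safe step: ||D||^2 <= 4 for the 1-D difference operator D
    for _ in range(n_iter):
        # x = y - lam * D^T p, where (D^T p)_j = p_{j-1} - p_j (zero at the boundaries)
        x = y - lam * (np.concatenate(([0.0], p)) - np.concatenate((p, [0.0])))
        # dual ascent step followed by projection onto the [-1, 1] box
        p = np.clip(p + step * np.diff(x), -1.0, 1.0)
    return x
```

On a noisy piecewise-constant signal, the output is nearly flat on each plateau while the jump is preserved (slightly shrunk toward the mean, as is characteristic of TV regularization).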
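The adaptive weighting step described in the abstract — mapping a classifier's per-frame class scores to a regularization weight — can be sketched as follows. Everything here is a placeholder assumption: the noise-class indices, the linear score-to-weight map, and the function name are illustrative only, not YAMNet's actual class map or the revised iPCTV model's actual weighting rule (YAMNet does output per-frame scores over the 521 AudioSet classes):

```python
import numpy as np

# Hypothetical indices of noise-like classes; real indices would be looked up
# in YAMNet's class-map CSV shipped with the model.
NOISE_CLASSES = [3, 7, 42]

def adaptive_weights(frame_scores, lam_min=0.1, lam_max=1.0):
    """Map per-frame class scores to a per-frame regularization weight.

    frame_scores: (num_frames, num_classes) array of scores in [0, 1],
    e.g. YAMNet's per-frame output. Frames with stronger evidence of
    noise-like classes receive a larger weight, i.e. heavier smoothing.
    """
    noisiness = frame_scores[:, NOISE_CLASSES].max(axis=1)  # per-frame noise evidence
    return lam_min + (lam_max - lam_min) * noisiness
```

Such a per-frame weight vector would then multiply the total variation penalty frame by frame, so that noisier frames are smoothed more aggressively than clean ones.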