# Total Variation Filtering #
> References:
> 1. Ivan W. Selesnick and Ilker Bayram (2010). *"Total Variation Filtering."*
> 2. [6] J.-B. Hiriart-Urruty and C. Lemaréchal. *"Convex Analysis and Minimization Algorithms I."*
> 3. [7] R. T. Rockafellar. *"Convex Analysis."*
The original model is given by
$$ \tag{1}
y = x+n,
$$
where $y\in \mathbb{R}^N$ is an observed signal composed of
* an original signal $x\in \mathbb{R}^N$, and
* multivariate Gaussian noise $n\sim N(0, \Sigma)$.
The **TV denoising** approach estimates $x$ by solving the following minimization problem:
$$
\min_x \| y-x \|_2^2 + \lambda \sum_{i=1}^{N-1}|x_{i+1}-x_{i}|,
$$
or equivalently,
$$
\min_x \| y-x \|_2^2 + \lambda \|Dx\|_1.
$$
where
$$
D = \begin{bmatrix}
-1 & 1 & & & \\
& -1 & 1 & & \\
& & \ddots & \ddots & \\
& & & -1 & 1
\end{bmatrix}
\in \mathbb{M}(N-1, N; \mathbb{R}).
$$
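As a quick illustration, the matrix $D$ and the TV objective can be formed directly in NumPy. This is a minimal sketch; the helper names `first_difference_matrix` and `tv_objective` are illustrative and not from the reference:
```python
import numpy as np

def first_difference_matrix(N):
    """Dense (N-1) x N first-difference matrix D with rows [..., -1, 1, ...]."""
    D = np.zeros((N - 1, N))
    D[np.arange(N - 1), np.arange(N - 1)] = -1.0
    D[np.arange(N - 1), np.arange(1, N)] = 1.0
    return D

def tv_objective(x, y, lam):
    """Evaluate ||y - x||_2^2 + lam * ||D x||_1 for a 1D signal."""
    return np.sum((y - x) ** 2) + lam * np.sum(np.abs(np.diff(x)))
```
Here `np.diff(x)` plays the role of $Dx$, so the matrix never needs to be formed explicitly.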
Now we consider the more general form, which is given by
$$ \tag{2}
J(x) \equiv \| y-x \|_2^2 + \lambda \|Ax\|_1,
$$
where $A\in\mathbb{M}(M, N; \mathbb{R})$. We also denote the optimal value by
$$ \tag{3}
J_* \equiv \min_x \| y-x \|_2^2 + \lambda \|Ax\|_1.
$$
Note that minimizing (2) is not straightforward since **the $l_1$-norm is not differentiable at 0**. Hence, one approach to minimize $J$ is to use the **dual formulation**.
## Dual formulation ##
Note that for a scalar $\alpha\in\mathbb{R}$, one has
$$
| \alpha | = \max_{|z|\leq 1} z\alpha.
$$
Likewise,
$$
\begin{aligned}
\|x\|_1 &= \sum_{i=1}^N \max_{|z_i|\leq 1} z_ix_i \\
&= \max_{\|z\|_{\infty} \leq 1} z^T x.
\end{aligned}
$$
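This identity is easy to verify numerically; a minimal check (the numbers are illustrative):
```python
import numpy as np

# ||x||_1 = max_{||z||_inf <= 1} z^T x, with the maximum attained at z = sign(x).
x = np.array([1.5, -0.3, 0.0, 2.0])
z_star = np.sign(x)                              # feasible: ||z_star||_inf <= 1
assert np.isclose(z_star @ x, np.sum(np.abs(x)))
```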
Therefore, we may rewrite the objective function $J$ in (2) to be
$$ \tag{4}
J(x) = \| y-x \|_2^2 + \lambda \max_{|z_i|\leq 1} z^TAx,
$$
or
$$ \tag{5}
J(x) = \max_{|z_i|\leq 1} \| y-x \|_2^2 + \lambda z^TAx.
$$
Then the optimal value can be expressed as
$$
J_* = \min_x\max_{|z_i|\leq 1} \| y-x \|_2^2 + \lambda z^TAx.
$$
Define
$$ \tag{6}
F(x, z) \equiv \| y-x \|_2^2 + \lambda z^TAx,
$$
then
$$
J_* = \min_x\max_{|z_i|\leq 1} F(x, z).
$$
-------------------
### Convex analysis for $F$ ###
Written explicitly, the function $F$ is
$$
\begin{aligned}
F: X\times Z &\rightarrow \mathbb{R} \\
(x, z) &\mapsto \| y-x \|_2^2 + \lambda z^TAx,
\end{aligned}
$$
where $X = \mathbb{R}^N$ and $Z = [-1, 1]^M$.
We observe that
* (H1) Both $X$ and $Z$ are **nonempty closed convex sets**;
* (H2) $F$ is **a convex-concave function on $X\times Z$**:
* (H2-1) For any given $z\in Z$, $\varphi_z(x) \equiv F(x, z)$ is a convex function;
* (H2-2) For any given $x\in X$, $\psi_x(z) \equiv F(x, z)$ is a concave function;
* $F$ is **0-coercive**:
* (H3) $\exists\;z\in Z$ such that $\varphi_{z}(x)\rightarrow\infty$ as $\|x\|\rightarrow\infty$ and $x\in X$.
* (H4) $Z$ is bounded.
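These observations can be verified directly; a short sketch:
$$
\begin{aligned}
&\text{(H2-1):} && \nabla_x^2 F(x, z) = 2I \succ 0, \text{ so } \varphi_z \text{ is convex for every } z; \\
&\text{(H2-2):} && z \mapsto F(x, z) \text{ is affine in } z, \text{ hence concave for every } x; \\
&\text{(H3):} && \varphi_z(x) \geq \| y-x \|_2^2 - \lambda \|A^T z\|_2 \|x\|_2 \rightarrow \infty \text{ as } \|x\|_2 \rightarrow \infty; \\
&\text{(H4):} && Z = [-1, 1]^M \text{ is bounded.}
\end{aligned}
$$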
Note that
* condition (H2) shows that the **optimal value of $F$ is attained at a saddle point**, and
* conditions (H1)-(H4) ensure that **the set of saddle points on $X\times Z$ is a nonempty convex compact set**.
> #### Theorem [6] - 4.3.1 ####
> For a function $l:X\times Y\rightarrow\mathbb{R}$ such that
> * (H1) $X\subseteq\mathbb{R}^N$ and $Y\subseteq\mathbb{R}^M$ are nonempty closed convex sets.
> * (H2) $l$ is convex-concave on $X\times Y$. That is,
> * (H2-1) $l(\cdot, y):X\rightarrow\mathbb{R}$ is convex $\forall y\in Y$.
> * (H2-2) $l(x, \cdot):Y\rightarrow\mathbb{R}$ is concave $\forall x\in X$.
>
> Also, the function $l$ is 0-coercive. That is
> * (H3) $X$ is bounded, or $\exists\;y\in Y$ such that $l(\cdot, y)\rightarrow\infty$ as $\|x\|\rightarrow\infty$, $x\in X$.
> * (H4) $Y$ is bounded, or $\exists\;x\in X$ such that $l(x, \cdot)\rightarrow\infty$ as $\|y\|\rightarrow\infty$, $y\in Y$.
>
> If (H1) - (H4) hold, then $l$ has a nonempty convex compact set of saddle points on $X\times Y$.
Moreover, since $F$ is a continuous finite convex-concave function on $X\times Z$ and $Z$ is bounded, we have
$$
\inf_{z\in Z}\sup_{x\in X} F(x, z) = \sup_{x\in X}\inf_{z\in Z} F(x, z).
$$
> #### Corollary [7] - 37.3.2 ####
> Let $X$ and $Y$ be nonempty closed convex sets in $\mathbb{R}^N$ and $\mathbb{R}^M$, respectively, and let $l$ be a continuous finite convex-concave function on $X\times Y$. If either $X$ or $Y$ is bounded, then
> $$
> \inf_{y\in Y}\sup_{x\in X} l(x, y) = \sup_{x\in X}\inf_{y\in Y} l(x, y).
> $$
Moreover, since the set of saddle points is nonempty and compact, the infima and suprema above are attained, so we also have
$$
\min_{z\in Z}\max_{x\in X} F(x, z) = \max_{x\in X}\min_{z\in Z} F(x, z),
$$
which means
$$
\min_{\|z\|_\infty\leq 1}\max_{x} F(x, z) = \max_{x}\min_{\|z\|_\infty\leq 1} F(x, z).
$$
-------------------
Consequently, we have
$$
\begin{aligned}
J_* &= \min_x\max_{|z_i|\leq 1} F(x, z) \\
&= \max_{|z_i|\leq 1}\min_x F(x, z) \\
&= \max_{|z_i|\leq 1}\min_x \| y-x \|_2^2 + \lambda z^TAx,
\end{aligned} \tag{7}
$$
which is the dual formulation of the TV denoising problem.
### Solving the minimization problem (7) ###
The inner minimization problem in (7) can be solved as follows:
$$
\frac{\partial}{\partial x}F(x_*, z) = -2 (y-x_*) + \lambda A^T z.
$$
Setting this derivative to zero, we obtain
$$ \tag{8}
x_* = y - \frac{\lambda}{2} A^T z.
$$
Hence, substituting (8),
$$
\begin{aligned}
J_* &= \max_{\|z\|_\infty\leq 1} \| y-x_* \|_2^2 + \lambda z^TAx_* \\
&= \max_{\|z\|_\infty\leq 1} \left\| \frac{\lambda}{2} A^T z \right\|_2^2 + \lambda z^TA\left(y - \frac{\lambda}{2} A^T z\right) \\
&= \max_{\|z\|_\infty\leq 1} \frac{\lambda^2}{4} z^TA A^T z + \lambda z^TA y - \frac{\lambda^2}{2} z^TA A^T z \\
&= \max_{\|z\|_\infty\leq 1} -\frac{\lambda^2}{4} z^TA A^T z + \lambda z^TA y,
\end{aligned}
$$
$$
or equivalently,
$$ \tag{9}
z_* = \arg\min_{\|z\|_\infty\leq 1} z^TA A^T z - \frac{4}{\lambda} z^TA y.
$$
Through a similar approach, setting the gradient of the objective in (9) to zero gives
$$
\begin{aligned}
0
&= \frac{\partial}{\partial z} \left( z^TA A^T z - \frac{4}{\lambda} z^TA y \right) \\
&= 2AA^Tz - \frac{4}{\lambda}Ay,
\end{aligned}
$$
$$
which implies
$$
AA^Tz = \frac{2}{\lambda}Ay.
$$
However, this may require the solution of a potentially large system of linear equations, and furthermore it does not, in general, yield a solution $z$ satisfying the constraint $\|z\|_\infty\leq 1$.
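For intuition, here is a minimal sketch of the unconstrained solve for $A = D$ (the signal and parameter values are illustrative); the resulting $z$ may well fall outside $[-1, 1]^M$:
```python
import numpy as np

# Unconstrained minimizer of z^T A A^T z - (4/lam) z^T A y for A = D:
# solve  A A^T z = (2/lam) A y  directly.
N, lam = 100, 2.0
rng = np.random.default_rng(0)
y = rng.standard_normal(N)
D = np.diff(np.eye(N), axis=0)      # (N-1) x N first-difference matrix
z_unc = np.linalg.solve(D @ D.T, (2.0 / lam) * (D @ y))
print(np.max(np.abs(z_unc)))        # may exceed 1, i.e. violate ||z||_inf <= 1
```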
Therefore, to find a $z$ solving the constrained minimization problem (9), the **majorization-minimization method (MM algorithm)** can be used.
### Majorization-minimization method ###
Define
$$
D(z) = z^T AA^T z - \frac{4}{\lambda}z^T Ay
$$
and set $z^{(i)}$ as the point of coincidence.
Then we can find a separable majorizer of $D(z)$ by adding the non-negative function
$$
(z - z^{(i)})^T (\alpha I - AA^T)(z - z^{(i)})
$$
to $D(z)$, where $\alpha$ is greater than or equal to the maximum eigenvalue of $AA^T$.
So, a majorizer of $D(z)$ is given by
$$
D(z) + (z - z^{(i)})^T (\alpha I - AA^T)(z - z^{(i)}),
$$
and, using the MM approach, the update equation for $z$ is given by
$$
\begin{aligned}
z^{(i+1)}
&= \arg\min_{\|z\|_\infty\leq 1} D(z) + (z - z^{(i)})^T (\alpha I - AA^T)(z - z^{(i)}) \\
&= \arg\min_{\|z\|_\infty\leq 1} \alpha z^T z - 2\left( A\left(\frac{2}{\lambda}y - A^T z^{(i)}\right) + \alpha z^{(i)} \right)^T z + K \\
&= \arg\min_{\|z\|_\infty\leq 1} z^T z - 2\left(\frac{1}{\alpha} A\left(\frac{2}{\lambda}y - A^T z^{(i)}\right) + z^{(i)} \right)^T z \\
&= \arg\min_{\|z\|_\infty\leq 1} z^T z - {2b^{(i)}}^T z,
\end{aligned} \tag{10-13}
$$
where $K$ is the remainder term which is independent of $z$, and
$$
b^{(i)} = z^{(i)} + \frac{1}{\alpha} A\left(\frac{2}{\lambda}y - A^T z^{(i)}\right).
$$
Consider first the scalar case:
$$ \tag{14}
\displaystyle\arg\min_{\|z\|_\infty\leq 1} z^2 - 2bz
$$
The unconstrained minimum of $z^2 - 2bz$ is attained at $z=b$. If $|b|\leq 1$, then the solution is $z=b$; if $|b| > 1$, then the solution is $z = \text{sgn}(b)$. Define the clipping function $\text{clip}:\mathbb{R}\times\mathbb{R}_{>0}\rightarrow\mathbb{R}$:
$$
\text{clip}(b, T) \equiv
\left\{
\begin{aligned}
& b & \text{if }|b|\leq T, \\
& T\cdot\text{sgn}(b) & \text{if } |b| > T.
\end{aligned}
\right. \tag{15}
$$
Then we can write the solution of (14) to be $z=\text{clip} (b, 1)$.
Note that the vector case (13) is separable. Hence an update equation for $z$ is given by
$$
z^{(i+1)} =
\begin{bmatrix}
\text{clip}(b^{(i)}_1, 1) \\
\text{clip}(b^{(i)}_2, 1) \\
\vdots \\
\text{clip}(b^{(i)}_M, 1)
\end{bmatrix}
\equiv \text{clip}(b^{(i)}, 1).
$$
That is,
$$ \tag{16}
z^{(i+1)} = \text{clip}\left(z^{(i)} + \frac{1}{\alpha} A\left(\frac{2}{\lambda}y - A^T z^{(i)}\right), 1 \right),
$$
where $i$ is the iteration index. Once $z^{(i)}$ has converged to one's satisfaction, the denoised signal $x$ is given by (8).
Since the optimization problem is convex, the iteration converges from any initialization; we may choose, say, $z^{(0)}=0$.
We call this the **iterative clipping algorithm**.
## Iterative clipping algorithm ##
This algorithm can be written as
$$
\begin{aligned}
x^{(i+1)} &= y - \frac{\lambda}{2}A^T z^{(i)}, \\
z^{(i+1)} &= \text{clip}\left(z^{(i)} + \frac{2}{\alpha\lambda} Ax^{(i+1)}, 1 \right).
\end{aligned} \tag{17,18}
$$
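For a general matrix $A$, updates (17)-(18) translate into a few lines of NumPy. This is a minimal sketch under the assumption that $A$ is a dense array; the function name and the eigenvalue-based choice of $\alpha$ are illustrative:
```python
import numpy as np

def tv_iterative_clipping(y, A, lam, Nit):
    """Iterative clipping updates (17)-(18) for a general matrix A (sketch)."""
    alpha = np.linalg.eigvalsh(A @ A.T).max()   # alpha >= max eigenvalue of A A^T
    z = np.zeros(A.shape[0])
    x = y.copy()
    for _ in range(Nit):
        x = y - (lam / 2.0) * (A.T @ z)                               # eq. (17)
        z = np.clip(z + (2.0 / (alpha * lam)) * (A @ x), -1.0, 1.0)   # eq. (18)
    return x
```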
Scaling $z$ by $\lambda/2$, we have the following equivalent form:
$$
\begin{aligned}
x^{(i+1)} &= y - A^T z^{(i)}, \\
z^{(i+1)} &= \text{clip}\left( z^{(i)} + \frac{1}{\alpha} Ax^{(i+1)}, \frac{\lambda}{2} \right).
\end{aligned} \tag{19,20}
$$
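The implementations below take $A = D$ and use $\alpha = 4$: for the first-difference matrix, $DD^T$ is the tridiagonal matrix $\mathrm{tridiag}(-1, 2, -1)$ with eigenvalues $2 - 2\cos(k\pi/N) < 4$, so $\alpha = 4$ is a valid upper bound. A quick numerical check (a sketch):
```python
import numpy as np

# Check that the largest eigenvalue of D D^T stays below 4,
# which justifies the choice alpha = 4 in the implementations below.
N = 256
D = np.diff(np.eye(N), axis=0)
print(np.linalg.eigvalsh(D @ D.T).max())   # close to, but below, 4
```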
### Implementations ###
#### MATLAB ####
```matlab
function [x,J] = denoiseTV(y,lambda,Nit)
% [x,J] = denoiseTV(y,lambda,Nit)
% Total variation filtering (denoising) using
% the iterative clipping algorithm.
% INPUT
%   y      - noisy signal (row vector)
%   lambda - regularization parameter
%   Nit    - number of iterations
% OUTPUT
%   x - result of denoising
%   J - objective function

J = zeros(1,Nit);                % objective function
N = length(y);
z = zeros(1,N-1);                % initialize z
alpha = 4;
T = lambda/2;
for k = 1:Nit
    x = y - [-z(1) -diff(z) z(end)];                       % y - D' z
    J(k) = sum(abs(x-y).^2) + lambda * sum(abs(diff(x)));
    z = z + 1/alpha * diff(x);                             % z + (1/alpha) D x
    z = max(min(z,T),-T);                                  % clip(z,T)
end
```
#### Python 3 ####
```python
import numpy as np

def denoiseTV(y, lambda_, Nit):
    """1D TV denoising using the iterative clipping algorithm.

    Args:
        y       : noisy signal (1D array)
        lambda_ : regularization parameter
        Nit     : number of iterations
    Returns:
        x : result of denoising
        J : objective function value at each iteration
    """
    J = np.zeros(Nit)
    z = np.zeros(len(y) - 1)                # initialize z
    alpha = 4
    T = lambda_ / 2
    for k in range(Nit):
        # x = y - D^T z, built without forming D explicitly
        x = y - np.concatenate(([-z[0]], -np.diff(z), [z[-1]]))
        J[k] = np.sum(np.abs(x - y) ** 2) + lambda_ * np.sum(np.abs(np.diff(x)))
        z = z + 1 / alpha * np.diff(x)      # z + (1/alpha) D x
        z = np.clip(z, -T, T)               # clip(z, T)
    return x, J
```
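A minimal usage sketch (the test signal and parameter values are illustrative):
```python
import numpy as np

# Piecewise-constant signal corrupted by Gaussian noise, then denoised.
rng = np.random.default_rng(0)
x_true = np.concatenate([np.zeros(100), 2 * np.ones(100), -np.ones(100)])
y = x_true + 0.3 * rng.standard_normal(x_true.size)

x_hat, J = denoiseTV(y, lambda_=2.0, Nit=100)
print("final objective value:", J[-1])
```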

