# Total Variation Filtering #
> References:
> 1. Ivan W. Selesnick and Ilker Bayram (2010). *"Total Variation Filtering."*
> 2. [6] J.-B. Hiriart-Urruty and C. Lemaréchal. *"Convex Analysis and Minimization Algorithms I."*
> 3. [7] R. T. Rockafellar. *"Convex Analysis."*
The original model is given by
$$ \tag{1}
y = x+n,
$$
where $y\in \mathbb{R}^N$ is an observed signal composed of
* an original signal $x\in \mathbb{R}^N$, and
* multivariate Gaussian noise $n\sim N(0, \Sigma)$.
The **TV denoising** approach estimates $x$ by solving the following minimization problem:
$$
\min_x \| y-x \|_2^2 + \lambda \sum_{i=1}^{N-1}|x_{i+1}-x_{i}|,
$$
or equivalently,
$$
\min_x \| y-x \|_2^2 + \lambda \|Dx\|_1.
$$
where
$$
D = \begin{bmatrix}
-1 & 1 & & & \\
& -1 & 1 & & \\
& & \ddots & \ddots & \\
& & & -1 & 1
\end{bmatrix}
\in \mathbb{M}(N-1, N; \mathbb{R}).
$$
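As a quick illustration, the matrix $D$ and the TV objective can be formed directly in NumPy. This is a minimal sketch; the helper names `first_difference_matrix` and `tv_objective` are illustrative and not from the reference:
```python
import numpy as np

def first_difference_matrix(N):
    """Dense (N-1) x N first-difference matrix D with rows [..., -1, 1, ...]."""
    D = np.zeros((N - 1, N))
    D[np.arange(N - 1), np.arange(N - 1)] = -1.0
    D[np.arange(N - 1), np.arange(1, N)] = 1.0
    return D

def tv_objective(x, y, lam):
    """Evaluate ||y - x||_2^2 + lam * ||D x||_1 for a 1D signal."""
    return np.sum((y - x) ** 2) + lam * np.sum(np.abs(np.diff(x)))
```
Here `np.diff(x)` plays the role of $Dx$, so the matrix never needs to be formed explicitly.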
Now we consider the more general form, which is given by
$$ \tag{2}
J(x) \equiv \| y-x \|_2^2 + \lambda \|Ax\|_1,
$$
where $A\in\mathbb{M}(M, N; \mathbb{R})$. We also denote the optimal value by
$$ \tag{3}
J_* \equiv \min_x \| y-x \|_2^2 + \lambda \|Ax\|_1.
$$
Note that minimizing (2) is not straightforward since **the $l_1$-norm is not differentiable at 0**. Hence, one approach to minimize $J$ is to use the **dual formulation**.
## Dual formulation ##
Note that for a scalar $\alpha\in\mathbb{R}$, one has
$$
| \alpha | = \max_{|z|\leq 1} z\alpha.
$$
Likewise,
$$
\begin{aligned}
\|x\|_1 &= \sum_{i=1}^N \max_{|z_i|\leq 1} z_ix_i \\
&= \max_{\|z\|_{\infty} \leq 1} z^T x.
\end{aligned}
$$
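This identity is easy to verify numerically; a minimal check (the numbers are illustrative):
```python
import numpy as np

# ||x||_1 = max_{||z||_inf <= 1} z^T x, with the maximum attained at z = sign(x).
x = np.array([1.5, -0.3, 0.0, 2.0])
z_star = np.sign(x)                              # feasible: ||z_star||_inf <= 1
assert np.isclose(z_star @ x, np.sum(np.abs(x)))
```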
Therefore, we may rewrite the objective function $J$ in (2) to be
$$ \tag{4}
J(x) = \| y-x \|_2^2 + \lambda \max_{|z_i|\leq 1} z^TAx,
$$
or
$$ \tag{5}
J(x) = \max_{|z_i|\leq 1} \| y-x \|_2^2 + \lambda z^TAx.
$$
Then the optimal value can be expressed as
$$
J_* = \min_x\max_{|z_i|\leq 1} \| y-x \|_2^2 + \lambda z^TAx.
$$
Define
$$ \tag{6}
F(x, z) \equiv \| y-x \|_2^2 + \lambda z^TAx,
$$
then
$$
J_* = \min_x\max_{|z_i|\leq 1} F(x, z).
$$
-------------------
### Convex analysis for $F$ ###
Written explicitly, the function $F$ is
$$
\begin{aligned}
F: X\times Z &\rightarrow \mathbb{R} \\
(x, z) &\mapsto \| y-x \|_2^2 + \lambda z^TAx,
\end{aligned}
$$
where $X = \mathbb{R}^N$ and $Z = [-1, 1]^M$.
We observe that
* (H1) Both $X$ and $Z$ are **nonempty closed convex sets**;
* (H2) $F$ is **a convex-concave function on $X\times Z$**:
* (H2-1) For any given $z\in Z$, $\varphi_z(x) \equiv F(x, z)$ is a convex function;
* (H2-2) For any given $x\in X$, $\psi_x(z) \equiv F(x, z)$ is a concave function;
* $F$ is **0-coercive**:
* (H3) $\exists\;z\in Z$ such that $\varphi_{z}(x)\rightarrow\infty$ as $\|x\|\rightarrow\infty$ and $x\in X$.
* (H4) $Z$ is bounded.
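These observations can be verified directly; a short sketch:
$$
\begin{aligned}
&\text{(H2-1):} && \nabla_x^2 F(x, z) = 2I \succ 0, \text{ so } \varphi_z \text{ is convex for every } z; \\
&\text{(H2-2):} && z \mapsto F(x, z) \text{ is affine in } z, \text{ hence concave for every } x; \\
&\text{(H3):} && \varphi_z(x) \geq \| y-x \|_2^2 - \lambda \|A^T z\|_2 \|x\|_2 \rightarrow \infty \text{ as } \|x\|_2 \rightarrow \infty; \\
&\text{(H4):} && Z = [-1, 1]^M \text{ is bounded.}
\end{aligned}
$$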
Note that
* condition (H2) shows that the **optimal value of $F$ is attained at a saddle point**, and
* conditions (H1)-(H4) ensure that **the set of saddle points on $X\times Z$ is a nonempty convex compact set**.
> #### Theorem [6] - 4.3.1 ####
> For a function $l:X\times Y\rightarrow\mathbb{R}$ such that
> * (H1) $X\subseteq\mathbb{R}^N$ and $Y\subseteq\mathbb{R}^M$ are nonempty closed convex sets.
> * (H2) $l$ is convex-concave on $X\times Y$. That is,
> * (H2-1) $l(\cdot, y):X\rightarrow\mathbb{R}$ is convex $\forall y\in Y$.
> * (H2-2) $l(x, \cdot):Y\rightarrow\mathbb{R}$ is concave $\forall x\in X$.
>
> Also, the function $l$ is 0-coercive. That is
> * (H3) $X$ is bounded, or $\exists\;y\in Y$ such that $l(\cdot, y)\rightarrow\infty$ as $\|x\|\rightarrow\infty$, $x\in X$.
> * (H4) $Y$ is bounded, or $\exists\;x\in X$ such that $l(x, \cdot)\rightarrow\infty$ as $\|y\|\rightarrow\infty$, $y\in Y$.
>
> If (H1) - (H4) hold, then $l$ has a nonempty convex compact set of saddle points on $X\times Y$.
Moreover, since $F$ is a continuous finite convex-concave function on $X\times Z$ and $Z$ is bounded, we have
$$
\inf_{z\in Z}\sup_{x\in X} F(x, z) = \sup_{x\in X}\inf_{z\in Z} F(x, z).
$$
> #### Corollary [7] - 37.3.2 ####
> Let $X$ and $Y$ be nonempty closed convex sets in $\mathbb{R}^N$ and $\mathbb{R}^M$, respectively, and let $l$ be a continuous finite convex-concave function on $X\times Y$. If either $X$ or $Y$ is bounded, then
> $$
> \inf_{y\in Y}\sup_{x\in X} l(x, y) = \sup_{x\in X}\inf_{y\in Y} l(x, y).
> $$
Moreover, since the set of saddle points is nonempty and compact, the infima and suprema above are attained, so we also have
$$
\min_{z\in Z}\max_{x\in X} F(x, z) = \max_{x\in X}\min_{z\in Z} F(x, z),
$$
which means
$$
\min_{\|z\|_\infty\leq 1}\max_{x} F(x, z) = \max_{x}\min_{\|z\|_\infty\leq 1} F(x, z).
$$
-------------------
Consequently, we have
$$
\begin{aligned}
J_* &= \min_x\max_{|z_i|\leq 1} F(x, z) \\
&= \max_{|z_i|\leq 1}\min_x F(x, z) \\
&= \max_{|z_i|\leq 1}\min_x \| y-x \|_2^2 + \lambda z^TAx,
\end{aligned} \tag{7}
$$
which is the dual formulation of the TV denoising problem.
### Solving the minimization problem (7) ###
The inner minimization problem in (7) can be solved as follows:
$$
\frac{\partial}{\partial x}F(x_*, z) = -2 (y-x_*) + \lambda A^T z.
$$
Setting this derivative to zero, we obtain
$$ \tag{8}
x_* = y - \frac{\lambda}{2} A^T z.
$$
Hence, substituting (8),
$$
\begin{aligned}
J_* &= \max_{\|z\|_\infty\leq 1} \| y-x_* \|_2^2 + \lambda z^TAx_* \\
&= \max_{\|z\|_\infty\leq 1} \left\| \frac{\lambda}{2} A^T z \right\|_2^2 + \lambda z^TA\left(y - \frac{\lambda}{2} A^T z\right) \\
&= \max_{\|z\|_\infty\leq 1} \frac{\lambda^2}{4} z^TA A^T z + \lambda z^TA y - \frac{\lambda^2}{2} z^TA A^T z \\
&= \max_{\|z\|_\infty\leq 1} -\frac{\lambda^2}{4} z^TA A^T z + \lambda z^TA y,
\end{aligned}
$$
$$
or equivalently,
$$ \tag{9}
z_* = \arg\min_{\|z\|_\infty\leq 1} z^TA A^T z - \frac{4}{\lambda} z^TA y.
$$
Through a similar approach, setting the gradient of the objective in (9) to zero gives
$$
\begin{aligned}
0
&= \frac{\partial}{\partial z} \left( z^TA A^T z - \frac{4}{\lambda} z^TA y \right) \\
&= 2AA^Tz - \frac{4}{\lambda}Ay,
\end{aligned}
$$
$$
which implies
$$
AA^Tz = \frac{2}{\lambda}Ay.
$$
However, this may require the solution of a potentially large system of linear equations, and furthermore it does not, in general, yield a solution $z$ satisfying the constraint $\|z\|_\infty\leq 1$.
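For intuition, here is a minimal sketch of the unconstrained solve for $A = D$ (the signal and parameter values are illustrative); the resulting $z$ may well fall outside $[-1, 1]^M$:
```python
import numpy as np

# Unconstrained minimizer of z^T A A^T z - (4/lam) z^T A y for A = D:
# solve  A A^T z = (2/lam) A y  directly.
N, lam = 100, 2.0
rng = np.random.default_rng(0)
y = rng.standard_normal(N)
D = np.diff(np.eye(N), axis=0)      # (N-1) x N first-difference matrix
z_unc = np.linalg.solve(D @ D.T, (2.0 / lam) * (D @ y))
print(np.max(np.abs(z_unc)))        # may exceed 1, i.e. violate ||z||_inf <= 1
```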
Therefore, to find a $z$ solving the constrained minimization problem (9), the **majorization-minimization method (MM algorithm)** can be used.
### Majorization-minimization method ###
Define
$$
D(z) = z^T AA^T z - \frac{4}{\lambda}z^T Ay
$$
and set $z^{(i)}$ as the point of coincidence.
Then we can find a separable majorizer of $D(z)$ by adding the non-negative function
$$
(z - z^{(i)})^T (\alpha I - AA^T)(z - z^{(i)})
$$
to $D(z)$, where $\alpha$ is greater than or equal to the maximum eigenvalue of $AA^T$.
So, a majorizer of $D(z)$ is given by
$$
D(z) + (z - z^{(i)})^T (\alpha I - AA^T)(z - z^{(i)}),
$$
and, using the MM approach, the update equation for $z$ is given by
$$
\begin{aligned}
z^{(i+1)}
&= \arg\min_{\|z\|_\infty\leq 1} D(z) + (z - z^{(i)})^T (\alpha I - AA^T)(z - z^{(i)}) \\
&= \arg\min_{\|z\|_\infty\leq 1} \alpha z^T z - 2\left( A\left(\frac{2}{\lambda}y - A^T z^{(i)}\right) + \alpha z^{(i)} \right)^T z + K \\
&= \arg\min_{\|z\|_\infty\leq 1} z^T z - 2\left(\frac{1}{\alpha} A\left(\frac{2}{\lambda}y - A^T z^{(i)}\right) + z^{(i)} \right)^T z \\
&= \arg\min_{\|z\|_\infty\leq 1} z^T z - {2b^{(i)}}^T z,
\end{aligned} \tag{10-13}
$$
where $K$ is the remainder term which is independent of $z$, and
$$
b^{(i)} = z^{(i)} + \frac{1}{\alpha} A\left(\frac{2}{\lambda}y - A^T z^{(i)}\right).
$$
Consider first the scalar case:
$$ \tag{14}
\displaystyle\arg\min_{\|z\|_\infty\leq 1} z^2 - 2bz
$$
The unconstrained minimum of $z^2 - 2bz$ is attained at $z=b$. If $|b|\leq 1$, then the solution is $z=b$; if $|b| > 1$, then the solution is $z = \text{sgn}(b)$. Define the clipping function $\text{clip}:\mathbb{R}\times\mathbb{R}_{>0}\rightarrow\mathbb{R}$:
$$
\text{clip}(b, T) \equiv
\left\{
\begin{aligned}
& b & \text{if }|b|\leq T, \\
& T\cdot\text{sgn}(b) & \text{if } |b| > T.
\end{aligned}
\right. \tag{15}
$$
Then we can write the solution of (14) to be $z=\text{clip} (b, 1)$.
Note that the vector case (13) is separable. Hence an update equation for $z$ is given by
$$
z^{(i+1)} =
\begin{bmatrix}
\text{clip}(b^{(i)}_1, 1) \\
\text{clip}(b^{(i)}_2, 1) \\
\vdots \\
\text{clip}(b^{(i)}_M, 1)
\end{bmatrix}
\equiv \text{clip}(b^{(i)}, 1).
$$
That is,
$$ \tag{16}
z^{(i+1)} = \text{clip}\left(z^{(i)} + \frac{1}{\alpha} A\left(\frac{2}{\lambda}y - A^T z^{(i)}\right), 1 \right),
$$
where $i$ is the iteration index. Once $z^{(i)}$ has converged to one's satisfaction, the denoised signal $x$ is given by (8).
Since the optimization problem is convex, the iteration converges from any initialization; we may choose, say, $z^{(0)}=0$.
We call this the **iterative clipping algorithm**.
## Iterative clipping algorithm ##
This algorithm can be written as
$$
\begin{aligned}
x^{(i+1)} &= y - \frac{\lambda}{2}A^T z^{(i)}, \\
z^{(i+1)} &= \text{clip}\left(z^{(i)} + \frac{2}{\alpha\lambda} Ax^{(i+1)}, 1 \right).
\end{aligned} \tag{17,18}
$$
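For a general matrix $A$, updates (17)-(18) translate into a few lines of NumPy. This is a minimal sketch under the assumption that $A$ is a dense array; the function name and the eigenvalue-based choice of $\alpha$ are illustrative:
```python
import numpy as np

def tv_iterative_clipping(y, A, lam, Nit):
    """Iterative clipping updates (17)-(18) for a general matrix A (sketch)."""
    alpha = np.linalg.eigvalsh(A @ A.T).max()   # alpha >= max eigenvalue of A A^T
    z = np.zeros(A.shape[0])
    x = y.copy()
    for _ in range(Nit):
        x = y - (lam / 2.0) * (A.T @ z)                               # eq. (17)
        z = np.clip(z + (2.0 / (alpha * lam)) * (A @ x), -1.0, 1.0)   # eq. (18)
    return x
```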
Scaling $z$ by $\lambda/2$, we have the following equivalent form:
$$
\begin{aligned}
x^{(i+1)} &= y - A^T z^{(i)}, \\
z^{(i+1)} &= \text{clip}\left( z^{(i)} + \frac{1}{\alpha} Ax^{(i+1)}, \frac{\lambda}{2} \right).
\end{aligned} \tag{19,20}
$$
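The implementations below take $A = D$ and use $\alpha = 4$: for the first-difference matrix, $DD^T$ is the tridiagonal matrix $\mathrm{tridiag}(-1, 2, -1)$ with eigenvalues $2 - 2\cos(k\pi/N) < 4$, so $\alpha = 4$ is a valid upper bound. A quick numerical check (a sketch):
```python
import numpy as np

# Check that the largest eigenvalue of D D^T stays below 4,
# which justifies the choice alpha = 4 in the implementations below.
N = 256
D = np.diff(np.eye(N), axis=0)
print(np.linalg.eigvalsh(D @ D.T).max())   # close to, but below, 4
```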
### Implementations ###
#### MATLAB ####
```matlab
function [x,J] = denoiseTV(y,lambda,Nit)
% [x,J] = denoiseTV(y,lambda,Nit)
% Total variation filtering (denoising) using
% the iterative clipping algorithm.
% INPUT
%   y      - noisy signal (row vector)
%   lambda - regularization parameter
%   Nit    - number of iterations
% OUTPUT
%   x - result of denoising
%   J - objective function

J = zeros(1,Nit);                % objective function
N = length(y);
z = zeros(1,N-1);                % initialize z
alpha = 4;
T = lambda/2;
for k = 1:Nit
    x = y - [-z(1) -diff(z) z(end)];                       % y - D' z
    J(k) = sum(abs(x-y).^2) + lambda * sum(abs(diff(x)));
    z = z + 1/alpha * diff(x);                             % z + (1/alpha) D x
    z = max(min(z,T),-T);                                  % clip(z,T)
end
```
#### Python 3 ####
```python
import numpy as np

def denoiseTV(y, lambda_, Nit):
    """1D TV denoising using the iterative clipping algorithm.

    Args:
        y       : noisy signal (1D array)
        lambda_ : regularization parameter
        Nit     : number of iterations
    Returns:
        x : result of denoising
        J : objective function value at each iteration
    """
    J = np.zeros(Nit)
    z = np.zeros(len(y) - 1)                # initialize z
    alpha = 4
    T = lambda_ / 2
    for k in range(Nit):
        # x = y - D^T z, built without forming D explicitly
        x = y - np.concatenate(([-z[0]], -np.diff(z), [z[-1]]))
        J[k] = np.sum(np.abs(x - y) ** 2) + lambda_ * np.sum(np.abs(np.diff(x)))
        z = z + 1 / alpha * np.diff(x)      # z + (1/alpha) D x
        z = np.clip(z, -T, T)               # clip(z, T)
    return x, J
```
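A minimal usage sketch (the test signal and parameter values are illustrative):
```python
import numpy as np

# Piecewise-constant signal corrupted by Gaussian noise, then denoised.
rng = np.random.default_rng(0)
x_true = np.concatenate([np.zeros(100), 2 * np.ones(100), -np.ones(100)])
y = x_true + 0.3 * rng.standard_normal(x_true.size)

x_hat, J = denoiseTV(y, lambda_=2.0, Nit=100)
print("final objective value:", J[-1])
```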

