This notebook describes an implementation of the STFT. Specifically, I consider how the forward (analysis) and the inverse (synthesis) operations are related by a transpose operation, and use this to establish perfect reconstruction (PR) conditions. The PR conditions depend only on the window function and the hop size. Using this, I show how to find windows that make the STFT a self-inverting transform, or a tight frame.
# some imports
using PyPlot
using FFTW
For demonstration purposes, let us choose a window, and a hop-size. We specifically use a Hann window in this example. Note that the hop-size is not really a special value for this window function.
win_len = 50
win = sin.(π * (1:win_len) / (win_len + 1)).^2;
hop = 15
fig, ax = subplots(1,1,figsize = (8,3))
ax.plot(1:win_len, win)
ax.set_title("window function")
We now produce a random signal to work with. In order to keep the code tidy, I choose the signal length such that it allows $K$ hops of the window, without any left-over samples.
K = 30
signal_len = win_len + (K-1) * hop
x = randn(signal_len);
Given this signal, window, and hop-size, the STFT is computed as follows. Aligning the initial point of the window with that of the signal, we multiply the signal with the window, compute the FFT of the product, then shift the window by the hop size and repeat until the window reaches the end of the signal.
This operation is demonstrated in the code snippet below. There's one catch -- I do a small magnitude adjustment by dividing the FFT by the square root of the FFT length. This makes the FFT an orthogonal (unitary) operation, which I will need for the demonstration in the sequel. Normally, we don't bother with this normalization in an actual implementation.
# declare a complex array to hold the STFT of x
X = Array{ComplexF64,2}(undef, (win_len, K))
# compute windowed fft's
for (i,j) = zip( 1 : hop : (signal_len - win_len + 1), 1 : K )
X[:,j] = fft(x[i:i + win_len - 1] .* win) / √(win_len)
end
Finally, I note that the operation that takes $x$ to $X$ is a linear operation. Let me denote this operation as $U$. Note that $U$ takes the signal (the 1D sequence) $x$ of length $S$, and maps it onto a matrix of size $W\times K$, where $W$ is the length of the window and $K$ is the number of hops (hop count). Therefore, $U : \mathbb{C}^S \to \mathbb{C}^{W\times K}$. In the sequel, we will be interested in the transpose of $U$.
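As a quick numerical check of this linearity claim, the sketch below recomputes the same windowed FFTs and verifies that $U(a\,x_1 + b\,x_2) = a\,U(x_1) + b\,U(x_2)$. The helper `stft` is a hypothetical one-liner introduced only for this check; it is not part of the implementation above.

```julia
using FFTW

win_len, hop, K = 50, 15, 30
S = win_len + (K - 1) * hop
w = sin.(π * (1:win_len) / (win_len + 1)).^2
# hypothetical helper: the same windowed-FFT analysis, written as a one-liner
stft(x) = hcat([fft(x[(j-1)*hop .+ (1:win_len)] .* w) / √win_len for j = 1:K]...)
x1, x2 = randn(S), randn(S)
a, b = 2.0, -3.0
# U(a x1 + b x2) should equal a U(x1) + b U(x2), up to floating point error
println(maximum(abs.(stft(a*x1 + b*x2) - (a*stft(x1) + b*stft(x2)))))
```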
In order to find $U^T$, I interpret $U$ as a cascade of three linear mappings, namely $M_1$, $M_2$, $M_3$.
# definition of Map1, M1
function Map1(x, signal_length, window_length, hop_size, hop_count)
X = Array{ComplexF64, 2}(undef, (window_length, hop_count))
for (i,j) = zip( 1 : hop_size : (signal_length - window_length + 1), 1 : hop_count )
X[:,j] = x[i:i + window_length - 1]
end
return X
end
M1 = z -> Map1(z, signal_len, win_len, hop, K)
# definition of Map2, M2
function Map2(X, window)
return X .* window
end
M2 = Z -> Map2(Z, win)
# definition of M3, as an orthogonal operation
M3 = Z -> fft(Z,[1])/√(win_len)
# we declare a new map by cascading these operations.
M = z -> M3( M2( M1(z) ) )
I now verify that the operator $M = M_3\,M_2\,M_1$ in fact reproduces the STFT we computed earlier. The error below should be numerically zero.
Y = M(x);
err = sum(abs.(Y[:] - X[:]).^2)
println("Error : ", err)
Let us now consider the transpose operations. For this, our guide will be the definition of transpose. Recall that if $A$ and $B$ are two vector spaces with inner product, and $Q$ is a linear mapping from $A$ to $B$, then $Q^T$ is defined as that operation for which $\langle y, Q(x)\rangle = \langle Q^T(y), x \rangle$ for any $x\in A$, and $y \in B$.
Given this definition, I first note that, since $M_3$ simply does in-place (normalized) fft's, its transpose is its inverse, which is (normalized) in-place ifft's. Let's declare this as a new mapping $M_3^T$.
M3T = Z -> ifft(Z,[1]) * √(win_len)
The transpose of $M_2$ can also be obtained by considering its action. $M_2$ simply point-multiplies each column of its input matrix with the real-valued window of the same length. Abusing notation, we have $M_2(x)_n = x_n\,w_n$. Therefore, we can verify that $\langle x, M_2(y)\rangle = \langle M_2(x), y\rangle$. In words, $M_2$ is its own transpose (it is self-adjoint).
M2T = M2
What remains is $M_1$. The transpose of $M_1$ amounts to an 'overlap-add' operation. This can be verified algebraically, using the definition of transpose. I will skip the algebraic derivation here (for now...), but I will provide a numerical verification below. Once again, we define $M_1^T$ with the aid of a generic function (Map1Xpose).
function Map1Xpose(X, signal_len, win_len, hop, K)
x = zeros(ComplexF64, signal_len)
for (i,j) = zip( 1 : hop : (signal_len - win_len + 1), 1 : K )
x[i:i + win_len - 1] += X[:,j]
end
return x
end
M1T = Z -> Map1Xpose(Z, signal_len, win_len, hop, K)
Let us now numerically verify that our transposes are correctly identified and defined. For this, we define an inner product function.
inner_product = (x,y) -> sum( (x .* conj(y))[:] )
We start by checking the validity of $M_1^T$. For this, we randomly choose $z \in \mathbb{C}^S$, $Z \in \mathbb{C}^{W\times K}$ and we compute the difference $\langle M_1(z), Z \rangle - \langle z, M_1^T(Z) \rangle$. This value should be numerically zero.
z = randn(signal_len)+ randn(signal_len)im
Z = randn(win_len,K) + randn(win_len,K)im
in1 = inner_product( M1(z), Z )
in2 = inner_product( z, M1T(Z) )
println("*** Testing transpose for M1 ***")
println("Difference between the two inner products: ", in1 - in2)
Let us perform a similar verification for $M_2$. Note now that we need to use two matrices.
Z2 = randn(win_len,K) + randn(win_len,K)im
in1 = inner_product( M2(Z), Z2 )
in2 = inner_product( Z, M2T(Z2) )
println("*** Testing transpose for M2 ***")
println("Difference between the two inner products: ", in1 - in2)
Finally, below I verify the validity of $M_3^T$.
in1 = inner_product( M3(Z), Z2 )
in2 = inner_product( Z, M3T(Z2) )
println("*** Testing transpose for M3 ***")
println("Difference between the two inner products: ", in1 - in2)
Using the operators $M_1^T$, $M_2^T$, $M_3^T$, we define the cascade operator $M^T = M_1^T\,M_2^T\,M_3^T$, and also the 'frame' operator $M^T\,M$.
MT = Z -> M1T( M2T( M3T(Z) ) )
MTM = z -> MT( M(z) )
Note that both the domain and the range of $M^T\,M$ are the signal space. If it were the case that $M^T\,M = I$, then the analysis and synthesis operations would form a perfect reconstruction filter bank. Let's check if this is the case.
x2 = MTM(x)
fig, ax = subplots(1,1, figsize = (10,3))
ax.plot(x, label = "input")
ax.plot(x2, label = "reconstruction")
ax.legend()
Save for the edges, we are close to perfect reconstruction, but not quite there. Let us now derive the adjustment we need to achieve PR.
For that, consider the mapping $M^T\,M$. Since $M_3^T\,M_3 = I$, (transpose of fft is its inverse), we find that, $$ M^T\,M = M_1^T\,M_2^T\,M_3^T\,M_3\,M_2\,M_1 = M_1^T\,(M_2^T\,M_2)\,M_1. $$
Recall now that $M_2^T = M_2$. Since the action of $M_2$ is to point-multiply each column of a matrix with a 1D window function, it follows that $M_2^T\,M_2 = M_2^2$ simply point-multiplies each column with the square of the window function.
Given this, and the fact that $M_1^T$ amounts to overlap-adding, we find (after some thinking, or algebra) that the action of $M^T\,M$ is to point multiply the input with a signal defined as $$ P(n) = \sum_{k} w^2(n - k\,H), \text{ for } 1 \leq n \leq S $$ where $H$ is the hop-size, and $S$ is the signal length (Julia indices start from 1).
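As a sanity check, the sketch below (self-contained, re-creating the window and hop from scratch) evaluates $P(n)$ directly from this formula by accumulating shifted, squared windows, and confirms the property used later: away from the edges, $P$ is periodic with period $H$.

```julia
# evaluate P(n) = Σ_k w²(n - kH) by accumulating shifted, squared windows
win_len, H, K = 50, 15, 30
S = win_len + (K - 1) * H                 # signal length
w = sin.(π * (1:win_len) / (win_len + 1)).^2
P = zeros(S)
for k = 0:K-1, m = 1:win_len
    P[k*H + m] += w[m]^2                  # contribution of the k-th shifted window
end
# away from the edges, P is periodic with period H
println(maximum(abs.(P[win_len .+ (1:H)] - P[win_len + H .+ (1:H)])))
```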
We can obtain $P$ with the aid of the mappings we defined so far, by using a constant signal as the input.
P = MTM(ones(signal_len))
fig, ax = subplots(1,1,figsize= (8,3))
ax.plot(1:signal_len, P)
Let us now verify the claim that $M^T\,M(x)$ equals $x$ point-multiplied by the sequence $P$ above, by plotting the pointwise product $P[n]\,x[n]$ together with the difference between the two signals.
fig, ax = subplots(1,1, figsize = (10,3) )
ax.plot(P.*x, label = "P[n] x[n]")
ax.plot(P.*x - MTM(x), label = "P[n] x[n] - MTM(x)[n]")
ax.legend()
This observation immediately suggests an inverse operation: apply $M^T\,M$ and correct by dividing by $P$.
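Here is a minimal, self-contained sketch of this corrected inverse. Since $M_3^T\,M_3 = I$, the FFT stage cancels and is dropped; the windowing and overlap-add steps are re-created from scratch rather than reusing the mappings above.

```julia
win_len, hop, K = 50, 15, 30
S = win_len + (K - 1) * hop
w = sin.(π * (1:win_len) / (win_len + 1)).^2
x = randn(S)
# analysis windowing followed by synthesis windowing + overlap-add: this is MᵀM(x)
y = zeros(S)
for k = 0:K-1
    y[k*hop .+ (1:win_len)] .+= (x[k*hop .+ (1:win_len)] .* w) .* w
end
# accumulate P(n) = Σ_k w²(n - k⋅hop)
P = zeros(S)
for k = 0:K-1
    P[k*hop .+ (1:win_len)] .+= w.^2
end
x_rec = y ./ P                            # correct by dividing by P
println("max reconstruction error: ", maximum(abs.(x_rec - x)))
```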
However, we can do better. By 'normalizing' the window (made precise below), we can make $P$ equal to unity in the middle portion. This in turn means that if the signal were of infinite length, the analysis and synthesis operations would be PR. This is preferable from a numerical point of view because it implies that the singular values of $M$ and $M^T$ are all unity.
Suppose now that in fact we were operating on infinite length sequences. Then, the function $P$ would be periodic by the hop size, $H$. Let us now define a 'normalized' window $\hat{w}(n)$ as $$ \hat{w}(n) = \frac{w(n)}{\sqrt{P(n)}}. $$ This normalized window function has its own periodic $\hat{P}$, which evaluates to \begin{align} \hat{P}(n) &= \sum_k\,\hat{w}^2(n - k\,H) \\ &= \sum_k \frac{w^2(n-kH)}{P(n - kH)} \\ &= \sum_k \frac{w^2(n-kH)}{P(n)} \quad \quad \text{(because $P$ is periodic with $H$)}\\ &= \frac{1}{{P(n)}}\sum_k w^2(n-kH)\\ &= \frac{1}{{P(n)}}P(n)\\ &= 1. \end{align} In words, the new window achieves perfect reconstruction without any further correction steps.
Let us now modify our window and define modified mappings. We only need to compute the portion of $P$ that is on the support of the window function.
# initialize the normalization sequence, i.e., P restricted to the window support
# (named P_win so as not to shadow the standard function `norm`)
P_win = win.^2
for i = 1+hop:hop:win_len
    P_win[1:end-i+1] += win[i:end].^2
    P_win[i:end] += win[1:end-i+1].^2
end
win_hat = win ./ sqrt.(P_win)
fig, ax = subplots(1,1,figsize = (10,3))
ax.plot(win, label = "original window function")
ax.plot(win_hat, label = "modified window function")
ax.legend()
ax.set_title("window functions")
We now modify the mappings. Notice that we only need to modify $M_2$, $M_2^T$, since $M_1$, $M_3$ do not use the window function.
# modify M2
MÌ‚2 = Z -> Map2(Z, win_hat)
MÌ‚2T = MÌ‚2
# modify M
MÌ‚ = z -> M3(MÌ‚2(M1(z)))
MÌ‚T = Z -> M1T(MÌ‚2T(M3T(Z)))
Let us now check whether we get PR with the modified window. Note that we expect PR only in the middle portion, i.e., excluding the first and last $N = W - H$ samples.
N = win_len - hop
PÌ‚ = MÌ‚T(MÌ‚(ones(signal_len)))
dif = MÌ‚T(MÌ‚(x)) - x
fig, ax = subplots(1,1,figsize = (10,4))
ax.plot(x, label = "input signal")
ax.plot(PÌ‚, label = "PÌ‚")
ax.plot(real(dif), label = "real part of the reconstruction error")
ax.plot(imag(dif), label = "imaginary part of the reconstruction error")
ax.set_xticks([(N), signal_len - (N)], minor = true)
ax.xaxis.grid(true, linestyle = "dashed", which = "minor")
ax.legend()
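To quantify the claim, the self-contained sketch below rebuilds the normalized window from scratch and checks that the reconstruction error vanishes in the middle portion (again skipping the FFT stage, which cancels with its transpose).

```julia
win_len, hop, K = 50, 15, 30
S = win_len + (K - 1) * hop
w = sin.(π * (1:win_len) / (win_len + 1)).^2
# P restricted to the window support: sum of all hop-shifted squared windows
P_supp = zeros(win_len)
for k = -div(win_len, hop):div(win_len, hop)
    i = k * hop
    lo, hi = max(1, 1 + i), min(win_len, win_len + i)
    P_supp[lo:hi] .+= w[lo-i:hi-i].^2
end
w_hat = w ./ sqrt.(P_supp)                # the normalized window ŵ
x = randn(S)
y = zeros(S)                              # overlap-add of doubly windowed frames
for k = 0:K-1
    y[k*hop .+ (1:win_len)] .+= (x[k*hop .+ (1:win_len)] .* w_hat) .* w_hat
end
N = win_len - hop                         # edge width where PR is not expected
println("max error in the middle: ", maximum(abs.(y[N+1:S-N] - x[N+1:S-N])))
```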
Ilker Bayram, ibayram@ieee.org, 2020