Discrete Wavelet Transform

The wavelet transform was borne out of a need for further developments from Fourier transforms. Wavelets transform signals in the time domain (rather, assumed to be in the time domain) to a joint time-frequency domain. The main weakness that was found in Fourier transforms was their lack of localized support, which made them susceptible to Heisenberg's Uncertainty principle. In short, this means that we could get information about the frequencies present in a signal, but not where and when the frequencies occurred. Wavelets, on the other hand, are not anywhere as subject to it.

A wavelet is, as the name might suggest, a little piece of a wave. Where a sinusoidal wave as is used by Fourier transforms carries on repeating itself for infinity, a wavelet exists only within a finite domain, and is zero-valued elsewhere. A wavelet transform involves convolving the signal against particular instances of the wavelet at various time scales and positions. Since we can model changes in frequency by changing the time scale, and model time changes by shifting the position of the wavelet, we can model both frequency and location of frequency. Hence, the dubbing of the WT's resulting domain as a joint time-frequency domain.

To perform these convolutions at every position and every characteristic scale is called the continuous wavelet transform. As you can probably guess, this is a terribly costly process. Fortunately, we don't actually need to go that far. Most signal data we'd like to process on a computer tends to be stored discretely. Moreover, Nyquist's theorem tells us that the highest frequency we can model with discrete signal data is half that of the sampling frequency. So that means we, at worst, have to perform the transform at every other point.

The continuous wavelet transform is generally expressed with the following integral.

When we actually perform the discrete wavelet transform, the common approach involves using the dyadic scheme. This is where we increase the scale and step spacing between wavelets by a factor of two at each step. e.g., in the first pass, we would use wavelets of scale 1 and space them 1 sample apart. Then in the second pass, we'd scale them to two samples, and space them apart by two samplepoints.

For a visual representation --

Everywhere you see an x is a point where the convolution is performed at that scale. The same number of calculations is needed for each convolution, but fewer actually need to be done in each pass. Fortunately, we cannot forget Nyquist's theorem, so the first pass, where the wavelets are at a scale to take up a single samplepoint is unnecessary. The smallest scale we need to consider is where the wavelet takes up two samplepoints. This also means that the transform takes up the same amount of space as the original data.

What we do is take a lowpass(scaling function) and a highpass (wavelet function) version of the curve and separate the highpass and lowpass information. In short, we treat the wavelet transform as if it is a filter bank. And we iterate downward along the lowpass subband. The lowpass data is treated as a signal in its own right and is subdivided into its own low and high subbands.

Ultimately, we can go all the way down until the lowpass and highpass bands are a single value, but whether or not we do that depends on the application. On images, for instance, doing so may prove overkill.

I could go on for ages about actually choosing the filter banks for your wavelet and scaling functions, but that would go a little beyond the scope of this tutorial. Moreover, I don't claim to be enough of an expert to have every single secret in my back pocket. There's plenty more relevant information out there to grab than anyone on Earth could possibly keep track of. Let's just say, at the very least, that the wavelet and scaling filters must be orthogonal so that the highpass and lowpass information are distinct and mutually exclusive. That is, there is no overlap between the data representations in frequency space. Also, that since you're progressively dividing into two parts, the integral of the scaling filter should be sqrt(2) to keep the energy of the wavelet constant for every depth.

There are plenty of resources to find dozens upon dozens of wavelet and corresponding scaling functions. If you've seen anything on wavelets before, the two filters you'll probably instantly recognize are the Haar and Daubechies-4 banks.

Haar Daubechies-4 (D4)
H = {1 / sqrt(2), 1 / sqrt(2) }
G = {1 / sqrt(2), -1 / sqrt(2) }
H = { (1 + sqrt(3)) / (4 * sqrt(2)),
          (3 + sqrt(3)) / (4 * sqrt(2)),
          (3 - sqrt(3)) / (4 * sqrt(2)),
          (1 - sqrt(3)) / (4 * sqrt(2)) }
G[0] = H[3], G[1] = -H[2], G[2] = H[1], G[3] = -H[0]

The Haar wavelet is probably the most basic and is equivalent to an old-fashioned "sum & difference" transformation. The Daubechies-4 (or more accurately, the second-order member of the Daubechies family of wavelet) is probably the one shape most associated with the word wavelet. And while I'm on the subject of names, I should note that there can be some nomenclature disagreement. Some people will use the name Daubechies-4 (since there are 4 elements in the filter bank) while others will say Daubechies-2 (since it's a second-order filter). Similarly, you'll see biorthogonal wavelets like CDF(2,2)* also go by the name CDF(5,3). Fortunately, people are usually consistent with one naming convention, so it will be certain after seeing the actual filter bank data what someone means.

Let's assume for the sake of examples that we're going to use the Daubechies-4, which means our filter banks will contain 4 elements.

In this case, our code(or pseudocode) to perform one step of the DWT looks something like this --

for(i = 0; i < (n/2); ++i)
        yh[i] = x[2i]*g[3] + x[2i+1]*g[2] + x[2i+2]*g[1] + x[2i+3]*g[0];
        yl[i] = x[2i]*h[3] + x[2i+1]*h[2] + x[2i+2]*h[1] + x[2i+3]*h[0];

'x' is the input signal, and 'n' is the length of that signal. 'g' is the wavelet filter and 'h' is the scaling filter. 'yh' is the resulting highpass data and 'yl' is the resulting lowpass data, each of these is half the size of our original data (i.e. n/2). In practice, we'd also do a few extra tasks, such as copying the highpass and lowpass data back into a destination buffer (same size as the source buffer). In addition, since this is one step, we can take the lowpass half of the data and repeat the process with it a number of times(and n will be half its value from the previous step).

One thing worth noting is that there should actually be a wraparound through the buffer, so even though in the pseudocode, there are indices like [2i+3], it should actually be [(2i+3) % n]. To be more accurate, though, the fact is that we're treating the buffer as a circular list. So, for instance, say you have a buffer of size n=256. If we index buf[256], it should actually wrap around to buf[0] -- essentially, (index % n). Similarly, if we try to index buf[-1], it should wrap to the end and read from buf[255] -- essentially, (n + index).

Anyway, that's the idea for the forward transform, and here's the general idea for the inverse transform.

for(i = 0; i < (n/2); ++i)
        x[2i  ] = yl[i-1]*h[1] + yl[i]*h[3] + yh[n-1]*g[1] + yh[n]*g[3];
        x[2i+1] = yl[n-1]*h[0] + yl[n]*h[2] + yh[n-1]*g[0] + yh[n]*g[2];

Of course, the examples shown are for a 1-dimensional transform, so as it is, it is only suitable for 1-dimensional data such as audio. We can make this into a separable 2-d transform (e.g. for images) rather simply by performing each step on each dimension individually. That is, we perform a 1-d transform on all the horizontal lines in the image, and then perform a 1-d transform on all the vertical lines.

To help visualize what's going on during these transforms, we can look at it in the form of a matrix multiplication. For the sake of space, I'll assume again, the D4 filter (4 elements), and a signal that's only 8 samples in length. We'll treat those 8 samples as a 8-element column vector, and the lowpass and highpass sections as two 4-vectors combined to make one 8-vector.

[yl(0)] = [ h3 h2 h1 h0 0  0  0  0  ][x(0)]
[yl(1)] = [ 0  0  h3 h2 h1 h0 0  0  ][x(1)]
[yl(2)] = [ 0  0  0  0  h3 h2 h1 h0 ][x(2)]
[yl(3)] = [ h1 h0 0  0  0  0  h3 h2 ][x(3)]
[yh(0)] = [ g3 g2 g1 g0 0  0  0  0  ][x(4)]
[yh(1)] = [ 0  0  g3 g2 g1 g0 0  0  ][x(5)]
[yh(2)] = [ 0  0  0  0  g3 g2 g1 g0 ][x(6)]
[yh(3)] = [ g1 g0 0  0  0  0  g3 g2 ][x(7)]

And for the inverse transform, we have something like this --

[x(0)] = [ h3 0  0  h1 g3 0  0  g1 ][yl(0)]
[x(1)] = [ h2 0  0  h0 g2 0  0  g0 ][yl(1)]
[x(2)] = [ h1 h3 0  0  g1 g3 0  0  ][yl(2)]
[x(3)] = [ h0 h2 0  0  g0 g2 0  0  ][yl(3)]
[x(4)] = [ 0  h1 h3 0  0  g1 g3 0  ][yh(0)]
[x(5)] = [ 0  h0 h2 0  0  g0 g2 0  ][yh(1)]
[x(6)] = [ 0  0  h1 h3 0  0  g1 g3 ][yh(2)]
[x(7)] = [ 0  0  h0 h2 0  0  g0 g2 ][yh(3)]

As mentioned before, we must be careful about the choice of wavelet. It is not innately true that the inverse transform of the forward transform will always yield the original data. This property is referred to as "perfect reconstruction," and it's not something you can assume about wavelets. A great deal of wavelet literature tends to deal with the mathematician's point of view rather than a computer scientist's view. In general, this means that they often deal with wavelets applied to infinite sequences of data. And many wavelets that exhibit perfect reconstruction on infinite domains won't translate over to finite domains.

So in the end, what is this good for? Well, if you've read this far, hopefully, it means you're actually interested enough to have heard something about them. The most common applications are signal processing and data compression. About the simplest thing you can do as far as compression is to set anything below a threshold to zero, and then run-length encode the zeros. Generally, when you do this it tends to be nice to choose a wavelet that produces a lot of zeros when transforming your dataset in the first place. The highpass info sections tend to have very small values in general, anyway, but it still often counts for something. Other, more complicated approaches could involve things like zerotrees and so on, but that's another tutorial.

There are, of course, other uses for wavelets such as techniques involving precomputed light transport. Performing wavelet transforms on the column or row vectors of the summation matrix can make a matrix more sparse for the sake of performance. We can often get by preserving a small fraction of coefficients when doing so.

Hopefully, the DWT doesn't appear so scary. Make no mistake, some of the pure theoretical content concerning wavelets is incredibly scary to newcomers. It gets to the point that a lot of professors don't enjoy teaching it even if they have personal interest in the matter. But for the programmer in us, we can at least look upon it from a programmer's point of view and play around in that world.

Good luck, and happy coding
- Parashar K.

*CDF is a biorthogonal family of wavelets. CDF stands for Cohen-Daubechies-Feauveau. The CDF(5,3) AKA CDF(2,2) filter is the same one used in the JPEG2000 spec.

References :

http://www.bearcave.com/misl/misl_tech/wavelets/lifting/  -- Wavelet Lifting Scheme.  An alternate method for constructing and working with wavelets.
http://dmsun4.bath.ac.uk/resource/warehouse.htm -- The Bath Wavelet Warehouse, a repository full of wavelet filter bank coefficients.
http://www-dsp.rice.edu/software/ -- DSP-related software at Rice University.  Contains several resources on wavelets as well as demonstratory Java applets.