The Z transform and the Discrete Fourier transform

The Z transform is the workhorse and the backbone of discrete signal procesing. If you are working with discrete data (and one usually is), and are trying to perform a spectral analysis, the ZT is usually what you will get (often no matter what you want). In a fairly real sense, unless you are working with optical systems (or radar, radio astronomy, etc..., where a physical FT surface may exist) none of the 3 other transforms discussed above exist outside the heads of mathematicians. Even if you try to compute one of the other continuous transforms (such as the Laplace transform) from discrete data on a digital computer, you will almost inevitably get a distorted version of the ZT.

The ZT has the simplest formula of the 4 transforms discussed here. Usually, this is given by:

Z(x(t)) = Z(z) = a sum(x(t) z^(-t)).

where a is some constant, z is any complex value in the plane, and x(t) is the input for t = 0 to N-1. The ZT is often thought of as the discrete version of the Laplace transform. However, this viewpoint is incorrect, as there are some significant topological differences between the two. However, like the LT, the ZT has as its domain the entire complex plane, and produces a complex result at each point which can be visualized either as two surfaces (real and imaginary) or as a 2d vector field. Also like the LT, the ZT is thought to completely characterize the properties of a discrete system or signal, usually by the appearance of zeros and poles, and is accompanied by a corresponding region of convergence.

One advantage of the ZT is that many of its properties and results can be visualized and understood geometrically. Once a geometric viewpoint is adopted for the ZT, many of its subtle mysteries are found to be fairly simple to understand. I have not seen this approach taken in books, although I have not read all signal processing books currently available. In the explanations that follow, I will restrict the input dataset x(t) to be real, bounded, and random, although it can in general be complex. Physical signals will also be band-limited, although that is not a necessary property for this discussion.

The ZT consists of the scaled inner product of two vectors. One vector is the input dataset, x(t): {x(0), x(1), x(2),..., x(N-1)}. The other vector consists of the sequence of non-positive powers of an arbitrary complex number in the plane, z^(-t): {1, 1/z, 1/z^2,..., 1/z^(N-1)}. Note that all z sequences defined this way start at 1. I will also choose the scaling constant a to be 1/N, in order to perform an average of the complex terms produced by the inner product.

To investigate the geometric properties of the ZT, I have created an example which computes one coefficient of the ZT for a specific value of z near 1: (Click for larger image)

The value of z can be selected by setting the magnitude and phase with a set of sliders. The value of Z(z) is also in polar coordinates: (Click for larger image)

To visualize the z^(-t) sequence, and the inner product, it is helpful to divide the complex plane into three regions:

1. Points on the unit circle.
2. Points within the unit circle.
3. Points outside of the unit circle.
First consider those points on the unit circle. These points are restricted by the relation z = exp(iW), where W is the angle made with the positive real axis. These points are also the domain of the discrete Fourier transform. In this case, the ZT becomes Z(z) = a sum(x(t) exp(-iWt)), which is also the formula for the DFT. Thus, the DFT is a subset of the ZT.

Consider the point z = 1, which occurs for W = 0. The sequence {1, 1/z, 1/z^2,..., 1/z^(N-1)} is then simply {1, 1, 1,...}, so that the inner product is x(0) + x(1) + x(2) +...+ x(N-1). The value of the ZT for z = 1 is 1/N sum(x(t)), or the average of x(t). Each of the terms of the inner product is a simple product x(t) * 1, which is a 2d point on the real axis. The average of these points is the value of the ZT for that value of z (for W = 0). (Click for larger image)

Next, consider a point z = exp(iW) where W is a small positive value. This point is located on the unit circle slightly above and counter-clockwise of 1. All elements of the sequence {1, 1/z, 1/z^2,..., 1/z^(N-1)} are on the unit circle. The first point 1 is located at 0 radians. The second point 1/z is the inverse of z. The inverse of z = a + bi is (a - bi) / (a^2 + b^2) for z != 0. This is a point which has a magnitude which is the inverse of the magnitude of z, and an angle which is the opposite of z. In the case of z on the upper half of the unit circle, 1/z is on the lower half at the negative angular coordinate.

The remaining points of the series, integer powers of 1/z, have the same magnitude (i.e. are on the unit circle), but have angles which are multiples of 1/z's. For z slightly above 1, 1/z is slightly below 1. Therefore 1/z^2 has the same magnitude, and is slightly below 1/z by the same angle. The same is true for 1/z^3, 1/z^4, etc...

The sequence {1, 1/z, 1/z^2,..., 1/z^(N-1)} therefore consists of a set of points on the unit circle which are spaced at W radians clockwise of 1. If z were slightly below 1, this sequence would be counter-clockwise of 1. Because the input x(t) is finite, this set of points is finite, and may only go a short angular distance from 1. (Click for larger image)

As W is increased, the angular separation between points of z^(-t) increases. When W reaches 2 pi / N, the set of points will completely encircle the origin. (Click for larger image)

As W increases further, the set of points z^(-t) will go around the origin more than once. Eventually, W will reach pi, in which the elements of the sequence are {1, -1, 1, -1,...}. Usually, the limits of W for the DFT are +-pi. Note that the points W = -pi and W = +pi must generate the same DFT values, since they both generate the same z^(-t) sequences {1, -1, 1, -1,...). Since the imaginary part of the spectrum of a real function is odd, it must be equal to zero at z = -1. This result is consistant with the center of the set of points which are the result of multiplying x(t) by (-1)^(-t), which is on the real axis.

Consider next the points z within the unit circle. The sequence {1, 1/z,...} again has 1 as its first element. The next element is 1/z, which is a point outside the unit circle having magnitude = 1/mag(z) and angle = -ang(z). Multiples of 1/z have magnitudes = 1/mag(z)^t and angles = -t ang(z). Thus, the z^(-t) sequence for a point inside the unit circle is a set of points starting at 1 and increasing in magnitude in a geometric spiral away from 1. Note that these points do not lie on a continuous curve as does the result of the DFT; they are simply discrete points for each value of z^(-t). Each point of the sequence is on two sets of continuous curves for different starting values of z, however: one set for a continuous change in W, the other for a continuous change in R: (Click for larger image)

Finally, consider the points z outside of the unit circle. The z^(-t) sequence then starts at 1 and spirals into the unit circle. Each multiple of 1/z moves around the origin by an angle of -ang(z), and decreases in magnitude by 1/mag(z). (Click for larger image)

Note that to compute the ZT of a point z inside the unit circle, the values of x(t) are multplied by a set of points with increasing magnitudes. Therefore it would be reasonable to expect that some datasets would generate unbounded sums in this case. That is indeed what happens for many discrete systems: their poles are located inside the unit circle. However, this is not always the case.

Unlike the LT, the ZT performs only finite sums, therefore it does not actually produce poles in the sense of points with infinite magnitudes, except at z = 0. However, the magnitude of the ZT can be monitored so that "poles" of sufficient magnitude are recognizable from other values. The same can be done to find zeros of the ZT. A consequence of this is that the ZT for points z within the unit circle tends to wieght the transform toward values at the end of x(t), since the magnitudes of z^(-t) are greatest there. Conversely, the ZT of points z outside of the unit circle tends to weight the transform toward values near the beginning of x(t), since the magnitudes of z^(-t) are greatest there. Although this "feature" is not designed into the ZT, it does obviate some criticism from other schools of analysis (e.g. wavelets) that the FT is not sensitive to the location of spectral features within the epoch of a dataset.

Given the geometric interpretation above, we are now in a position to explain some of the features of the DFT, in terms of the ZT. In the following, all points z lie on the unit circle, and the scaling constant a = 1/N.

The first observation is the shape of the spectrum of a rectangle function. Assume we are given an input x(t) consisting of {1, 1, 1,...}. For z = 1 the ZT is 1/N * N, or just 1. For a small W near 0, the sequence z^(-t) is a closely spaced set of points on a short clockwise arc of the unit circle. Each of these points is multipltied by 1. The average of these products will be near the center of the arc, just inside the unit circle. Thus, the magnitude is slightly less than 1, while the phase is the same as the center of the arc. As W increases, the arc of points of z^(-t) will go farther around the unit circle, by a total of WN radians. For W = pi/(2N), the points go 1/4 of the way around, so that the averge of their product with x(t) has a phase of pi/4 and a magnitude less than 1 (probably 0.707, or -3 db). For W = pi/N, the points go 1/2 around, so that the average has a phase of pi/2 and a smaller magnitude than for previous values of W. Once W reaches 2 pi / N, the points go the entire way around, and the average has a magnitude of 0 and a phase of pi. After that, the magnitude of the average begins to rise, but never again reaches 1. When W = 4 pi / N, the average is again 0. At each value of W = 2 pi m / N, the magnitude again becomes zero. (Click for larger image)

Thus, the magnitude of the spectrum looks like the familiar rectangle transform, alternately falling to zero at regular intervals and then rising again, but not as far as previously. Next is the observation that the spectrum of a real function is Hermitian. To see this, we will compare values of the ZT/DFT at the values +W and -W. For +W, the sequence z^(-t) is again a set of points on a clockwise arc of the unit circle. Each of these points is multiplied by the corresponding value of x(t). Multiplication of a complex number by a real value simply changes the magnitude, but not the phase, of the point. Let the real component of the average of these products be r. For -W, the sequence z^(-t) is a set of points on a counter-clockwise arc of unit circle which are a mirror reflection of the points for +W in the real axis. These points are then multiplied by x(t), but in the opposite direction, creating a set of products which is a reflection of the products for +W. Since they all have the same real components as the previous products, their real average is again r. This is true for all values of +-W. Thus, the real component of the spectrum is symmetric. Let the imaginary component of the average of the products for +W be j. Because the products for -W are a reflection of those for +W, the imaginary component of their average is -j. Thus, the imaginary component of the spectrum is antisymmetric. When we add the known constraint that the ZT/DFT at W = pi is the same as for W = -pi, this implies that the imaginary component of the spectrum must be zero at z = -1 (which can also be seen from writing out the inner product z^(-t) * x(t) at -1). (Click for larger image)

If one allows t to take on negative values, the spectrum of a symmetric real function is completely real (has zero imaginary component everywhere). Consider a real function for t = -T to +T, where the function is symmetric about t = 0. In this case, because we allow t to take on negative values, the sequence z^(-t) is now {.., z^3, z^2, z, 1, 1/z, 1/z^2, 1/z^3,..}, which includes positive powers of z. Thus, for z on the unit circle, the set of points z^(-t) for a given W goes both clockwise and counter-clockwise away from 1 (but only half as far each way). Since x(t) is symmetric about t = 0, the factors applied to the counter-clockwise arc are the same as those applied to the clockwise arc. Therefore, the average of the products must lie on the real axis, and the imaginary component of the spectrum is everywhere 0.

The Discrete Fourier Transform

The discrete Fourier transform is the discrete version of the Fourier transform. This does not mean that it is a discretization of the continuous transform result, as is sometimes mistaken, but that it is a Fourier-type transform applied to discrete data. The DFT is sufficiently different from the FT, however, to warrant careful study, especially since it (or a varient of it) is usually the primary spectral method used to analyze new data.

Another potential misconception is that discrete data means digital data, which is only sometimes the case. Discretization means taking any continuous signal and turning it into a set of values distributed discretely in time (usually at "equal" intervals). Each of these elements may be a real number which can take on any continuous value, or their values may also be limited to a discrete set, as is the case for digital data. For "discrete" data one may usually substitute "measured" data, as the measurement process of many experiments is sufficient to discretize the data in time. Therefore, in the analysis of scientific data, one is almost always dealing with discrete data, and should be aware of the issues concerning the DFT.

The DFT is not the FFT. The FFT is a specialized variant of the DFT. Unlike the DFT, the FFT does not consist of a single formula, but of a set of similar functions each of which is said to produce "a" version of the FFT. An FFT is extremely useful in some applications, but it is not the main spectral routine I will use in most examples, and should probably not be the primary spectral routine applied to new data. It is possible (and perhaps common) to use an FFT without really knowing what you are getting, or how. While I am willing to forego intimate knowledge of how the square root function of my calculator works (since all calculators should produce the same result), I prefer to have knowledge and control over the workings and results of a DFT.

Here are some comments on the practical workings of the DFT, and how to interpret some of its results:

• The formula I use for the DFT is:

F(x(t)) = F(W) = a sum(x(t) exp(-iWt)).

where a is an arbitrary constant (usually 1, 1/N, or some multiple/power of pi), and W is an angular variable which can range between -pi and pi, or between the limits of any other 2 pi radian "branch" of the complex plane. x(t) is the input dataset, for t = 0 to N-1. For real functions, W is usually taken from 0 to pi, since the spectrum of a real function is Hermitian. I use W instead of w (actually they should be Omega and omega) to distinguish the DFT from the FT.

• DFT values for W = pi are the same as for W = -pi (or for the ends of any 2 pi range of W).

• Frequency is not explicitly part of the DFT. For data which has been acquired from physical measurements, a W value of -pi or pi represents the Nyquist frequency of the signal (1/2 the sampling rate). For synthetic data, there is really no concept of frequency apart from W.

• The result of the DFT of a discrete dataset is (in concept) continuous, although it will consist of a finite set of values. These values all lie on a continuous trace which represents the spectrum. A coefficient of the DFT can be computed for any continuous value of W (within a 2 pi range). Computing additional interior points of the DFT will produce more resolution in the spectrum.

• Spectra computed from different datasets, possibly of different lengths, possibly the result of data acquisition at different sampling rates, can be compared point by point at equal values of W. Therefore, spectra which have been computed for the same W0 to WN range (even if the resolution of W within that interval is different), may be displayed on similar axes. Actual points may not coincide due to differences in resolution and offset of W, but the spectra can be compared interval by interval, either visually or by interpolation. Note that the values of W do not need to be evenly spaced. Although equal W values can be compared, the Nyquist frequencies of different spectra may be different, due to different sampling rates. Thus, the spectra cannot be compared frequency by frequency without scaling the W axis of at least one.

• The spectrum of a discrete dataset is periodic. This is substantially different from the FT, which creates a single spectrum for w = -infinity to +infinity. The DFT creates multiple copies of the spectrum for W = -infinity to +infinity. The reason for this periodicity can be thought of in two equivalent ways:

1. The FT of a discrete dataset is actually the FT of a continuous function which has been multiplied by a set of pulses (to yield the discrete values). Therefore the overall spectrum is the FT of the signal convolved with the spectrum of the discretizing pulses (by the modulation theorem). The FT of a set of pulses is another set of pulses. If the discretizing pulses are made closer together in time, the set of transformed pulses get farther apart (and vice versa). The convolution of the signal's spectrum with any one of the transformed pulses creates a copy of the signal spectrum, but shifted along the frequency axis. The convolution of the spectrum with all the transformed pulses creates multiple copies of the spectrum at different points on the frequency axis. The overall spectrum is the summation of all such copies. If the discretizing pulses are close enough together, and the signal is sufficiently band-limited, the individual copies of the transformed spectrum will be far enough apart not to overlap. If the discretizing pulses are too far apart, the copies of the spectrum will overlap and distort the shape of the result (aliasing).

2. The DFT is actually the Z transform computed on the unit circle of the complex plane. In this case, z = exp(iW), which is a point on the unit circle at an angle of W radians. The DFT of a real function is usually computed as the Z transform of points on the upper half of the circle (for W = 0 to pi), while the DFT of a complex function is usually computed for points on the entire circle. There is really only one copy of the spectrum, but it can be obtained over and over again by computing values of W past -pi or pi (or outside any other 2 pi range). As W goes around and around the unit circle, the same value of the Z transform is returned as the value of the DFT for different values of W. If the signal is sufficiently band-limited, its spectrum will decline to zero before W reaches -pi or pi, and thus the "copies" of the spectrum will not overlap. However, if the spectrum extends beyond -pi or pi, it can overlap the spectrum coming from the other direction, causing alaising and distortion.

• One consequence of the periodicity of the spectra of discrete functions is that all acquired signals must be band- or low-pass filtered in the external electronics, before they are discretized. Once the signal is discretized, it's too late. Any frequency components outside the Nyquist limit will alias the spectrum, and there is little way to remove the distortion. Also, the individual stages of hybrid analog/discrete electronics (such as the use of switched capacitor active filters) must be isolated with band- or low-pass filters, since the discretization process requires that potentially aliasing components be removed prior to filtering, as well as creating spurious copies of the spectrum in the output. Presumably the stages of (purely software) digital filters must also be checked to ensure that they maintain the required frequency-limitation on output signals, since randomly generated (or chewed up) data is not limited to spectra which go to zero as W nears -pi and pi.

• There are some subtleties associated with the two interpretations of the DFT given above:

• To consider the FT/convolution argument, both the signal and the set of discretizing pulses should be infinite in duration, since the limits of integration of the FT are from -infinity to +infinity.
• Data are actually acquired by using a finite set of discretizing pulses. Therefore, convolution with this set of pulses should only produce a finite set of copies of the spectrum.
• Considering the DFT as a subset of the ZT, the circular nature of W implies that there should be an infinite number of copies of the spectrum. However, this prediction conflicts with what has just been said about the finite nature of the discretizing process.
• If the spectrum is either infinite or limited to a finite set of copies, this still implies that there is considerable power outside the actual band of the signal, potentially in high frequencies. What limits this power in multiple high-frequency copies? The standard answer of some kind of assumed band-limiting in other parts of the system is inacceptable, although it is obvious that something does indeed limit the power distribution. I compare this question to the Obler paradox of astronomy (call it the "Coyote paradox").

• The result of the DFT can be visualized in several ways. Usually, it is displayed as 2 1d curves showing either the real and imaginary components of each coefficient, or the magnitude and phase of each coefficient. The DFT may also be displayed as a set of 2d points in the plane. In this case, the spectrum will be seen to consist of a single 2d trace which winds around and around the origin. Each coefficient of the spectrum lies on this trace. Computing coefficients at a higher W resolution will make the continuous nature of the trace more apparent. This trace can also be viewed in 3d, in cylindrical coordinates, in which it appears to be a helix-like curve winding around the W axis. The proper framework for viewing the results of the DFT, however, consists of 3d toroidal coordinates. In this case, the DFT is shown as a curve which winds around and around the unit circle in the XY plane. The Z axis is the real component of the spectrum, while the radius - 1 in the XY plane is the imaginary component (or vice-versa). Sliders can be used to vary the relative scales of each of these quantities. (Click for larger image)