;========================================================================= ; SFFT.TXT ; Keith Larson ; TMS320 DSP Applications ; (C) Copyright 1996,1997,1998 ; Texas Instruments Incorporated ; ; This is unsupported freeware with no implied warranties or ; liabilities. See the C3x DSK disclaimer document for details ; ; This and all other DSK applications can be downloaded from ; the Texas Instruments FTP site. ; ; Further reading and other information ; ------------------------------------- ; SFFT.TXT Details how the SFFT works ; Designer Notebook page 22 'Fast Logrithms on a Floating Point Device' ; APPHELP1.TXT and APPHELP2.TXT included with the DSK software ; ; Texas Instruments FTP site. ; ; ftp://ftp.ti.com/mirrors/tms320bbs Main FTP site ; ftp://ftp.ti.com/mirrors/tms320bbs/c3xfiles TMS320C3x code examples ; ftp://ftp.ti.com/mirrors/tms320bbs/c4xfiles TMS320C4x code examples ; ftp://ftp.ti.com/mirrors/tms320bbs/c3xdskfiles DSK file subdirectory ;========================================================================= OVERVIEW -------- SFFT.ASM uses a technique known as a 'Sliding FFT' (or SFFT), to efficiently calculate the spectrum of a signal on a sample by sample basis. This unique ability makes the SFFT particularily well suited for applications where signal analysis, filtering, modulation, demodulation, or other forms of signal manipulation in the frequency domain must be performed in real time. The SFFT algorithm is also quite similar to the DFT. The SFFT is calculated on a sample by sample basis, essentialy being the equivalent of overlapped FFT's having an overlap of 1 sample. In this case, the past frequency data is efficiently reused to calculate the frequency spectra of the next sample window by adding the frequency domain spectra of a new sample, while simultaneously subtracting the frequency domain spectra of the oldest sample. The SFFT can be expressed in very simple terms that do not require first- hand knowledge of the DFT or FFT. In addition, the SFFT can be used to derive the DFT equation, which may be useful to DSP beginners, or by DSP experts looking for a different approach to solve a problem. SFFT THEORY: A BETTER WAY TO USE THE IMPULSE RESPONSE ----------------------------------------------------- The SFFT can be based on the following simple concepts 1) The property of superposition allows two or more signals to be linearly added to create a new signal. It can also be concluded that a sampled time domain signal is the summation of a series of individual input samples or impulses of varying magnitude (fig 1). Similarily, signals, or impulses, can be subtracted. If an input signal sample buffer (fig 1) of data is kept in memory, a sliding rectangular window of data samples (fig 2 and fig 4) can be constructed by adding the newest sample and subtracting the oldest sample (fig 3) from the previous original windowed signal (fig 2). The diagram below shows how the addition and subtraction of samples can 'slide' a window of data samples from those shown in figure 2 to those shown in figure 4. <- Older Newer -> Input * * signal * | * * | sample * | | | * * | | buffer * | | | | | * * | | | T <--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--> Fig 1 * * | | | | * T=0 * | | | * | * * Original * Windowed * | * * Signal | | | * * | | | | | * * | | T <--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--> Fig 2 * | | | | * * | | | * | * * new-old * sample | window Subtract old | T=N | T <--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--> Fig 3 | T=0 | Add new * Next * * Windowed | * * | Signal | | * * | | | | | * * | | | T <--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--> Fig 4 * | | | | * | | * | | | | | | | * | * | | | * | | Window is time | --->| shifted one sample --->| 2) The frequency domain response of an impulse, or single sample point where all other data points are zero, results in a flat frequency response with a magnitude in each frequency bin equal to the impulse input magnitude. Conversely, the impulse is the additive result of many sinusoidal frequency components. The time when the impulse occurs within the sample window is determined by the phase angles of the individual component frequencies. An impulses time of arrival is essentialy determined by a linear phase shift between each frequency bin. 3) In the frequency domain, the addition of frequency samples also follow the rules of superposition. Essentialy the spectra of figure 3, the new-old sample window, is added to the spectra of Figure 2, the orginal windowed signal, to create the new spectra of Figure 4. The difference is that complex data is used in the frequency domain to represent the phase information of the individual component frequencies. 4) The summation of a series of simple impulse transforms, which have correspondingly simple frequency domain transforms, will result in the composite frequency domain transform of the signal. 5) A sliding rectangular window is created by subtracting off the Nth oldest sample, which in the frequency domain will have gone through a multiple of 2*pi radian rotations. Note: In some applications complex time domain inputs may also useful. For this application, only the Real data from an ADC is used. MORE DETAIL ----------- If an impulse sample occurs at T=0 the frequency response calculation is further simplified since the response contains only REAL and no IMAG components. The transform of an impulse at T=0 is simply to store the magnitude of the impulse into each REAL bin, and zero the IMAG bin. If T!=0, the time shift creates a phase shift or complex vector rotation within each frequency bin. The phase rotation angle being proportional to the time shift and the frequency of interest. If the time shift is one sample period, as used in the SFFT, special conditions can be applied. At low frequencies, the amount of phase shift from sample to sample is low, or in the case of 0 hz, zero radians of phase. At higher frequencies the phase rotation is greatest. At the Nyquist frequency the vector rotation is PI/2 radians per sample, which corresponds to 2 samples per sine wave cycle. Vector rotation for bins between DC and the Nyquist rate are proportional to the bin frequency. A Fourier transform also produces both negative and positive frequencies, which are mirror images of each other. A practical aspect is that only positive frequencies need to be computed. This is suitable for spectrum analysis and filtering. The ranges for n and the resulting complex rotation vectors (twiddle factors) for each bin are therefore. Positive frequencies 0 <= n < N/2 Negative frequencies -N/2 <= n < 0 -j*2*pi*n/N complex(R_phase,I_phase) = exp REAL_tw[n] = cos(n*2*pi/N); IMAG_tw[n] = sin(n*2*pi/N); The basic SFFT operation is a vector rotate of each previous bin value, add the newest sample, and subtract the oldest sample. This is a very simple operation, but at the expense that all bins must be computed before the next input sample is ready. NewBinVal = (New - Old) + (OldBinval * vect_rotate) -j*2*pi*n/N Bin[n] = (Sample[0]-Sample[N-1]) + (Bin[n] * exp ) VISUALIZING THE SFFT -------------------- The easiest way to visualize the SFFT is to consider that each new sample occurs at T=0, making each new sample all REAL in the frequency domain. Then by considering that the past summation has been time shifted by one sample, a vector rotation proportional to the frequency is applied. A schematic representation for an SFFT bin is shown below. FREQUENCY BIN DIAGRAM (equivalent to an IIR filter) --------------------------------------------------- IN --+--------------*-------------------------//-------> | +--------+ | +-+ More bins +--|N delay |----|X|---*-----------------//-------> +--------+ | +-+ | | ^ | 2^N | | | +---+ K2 = K1 --------+ +->| - |<---+ | +---+ | | | | | +---+ | +--------->| + | | +---+ | | | -j*2*pi*n/N +---+ | ^ K1 * exp ---->| X | | | R,I +---+ | complex vector | | "rotation rate" +------+ Twiddle etc... | V Fbin OUT Where: Vector_rotation_rate[n-th Freq] = 2*PI * n / (N*Fs); K1 & K2 force convergence (see below) Fbin CONVERGENCE & STABILITY ---------------------------- One aspect of the SFFT that must be considered is that there is a feedback loop which will affect the stability of the bin values. This is similar to an IIR filter where in the Z domain, a pole sites on the unit circle. To maintain stability, and keep the bin values from growing out of control, the magnitude of the complex vector rotation twiddles need to be set to slightly less than one, placing the pole inside the unit circle. This causes the impulse energy magnitude in each bin to exponentialy decay towards zero. The effect of adding a stability factor is that by Nth bin rotation an impulse will have decayed to K1^N of its original magnitude. To exactly subtract the Nth oldest sample, the Nth oldest sample is scaled by a second coefficient K2=K1^N. A side effect of the exponential decay is that the SFFT is now windowed by an exponentialy decaying window. To minimize this effect it is important to keep K1 close to 1.000. 0.999 for example. SFFT WINDOWING -------------- Unlike the FFT and DFT, SFFT windowing cannot be performed in the time domain since the input window is moving in time, and therefore so must the window function. Instead the SFFT windowing operation is performed in the frequency domain using a technique known as convolution. The desirable effect of windowing is a multiplicative process in the time domain whereby the sharp discontinuities at the endpoints that accompany a rectangular data window are smoothed out. Without a smoothing window these abrupt changes smear the frequency spectrum over many bins. In the frequency domain, the coefficients of most windowing functions are simple and do not require large storage arrays. For the raised cosine window function, the coefficients are particularily simple (-.5,+1.0,-.5) and are easily imedded into the code as addition and subtraction. However, frequency domain or 'convolutional window filtering' is applied to the REAL and IMAG data seperately before the REAL/IMAG data is combined into a magnitude. Generaly this is not a not a problem since the operation is fast and only occurs during output. Another benefit is that other window functions are rapidly and easily implimented by picking different convolution coefficients. RAISED COSINE WINDOW -------------------- Time domain Frequency domain ----------- ---------------- : * : : * * : +1.0 : * * : ^ :- - - -*- - - - - - - -*- - - -: | : * * : | : * * : | *-------------------------------*--> <--+---0-0-0-+-+-+-0-0-0----+--> 0 N -N/2 | | N/2 V V W[n]=1-cos(2*PI*n/N)/2 -.5 -.5 USING SFFT.ASM FOR SPECTRUM ANALYSIS ------------------------------------ If the 'SPECT_EN' variable is set to 1 (TRUE), this will configure the DSK analog output to be the computed spectrum of the analog input beginning at BIN_START and ending at BIN_END. The output is then viewed using an oscilloscope which is triggered on a positive synch pulse. The DAC output voltage is proportional to the log magnitude of each frequency bin. To help pass impulses with minimal magnitude errors, each DAC output sample can be repeated up to DAC_RPT times. Also, the AIC TA register value can be programed to have a very high passband. This will increase the DAC output distortion if used for audio, but is acceptable for visual purposes. Also, the BIN_START and BIN_END values do not need to begin at zero or end at SFFTSIZE/2. This can be used to show that the frequency bins repeat in the frequency domain as predicted by the discrete fourier transform. The only restrictions are how much memory and CPU processing power is available. USING SFFT.ASM FOR HILBERT TRANSFORMS & ARBITRARY PHASE ANGLES FILTERS ---------------------------------------------------------------------- If 'SPECT_EN' is set to 0, the output will be configured to be the summation of the reconstructed REAL and IMAG components. An arbitrary output phase angle is implimented by performing a complex multiplication of the REAL and IMAG components by a complex vector determined by the parameter 'ANGLE'. If ANGLE=90 degrees, the Hilbert transform is reconstructed from the pass band SFFT bins covering BIN_START to BIN_END. If ANGLE=0.0, no phase shift occurs. The 0 degree and matched 90 degree phase shift Hilbert Transform are particularily useful in telecommunications applications where the quadrature outputs can be used to shift the spectrum of a signal, or for various radio and modem modulation schemes. RAISED COSINE WINDOWED FILTERS ------------------------------ By applying the raised cosine window to the summation of bin values, the REAL or IMAG filter response ripple is improved. The method implemented uses a series of coefficients that are applied to each frequency bin and then added much like an FIR filter except in the frequency domain. The coefficient values result from a) the convolution of the response of a raised cosine function with the signal response and b) the multiplication of a rectangular bandpass filter, also applied in the frequency domain. A group delay, or time shift, will also be seen which is equal to N/2 plus the time it takes a signal to make it through the ADC/DAC conversion process. In the examples shown below, it can be seen that for a given passband bandwidth, the number of bins required is actualy 'WIDTH+2' and that the sign of the coefficients alternates +,-,+,-... The endpoints, which are also scaled by 50%, are the result of the window coefficients and essentialy define the edge characteristics of the filter. 1.0 ^ | | | Length = 1 bin, raised cosine window function <-+-+-+-> | | V V -.5 -.5 1.0 ^ | 0.5 | ^ | | Length = 2 bins <-+-+-+-+-> | | V | -.5 | V -1.0 1.0 1.0 ^ ^ | | | | | | Length = 3 bins <-+-+-+-+-+-> | | | V | V -.5 | -.5 V -1.0 1.0 1.0 ^ ^ | | 0.5 | | ^ | | | Length = 4 bins <-+-+-+-+-+-+-> | | | V | | -.5 | | V V -1.0 -1.0 NON WINDOWED SFFT ----------------- A special case that may be of importance is when the SFFT is used to compute the all pass 0' and 90' Hilbert transforms of a non-windowed but synchronized signal. Note that frequency bin spreading will occur if the signal is not harmonicaly related to the sample window. For REAL summations, the input is exactly reconstructed by scaling the 0-th or DC bin by 50%. This scaling compensates for a 2:1 rise in signal level since all bin data energy, except for the 0th bin, is split equaly between the positive and negative frequencies. At the 0th bin, there is no IMAG information since no phase shift is applied to that bin. A DC component for an IMAG reconstruction therefore does not exist. N/2 SFFT R/I bins +----+----+----+----+----+----+ | | | | | |N | IN ---->----| 0 | 1 | 2 | .. | .. |- -1| | | | | | |2 | +----+----+----+----+----+----+ | | | | | | | | | | | | +-+| +-+| +-+| +-+| +-+| +-+| |+|--|+|--|+|--|+|--|+|--|+|----->>---- REALSUM +-+| +-+| +-+| +-+| +-+| +-+| | | | | | | +-+ +-+ +-+ +-+ +-+ +-+ |+|--|+|--|+|--|+|--|+|--|+|--->>---- IMAGSUM +-+ +-+ +-+ +-+ +-+ +-+ : : -->: :<-- 0-th bin : : PERFORMANCE ----------- Since the SFFT needs only to compute the 'bins of interest' within the span of one time sample, narrow band analysis or filtering is very efficient, even when the effective FFT size is very large. However, large numbers of bins and or high sampling rates may become impractical for a single processor. In which case a traditional block style FFT or filter may be more practical. For example, in a filter application, only a few frequency bins may be required, with the unused bins effectively being zero since they are not needed for reconstuction. The maximum sampling rate or the number of bins that can be calculated, is as shown. Ts(min) = (SFFT_cycles_per_bin * bins + loop_overhead) * nS/cycle Ts(min) = (7 * N/2 + 52) 40 nS NOTE: The loop overhead value is the time consumed by interrupt routines, data formating, input and output. SFFT.ASM is not highly optimized since it is for educational purposes. Immediate optimization gains can be achieved by inlining the three major functions 'Input', 'SFFT', and 'Output', removing 3 calls and 3 returns, or 24 cycles, from the loop overhead. LOOP UNROLLING FOR HIGH SPEED FILTERING --------------------------------------- The inner loop of the SFFT consumes 5 computational cycles but executes in 6 cycles. The conflict occurs from a data bus bandwidth limitation and results from the STF||STF operation immediately preceding a double load of data for the MPYF3 instruction. This 'null' cycle can be effectively filled by moving the filter summations within the loop. This is essentially 'free' since the summation can be done entirely within registers, requiring no data path access. The +1,-1 convolutional filter coefficients for raised cosine windowing can be hard coded within the loop by performing subtractions that invert the sum each time it goes through the loop. This avoids fetching coefficients from the data bus. Overall, the forward and reverse SFFT are computed at 6-7 cycles per bin depending on if both REAL and IMAG outputs are required. The general case 'educational' example SFFT.ASM is slightly slower while SFFT2.ASM which is written for filtering. FITTING THE CODE AND DATA INTO MEMORY ------------------------------------- If the effective desired SFFT/FFT size is 512 points, then only 256 positive frequencies need to be computed. With R/I twiddle and R/I SFFT data associated with each bin, 1024 words of memory will be required. In addition, 512 words of input buffer data will be needed. To maximize speed, the inner loop of the SFFT uses dual access on-chip memory to access data at the rate of two data moves per CPU cycle. To avoid program fetch conflicts, the SFFT code is loaded into the second on-chip SRAM block, which also holds the data buffer. If off chip memory is available, excellent performance can be achieved by placing as much SFFT bin data on-chip as possible. The input window sample buffer and code can be external since the main code loop will easily fit inside the cache and the sample buffer is only accessed twice per SFFT cycle. Note: The SFFT only needs to calculate the difference of the input of the most recent and the oldest data sample one time. This value is reused for all bin calculations and is kept in a register If circular or bit-reversed data storage were used, this would force the data and twiddle buffers to be on 2^N word boundaries. In addition, the circular addressing registers would be consumed. Since the overhead of checking and reloading of the buffer pointers is minimal, and allows non 2^N sizes, explicit pointer testing is used in SFFT.ASM. USING THIS CODE WITH 'C' ------------------------ If you want to use the functions in this code with a high level language such as 'C', you will need to perform context save and restore operations at the beginning and end of each function. TLC32040 ADC & DAC CONSIDERATIONS --------------------------------- The application file SFFT.ASM is written to use a TLC32040 Analog Interface Chip (AIC) connected as used in a TMS320C31 DSP Starter Kit or DSK (TMDS3200031). Further documentation for the DSK is either available in the DSK kit, or by download from the Texas Instruments FTP site at ftp://ftp.ti.com/mirrors/tms320bbs Main TMS320 FTP mirror site ftp://ftp.ti.com/mirrors/tms320bbs/c3xdskfiles C3x DSK files subdirectory SFFT SUMMARY ------------ - A time signal is comprised of a series of samples. - Each sample is an impulse. - The time signal is a time summation of a series of impulses. - The frequency spectra of a single impulse at T=0 is trivial to calculate. Essentialy being only a REAL component in each frequency bin whose magnitude is that of the impulse - The frequency spectra of a signal is the summation of the individual impulse responses. - A shift in time is essentialy a shift in phase or 'phase rotate' in the frequency domain - It is easiest to consider each new impulse as occuring at T=0, and performing the time shift on the past summation of samples as a whole. - At each bin the amount of 'phase rotation' or 'twiddle factor' that is applied to each bin is proportional to the frequency of the bin. At DC (n=0) the phase shift is zero and at Fnyq (n=N/2), the phase shift is pi radians. - After phase rotating each bin, simply add the new sample/impulse value. (Dont forget to start with each bin magnitude as zero) - At this point the fourier transform is a forever expanding series in both the time and frequency domains. - The Nth oldest sample is rotated n multiples of 2*pi radians, making the Nth oldest sample completely REAL with no IMAG component. At N samples of age, phase rotation = N*(n*2*pi/N)=n*2*pi - A 'sliding' rectangular window is created by subtracting the T=Nth oldest sample while adding the newest T=0 sample. Essentialy at T=N each frequency bin has rotated N times and is back to 0 radians of phase and can be properly subtracted. CONCLUSION ---------- SFFT.ASM is written for the DSP beginner, but is packed with features that also make it useful to the experienced DSP veteran. SFFT.ASM implements a continuous time Fourier transform which can be used to construct filters, analyze spectra, or used as a general-purpose DSP teaching platform.