;=========================================================================
; SFFT.TXT
; Keith Larson
; TMS320 DSP Applications
; (C) Copyright 1996,1997,1998
; Texas Instruments Incorporated
;
; This is unsupported freeware with no implied warranties or
; liabilities.  See the C3x DSK disclaimer document for details
;
; This and all other DSK applications can be downloaded from
; the Texas Instruments FTP site.
;
; Further reading and other information
; -------------------------------------
; SFFT.TXT Details how the SFFT works
; Designer Notebook page 22 'Fast Logrithms on a Floating Point Device'
; APPHELP1.TXT and APPHELP2.TXT included with the DSK software
;
; Texas Instruments FTP site.
;
;   ftp://ftp.ti.com/mirrors/tms320bbs              Main FTP site
;   ftp://ftp.ti.com/mirrors/tms320bbs/c3xfiles     TMS320C3x code examples
;   ftp://ftp.ti.com/mirrors/tms320bbs/c4xfiles     TMS320C4x code examples
;   ftp://ftp.ti.com/mirrors/tms320bbs/c3xdskfiles  DSK file subdirectory
;=========================================================================

OVERVIEW
--------
SFFT.ASM uses a technique known as a 'Sliding FFT' (or SFFT), to efficiently
calculate the spectrum of a signal on a sample by sample basis.  This unique
ability makes the SFFT particularily well suited for applications where
signal analysis, filtering, modulation, demodulation, or other forms of
signal manipulation in the frequency domain must be performed in real time.
The SFFT algorithm is also quite similar to the DFT.

The SFFT is calculated on a sample by sample basis, essentialy being the
equivalent of overlapped FFT's having an overlap of 1 sample.  In this
case, the past frequency data is efficiently reused to calculate the
frequency spectra of the next sample window by adding the frequency domain
spectra of a new sample, while simultaneously subtracting the frequency
domain spectra of the oldest sample.

The SFFT can be expressed in very simple terms that do not require first-
hand knowledge of the DFT or FFT.  In addition, the SFFT can be used to
derive the DFT equation, which may be useful to DSP beginners, or by DSP
experts looking for a different approach to solve a problem.

SFFT THEORY: A BETTER WAY TO USE THE IMPULSE RESPONSE
-----------------------------------------------------
The SFFT can be based on the following simple concepts

 1) The property of superposition allows two or more signals to be linearly
    added to create a new signal.  It can also be concluded that a sampled
    time domain signal is the summation of a series of individual input
    samples or impulses of varying magnitude (fig 1).  Similarily, signals,
    or impulses, can be subtracted.

    If an input signal sample buffer (fig 1) of data is kept in memory, a
    sliding rectangular window of data samples (fig 2 and fig 4) can be
    constructed by adding the newest sample and subtracting the oldest
    sample (fig 3) from the previous original windowed signal (fig 2).

    The diagram below shows how the addition and subtraction of samples
    can 'slide' a window of data samples from those shown in figure 2 to
    those shown in figure 4.

       <- Older                        Newer ->

 Input          *                         *
 signal       * | *                     * |
 sample     * | | | *                 * | |
 buffer   * | | | | | *             * | | |
   T <--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-->  Fig 1
        *               * | | | | *      T=0
                          * | | |
                            * | *
                              *
 Original       *
 Windowed     * | *                     *
 Signal       | | | *                 * |
              | | | | *             * | |
   T <--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-->  Fig 2
                        * | | | | *
                          * | | |
                            * | *
                              *
 new-old                                  *
 sample                                   |
 window      Subtract old                 |
             T=N                          |
   T <--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--> Fig 3
              |                          T=0
              |                          Add new
              *

 Next           *                         *
 Windowed       | *                     * |
 Signal         | | *                 * | |
                | | | *             * | | |
   T <--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--> Fig 4
                        * | | | | *
              | |         * | | |       | |
              | |           * | *       | |
                |             *           |
                |  Window is time         |
            --->|  shifted one sample --->|



 2) The frequency domain response of an impulse, or single sample point
    where all other data points are zero, results in a flat frequency
    response with a magnitude in each frequency bin equal to the
    impulse input magnitude.

    Conversely, the impulse is the additive result of many sinusoidal
    frequency components.  The time when the impulse occurs within the
    sample window is determined by the phase angles of the individual
    component frequencies.  An impulses time of arrival is essentialy
    determined by a linear phase shift between each frequency bin.

 3) In the frequency domain, the addition of frequency samples also
    follow the rules of superposition.

    Essentialy the spectra of figure 3, the new-old sample window, is
    added to the spectra of Figure 2, the orginal windowed signal, to
    create the new spectra of Figure 4.  The difference is that complex
    data is used in the frequency domain to represent the phase information
    of the individual component frequencies.

 4) The summation of a series of simple impulse transforms, which have
    correspondingly simple frequency domain transforms, will result in
    the composite frequency domain transform of the signal.

 5) A sliding rectangular window is created by subtracting off the Nth
    oldest sample, which in the frequency domain will have gone through
    a multiple of 2*pi radian rotations.

    Note: In some applications complex time domain inputs may also useful.
          For this application, only the Real data from an ADC is used.


MORE DETAIL
-----------
If an impulse sample occurs at T=0 the frequency response calculation is
further simplified since the response contains only REAL and no IMAG
components.  The transform of an impulse at T=0 is simply to store the
magnitude of the impulse into each REAL bin, and zero the IMAG bin.

If T!=0, the time shift creates a phase shift or complex vector rotation
within each frequency bin.  The phase rotation angle being proportional to
the time shift and the frequency of interest.

  If the time shift is one sample period, as used in the SFFT, special
conditions can be applied.  At low frequencies, the amount of phase shift
from sample to sample is low, or in the case of 0 hz, zero radians of
phase.  At higher frequencies the phase rotation is greatest.  At the
Nyquist frequency the vector rotation is PI/2 radians per sample, which
corresponds to 2 samples per sine wave cycle.  Vector rotation for bins
between DC and the Nyquist rate are proportional to the bin frequency.

  A Fourier transform also produces both negative and positive frequencies,
which are mirror images of each other.  A practical aspect is that only
positive frequencies need to be computed.  This is suitable for spectrum
analysis and filtering.  The ranges for n and the resulting complex
rotation vectors (twiddle factors) for each bin are therefore.

    Positive frequencies      0   <= n < N/2
    Negative frequencies     -N/2 <= n < 0

                                 -j*2*pi*n/N
    complex(R_phase,I_phase) = exp

    REAL_tw[n] = cos(n*2*pi/N);
    IMAG_tw[n] = sin(n*2*pi/N);

  The basic SFFT operation is a vector rotate of each previous bin value,
add the newest sample, and subtract the oldest sample.  This is a very
simple operation, but at the expense that all bins must be computed
before the next input sample is ready.


   NewBinVal = (New      -        Old) + (OldBinval * vect_rotate)

                                                    -j*2*pi*n/N
   Bin[n]    = (Sample[0]-Sample[N-1]) + (Bin[n] * exp           )


VISUALIZING THE SFFT
--------------------
The easiest way to visualize the SFFT is to consider that each new sample
occurs at T=0, making each new sample all REAL in the frequency domain.
Then by considering that the past summation has been time shifted by one
sample, a vector rotation proportional to the frequency is applied.  A
schematic representation for an SFFT bin is shown below.

  FREQUENCY BIN DIAGRAM (equivalent to an IIR filter)
  ---------------------------------------------------


   IN --+--------------*-------------------------//------->
        |  +--------+  | +-+                More bins
        +--|N delay |----|X|---*-----------------//------->
           +--------+  | +-+   |
                       |  ^    |
                 2^N   |  |    |  +---+
	  K2 = K1 --------+    +->| - |<---+
                       |          +---+    |
                       |            |      |
                       |          +---+    |
                       +--------->| + |    |
                                  +---+    |
                                    |      |
                  -j*2*pi*n/N     +---+    |  ^
           K1 * exp          ---->| X |    |  | R,I
                                  +---+    |
                  complex vector    |      |
                  "rotation rate"   +------+
                  Twiddle etc...    |
                                    V
                                 Fbin OUT

  Where: Vector_rotation_rate[n-th Freq] = 2*PI * n / (N*Fs);
         K1 & K2 force convergence (see below)

Fbin CONVERGENCE & STABILITY
----------------------------
One aspect of the SFFT that must be considered is that there is a feedback
loop which will affect the stability of the bin values.  This is similar to
an IIR filter where in the Z domain, a pole sites on the unit circle.  To
maintain stability, and keep the bin values from growing out of control,
the magnitude of the complex vector rotation twiddles need to be set to
slightly less than one, placing the pole inside the unit circle.   This
causes the impulse energy magnitude in each bin to exponentialy decay
towards zero.

The effect of adding a stability factor is that by Nth bin rotation an
impulse will have decayed to K1^N of its original magnitude.  To exactly
subtract the Nth oldest sample, the Nth oldest sample is scaled by a
second coefficient K2=K1^N.

A side effect of the exponential decay is that the SFFT is now windowed by
an exponentialy decaying window.  To minimize this effect it is important
to keep K1 close to 1.000.  0.999 for example.

SFFT WINDOWING
--------------
Unlike the FFT and DFT, SFFT windowing cannot be performed in the time
domain since the input window is moving in time, and therefore so must the
window function.  Instead the SFFT windowing operation is performed in the
frequency domain using a technique known as convolution.

The desirable effect of windowing is a multiplicative process in the time
domain whereby the sharp discontinuities at the endpoints that accompany a
rectangular data window are smoothed out.  Without a smoothing window these
abrupt changes smear the frequency spectrum over many bins.

In the frequency domain, the coefficients of most windowing functions are
simple and do not require large storage arrays.  For the raised cosine
window function, the coefficients are particularily simple (-.5,+1.0,-.5)
and are easily imedded into the code as addition and subtraction.

However, frequency domain or 'convolutional window filtering' is applied to
the REAL and IMAG data seperately before the REAL/IMAG data is combined
into a magnitude.  Generaly this is not a not a problem since the operation
is fast and only occurs during output.  Another benefit is that other
window functions are rapidly and easily implimented by picking different
convolution coefficients.

                        RAISED COSINE WINDOW
                        --------------------
            Time domain                     Frequency domain
            -----------                     ----------------


     :               *               :
     :           *       *           :                   +1.0
     :        *             *        :                     ^
     :- - - -*- - - - - - - -*- - - -:                     |
     :      *                 *      :                     |
     :   *                       *   :                     |
     *-------------------------------*-->   <--+---0-0-0-+-+-+-0-0-0----+-->
      0                        N             -N/2        |   |         N/2
                                                         V   V
        W[n]=1-cos(2*PI*n/N)/2                         -.5 -.5



USING SFFT.ASM FOR SPECTRUM ANALYSIS
------------------------------------
If the 'SPECT_EN' variable is set to 1 (TRUE), this will configure the DSK
analog output to be the computed spectrum of the analog input beginning at
BIN_START and ending at BIN_END.  The output is then viewed using an
oscilloscope which is triggered on a positive synch pulse.  The DAC output
voltage is proportional to the log magnitude of each frequency bin.

To help pass impulses with minimal magnitude errors, each DAC output sample
can be repeated up to DAC_RPT times.  Also, the AIC TA register value can
be programed to have a very high passband.  This will increase the DAC
output distortion if used for audio, but is acceptable for visual purposes.

Also, the BIN_START and BIN_END values do not need to begin at zero or end
at SFFTSIZE/2.  This can be used to show that the frequency bins repeat
in the frequency domain as predicted by the discrete fourier transform.
The only restrictions are how much memory and CPU processing power is
available.

USING SFFT.ASM FOR HILBERT TRANSFORMS & ARBITRARY PHASE ANGLES FILTERS
----------------------------------------------------------------------
If 'SPECT_EN' is set to 0, the output will be configured to be the summation
of the reconstructed REAL and IMAG components.

  An arbitrary output phase angle is implimented by performing a complex
multiplication of the REAL and IMAG components by a complex vector
determined by the parameter 'ANGLE'.  If ANGLE=90 degrees, the Hilbert
transform is reconstructed from the pass band SFFT bins covering BIN_START
to BIN_END.  If ANGLE=0.0, no phase shift occurs.

  The 0 degree and matched 90 degree phase shift Hilbert Transform are
particularily useful in telecommunications applications where the quadrature
outputs can be used to shift the spectrum of a signal, or for various radio
and modem modulation schemes.

RAISED COSINE WINDOWED FILTERS
------------------------------
By applying the raised cosine window to the summation of bin values, the
REAL or IMAG filter response ripple is improved.

The method implemented uses a series of coefficients that are applied to
each frequency bin and then added much like an FIR filter except in the
frequency domain.

The coefficient values result from a) the convolution of the response of a
raised cosine function with the signal response and b) the multiplication
of a rectangular bandpass filter, also applied in the frequency domain.
A group delay, or time shift, will also be seen which is equal to N/2
plus the time it takes a signal to make it through the ADC/DAC conversion
process.

  In the examples shown below, it can be seen that for a given passband
bandwidth, the number of bins required is actualy 'WIDTH+2' and that the
sign of the coefficients alternates +,-,+,-...  The endpoints, which are
also scaled by 50%, are the result of the window coefficients and essentialy
define the edge characteristics of the filter.

          1.0
           ^
           |
           |
           |              Length = 1 bin, raised cosine window function
       <-+-+-+->
         |   |
         V   V
       -.5  -.5

          1.0
           ^
           |  0.5
           |   ^
           |   |          Length = 2 bins
       <-+-+-+-+->
         |   |
         V   |
       -.5   |
             V
           -1.0

         1.0  1.0
           ^   ^
           |   |
           |   |
           |   |          Length = 3 bins
       <-+-+-+-+-+->
         |   |   |
         V   |   V
       -.5   |  -.5
             V
           -1.0

         1.0  1.0
           ^   ^
           |   |  0.5
           |   |   ^
           |   |   |      Length = 4 bins
       <-+-+-+-+-+-+->
         |   |   |
         V   |   |
       -.5   |   |
             V   V
           -1.0  -1.0

NON WINDOWED SFFT
-----------------
A special case that may be of importance is when the SFFT is used to
compute the all pass 0' and 90' Hilbert transforms of a non-windowed
but synchronized signal.  Note that frequency bin spreading will occur
if the signal is not harmonicaly related to the sample window.

For REAL summations, the input is exactly reconstructed by scaling the
0-th or DC bin by 50%.  This scaling compensates for a 2:1 rise in signal
level since all bin data energy, except for the 0th bin, is split equaly
between the positive and negative frequencies.

At the 0th bin, there is no IMAG information since no phase shift is
applied to that bin.  A DC component for an IMAG reconstruction therefore
does not exist.


                        N/2 SFFT R/I bins
               +----+----+----+----+----+----+
               |    |    |    |    |    |N   |
   IN ---->----|  0 |  1 |  2 | .. | .. |- -1|
               |    |    |    |    |    |2   |
               +----+----+----+----+----+----+
                | |  | |  | |  | |  | |  | |
               +-+| +-+| +-+| +-+| +-+| +-+|
               |+|--|+|--|+|--|+|--|+|--|+|----->>---- REALSUM
               +-+| +-+| +-+| +-+| +-+| +-+|
                  |    |    |    |    |    |
                 +-+  +-+  +-+  +-+  +-+  +-+
                 |+|--|+|--|+|--|+|--|+|--|+|--->>---- IMAGSUM
                 +-+  +-+  +-+  +-+  +-+  +-+

               :   :
            -->:   :<-- 0-th bin
               :   :


PERFORMANCE
-----------
Since the SFFT needs only to compute the 'bins of interest' within the span
of one time sample, narrow band analysis or filtering is very efficient,
even when the effective FFT size is very large.  However, large numbers of
bins and or high sampling rates may become impractical for a single
processor.  In which case a traditional block style FFT or filter may be
more practical.

For example, in a filter application, only a few frequency bins may be
required, with the unused bins effectively being zero since they are not
needed for reconstuction.  The maximum sampling rate or the number of bins
that can be calculated, is as shown.

  Ts(min) = (SFFT_cycles_per_bin * bins + loop_overhead) * nS/cycle

  Ts(min) = (7 * N/2 + 52) 40 nS

  NOTE: The loop overhead value is the time consumed by interrupt routines,
      data formating, input and output.  SFFT.ASM is not highly optimized
      since it is for educational purposes.

      Immediate optimization gains can be achieved by inlining the three
      major functions 'Input', 'SFFT', and 'Output', removing 3 calls and
      3 returns, or 24 cycles, from the loop overhead.

LOOP UNROLLING FOR HIGH SPEED FILTERING
---------------------------------------
The inner loop of the SFFT consumes 5 computational cycles but executes in
6 cycles.  The conflict occurs from a data bus bandwidth limitation and
results from the STF||STF operation immediately preceding a double load of
data for the MPYF3 instruction.

This 'null' cycle can be effectively filled by moving the filter summations
within the loop.  This is essentially 'free' since the summation can be
done entirely within registers, requiring no data path access.

The +1,-1 convolutional filter coefficients for raised cosine windowing
can be hard coded within the loop by performing subtractions that invert
the sum each time it goes through the loop.  This avoids fetching
coefficients from the data bus.

Overall, the forward and reverse SFFT are computed at 6-7 cycles per
bin depending on if both REAL and IMAG outputs are required.  The general
case 'educational' example SFFT.ASM is slightly slower while SFFT2.ASM
which is written for filtering.

FITTING THE CODE AND DATA INTO MEMORY
-------------------------------------
If the effective desired SFFT/FFT size is 512 points, then only 256
positive frequencies need to be computed.  With R/I twiddle and R/I SFFT
data associated with each bin, 1024 words of memory will be required.  In
addition, 512 words of input buffer data will be needed.

To maximize speed, the inner loop of the SFFT uses dual access on-chip
memory to access data at the rate of two data moves per CPU cycle.  To
avoid program fetch conflicts, the SFFT code is loaded into the second
on-chip SRAM block, which also holds the data buffer.

If off chip memory is available, excellent performance can be achieved by
placing as much SFFT bin data on-chip as possible.  The input window sample
buffer and code can be external since the main code loop will easily fit
inside the cache and the sample buffer is only accessed twice per SFFT
cycle.

 Note: The SFFT only needs to calculate the difference of the input
       of the most recent and the oldest data sample one time.  This
       value is reused for all bin calculations and is kept in a register

If circular or bit-reversed data storage were used, this would force the
data and twiddle buffers to be on 2^N word boundaries.  In addition, the
circular addressing registers would be consumed.  Since the overhead of
checking and reloading of the buffer pointers is minimal, and allows non
2^N sizes, explicit pointer testing is used in SFFT.ASM.

USING THIS CODE WITH 'C'
------------------------
If you want to use the functions in this code with a high level language
such as 'C', you will need to perform context save and restore operations
at the beginning and end of each function.

TLC32040 ADC & DAC CONSIDERATIONS
---------------------------------
The application file SFFT.ASM is written to use a TLC32040 Analog Interface
Chip (AIC) connected as used in a TMS320C31 DSP Starter Kit or DSK
(TMDS3200031).  Further documentation for the DSK is either available in
the DSK kit, or by download from the Texas Instruments FTP site at

ftp://ftp.ti.com/mirrors/tms320bbs              Main TMS320 FTP mirror site
ftp://ftp.ti.com/mirrors/tms320bbs/c3xdskfiles  C3x DSK files subdirectory

SFFT SUMMARY
------------

 - A time signal is comprised of a series of samples.

 - Each sample is an impulse.

 - The time signal is a time summation of a series of impulses.

 - The frequency spectra of a single impulse at T=0 is trivial to calculate.
   Essentialy being only a REAL component in each frequency bin whose
   magnitude is that of the impulse

 - The frequency spectra of a signal is the summation of the individual
   impulse responses.

 - A shift in time is essentialy a shift in phase or 'phase rotate' in
   the frequency domain

 - It is easiest to consider each new impulse as occuring at T=0, and
   performing the time shift on the past summation of samples as a whole.

 - At each bin the amount of 'phase rotation' or 'twiddle factor'  that
   is applied to each bin is proportional to the frequency of the bin.  At
   DC (n=0) the phase shift is zero and at Fnyq (n=N/2), the phase shift is
   pi radians.

 - After phase rotating each bin, simply add the new sample/impulse
   value. (Dont forget to start with each bin magnitude as zero)

 - At this point the fourier transform is a forever expanding series in
   both the time and frequency domains.

 - The Nth oldest sample is rotated n multiples of 2*pi radians, making the
   Nth oldest sample completely REAL with no IMAG component.

   At N samples of age, phase rotation = N*(n*2*pi/N)=n*2*pi

 - A 'sliding' rectangular window is created by subtracting the T=Nth
   oldest sample while adding the newest T=0 sample.  Essentialy at T=N
   each frequency bin has rotated N times and is back to 0 radians of
   phase and can be properly subtracted.

CONCLUSION
----------
SFFT.ASM is written for the DSP beginner, but is packed with features that
also make it useful to the experienced DSP veteran.  SFFT.ASM implements a
continuous time Fourier transform which can be used to construct filters,
analyze spectra, or used as a general-purpose DSP teaching platform.