ScopeFIR Free Trial

Iowegian's
dspGuru
 FIR FAQ
 Part 4: Implementation 
Home   Up   First   Previous Help

4.1 What is the basic algorithm for implementing FIR filters?

Structurally, FIR filters consist of just two things: a sample delay line and a set of coefficients. To implement the filter:

  1. Put the input sample into the delay line.
  2. Multiply each sample in the delay line by the corresponding coefficient and accumulate the result.
  3. Shift the delay line by one sample to make room for the next input sample.

4.2 How do I implement FIR filters in C?

There are lots of possibilities, including a few tricks. To illustrate, we have provided a set of FIR filter algorithms implemented in C called "FirAlgs.c" in ScopeFIR's distribution file. FirAlgs.c includes the following functions:

  1. fir_basic: Illustrates the basic FIR calculation described above by implementing it very literally.
  2. fir_circular: Illustrates how circular buffers are used to implement FIRs.
  3. fir_shuffle: Illustrates the "shuffle down" technique used by some of Texas Instruments' processors.
  4. fir_split: Splits the FIR calculation into two flat (non-circular) pieces to avoid the use circular buffer logic and shuffling.
  5. fir_double_z: Uses a double-sized delay line so that the FIR calculation can be done using a flat buffer.
  6. fir_double_h: Similar to fir_double_z, this uses a double-sized coefficient so that the FIR calculation can be done using a flat buffer.

4.3 How do I implement FIR filters in assembly?

FIR assembly algorithms are quite processor-specific, but the most common system uses a circular buffer mechanism provided by the DSP processor. The basic steps are:

  1. Configure the circular buffer; load the coefficient and delay-line pointers. Then, for each input sample:
  2. Store the incoming data in the delay line; increment the delay-line pointer.
  3. Clear the multiplier-accumulator.
  4. Loop over all coefficients/delays; accumulate the values obtained by multiplying the coefficients by the delayed samples.
  5. Round or truncate the result as the FIR output.

Alternatively, a "shuffle down" method is used in Texas Instruments' older fixed-point processors to implement circular buffers. The processor literally moves each sample delay values by one slot during each multiply-accumulate (via the "MACD" instruction).

Each DSP microprocessor manufacturer provides example FIR assembly code in its data books or its application handbooks, so be sure to look at those before you "reinvent the circular buffer".

4.4 How do I test my FIR implementation?

Here are a few methods:

4.5 What tricks are useful in implementing FIR filters?

FIR tricks center on two things 1) not calculating things that don't need to be calculated, and 2) "faking" circular buffers in software.

4.5.1 How do I skip needless calculations?

First, if your filter has zero-valued coefficients, you don't actually have to calculate those taps; you can leave them out. A common case of this is "half-band" filter, which have the property that every-other coefficient is zero.

Second, if your filter is "symmetric" (linear phase), you can "pre-add" the samples which will be multiplied by the same coefficient value, prior to doing the multiply. Since this technique essentially trades an add for a multiply, it isn't really useful in DSP microprocessors which can do a multiply in a single instruction cycle. However, it is useful in ASIC implementations (in which addition is usually much less expensive than multiplication); also, some newer DSP processors now offer special hardware and instructions to make use of this trick.

4.5.2 How do I fake circular buffers in software?

When hardware support for circular buffers isn't available, you have to "fake" them. Also, since ANSI C has no construct to describe circular buffers, most C compilers can't generate code to use them, even if the target processor has them.

You can always implement a circular buffer by duplicating the logic of a circular buffer in software (and many have), but the overhead can be prohibitive; the circular-fake might take several instructions to implement, compared to just a single instruction to do the multiply-accumulate operation. Therefore you need to fake it.

Here are several basic techniques to fake circular buffers:

  1. Split the calculation: You can split any FIR calculation into its "pre-wrap" and "post-wrap" parts. By splitting the calculation into these two parts, you essentially can do the circular logic only once, rather than once per tap. (See fir_double_z in FirAlgs.c above.)
  2. Duplicate the delay line: For a FIR with N taps, use a delay line of size 2N. Copy each sample to its proper location, as well as at location-plus-N. Therefore, the FIR calculation's MAC loop can be done on a flat buffer of N points, starting anywhere within the first set of N points. The second set of N delayed samples provides the "wrap around" comparable to a true circular buffer. (See fir_double_z in FirAlgs.c above.)
  3. Duplicate the coefficients: This is similar to the above, except that the duplication occurs in terms of the coefficients, not the delay line. Compared to the previous method, this has a calculation advantage of not having to store each incoming sample twice, and it also has a memory advantage when the same coefficient set will be used on multiple delay lines. (See fir_double_h in FirAlgs.c above.)
  4. Use block processing: In block processing, you use a delay line which is a multiple of the number of taps. You therefore only have to move the data once per block to implement the delay-line mechanism. When the block size becomes "large", the overhead of a moving the delay line once per block becomes negligible.

FIR FAQ Part 1: Basics

Home Up First Previous © 1999-2007 Iowegian International Corp. Terms of Use and Legal Notices