Hey Guys, Im an EE major just starting on my BSEE senior project. It is to create an FPGA based acoustic echo canceller. We have to use a 1024 tap adaptive FIR filter. It is required that the 1024 tap fir filter be split into 8 smaller filters. The filtering will be done in the time domain using buffers and MACs. How could this be achieved? Can you directly add the outputs of multiple filters in parallel?

--- Quote Start --- Im an EE major just starting on my BSEE senior project. It is to create an FPGA based acoustic echo canceller. We have to use a 1024 tap adaptive FIR filter. It is required that the 1024 tap fir filter be split into 8 smaller filters. The filtering will be done in the time domain using buffers and MACs. How could this be achieved? Can you directly add the outputs of multiple filters in parallel? --- Quote End --- What is the sample rate of the incoming acoustic signal? 100kHz? FPGAs can be clocked at 1000x this rate, so you essentially have 1000 FPGA clocks per incoming acoustic sample. Inside the FPGA, you can implement a ~1000 tap filter using a single MAC and two RAM blocks; one with samples and one with coefficients. The RAM blocks can be dual-ported and double-buffered, so that you can write new coefficients and data to one half of the RAM using one port, while the digital filter logic is reading from the other half of RAM using the other port. There's really no reason to decide that you have to split the design into 8 separate filters ... but if that is a 'requirement' then perhaps you'll have to do it. It is however possible that your instructions were based on a multi-rate implementation of the same filter. So the first thing I would suggest is getting a clarification on why you have to implement the 8 filter architecture. Regarding your last question; yes you can add lots of filter outputs in parallel, but you generally do not do this in one FPGA clock cycle, you add pairs of samples in an adder tree (the tree depth is log2 of the number of inputs to add). For your 8 inputs, you have 4 x 2-input adders with 4 outputs (that are 1-bit wider), feeding 2 x 2-input adders with 2 outputs (that are 1-bit wider), feeding a final 2-input adder stage (log2(8) = 3 stages of adders). The final sum is 3-bits wider than the input sample width. You need to be careful about rounding and truncation operations. Take a read of these notes: http://www.ovro.caltech.edu/~dwh/correlator/pdf/esc-100paper_hawkins.pdf (http://www.ovro.caltech.edu/%7edwh/correlator/pdf/esc-100paper_hawkins.pdf) http://www.ovro.caltech.edu/~dwh/correlator/pdf/esc-100slides_hawkins.pdf (http://www.ovro.caltech.edu/%7edwh/correlator/pdf/esc-100slides_hawkins.pdf) Cheers, Dave

Thanks for responding to the question and for the pdfs. I didn't mention that the echo cancellation algorithm is Block NLMS. That is why he wants us to split the filter into blocks of 128. So, we would take one block of input data, send it to a filter, take another block, send it to another filter and so on. Each block would have it's own unique coefficients. The thing i'm confused by is how to split up a fir filter and combine the outputs to obtain the equivalent filter.

I am not clear about the over all purpose of your filter, but if your main filter is meant to be 1024 taps and to be applied to the input stream continuously but in blocks of 128 stages x 8 then you will need to add up all the 8 outputs continuously and divide result by 8 for every single output sample.

The purpose of the project is to cancel echo. Think about two conferencing rooms. You are in one room, and you talk. That signal would be xn. That signal reaches the second room, comes out of the speaker, bounces off of the walls, and returns to the mic. This returned signal is dn. The signal dn coming from the other room not only contains the other person's voice, but also some returned echo. The goal is to create an adaptive FIR filter that takes your voice, xn, as an input, and produces an output yn. This output is subtracted from the returned signal, dn, to yield en, the error signal. This error, along with the input xn and the step size, is used to calculate a gradient. This gradient is added to the current coefficient, yielding new coefficients that point in the direction of minimal error. After this happens a few times, the filter will be adjusted, and the filter output when subtracted from the returned signal will completely cancel the echo. This could be done easily on a DSP processor. However, we are to implement it in an FPGA to take advantage of parallelism. A CPU will grab xn and dn from mic inputs on the FPGA board, and send it to the logic. The logic will then calculate en, and each en value will be sent to the output port via the CPU. Due to the length of the return path, it is required that a 1024 tap filter be used. But, the professor insisted that we break up the filter into 8 blocks, and calculate new coefficients for each block seperately. My initial idea was to use muxes, and connect the 8 FIR filters in parallel via the muxes. The ouputs would be summed together to produce the final value. Each time you would send 128 values to one filter, and then control the muxes to move to the next, and so forth. But, this idea seems so clunky. It seems that there could be a better way to achieve this. I tried reading the IEEE papers, but they really aren't that helpful, and are way to complex for this project. I understand Verilog really well. I'm just stuck at the architectural implementation.

I know you can divide a signal into blocks in the frequency domain then process each block then add up the signal. But in your case, the signal xn is in time domain. The concept of of 8 blocks either means 8 independent filters or it means one filter of 1024 taps broken for implementation sake. In case you have to pass one block of data into one delay pipe and not pass it through to other blocks in succession then your filter becomes actually 8 independant filters for same input but segragated sections. May be your professor wants then somehow to choose among them. Addition here breaks the meaning of filtering but can still be viewd as some sort of averaging if you add the 8 results in parallel. So overall I am not clear really just like you. Possibly modelling will help

DSP Senior Project Question | Altera Community

11 Replies

Altera_Forum
Honored Contributor
14 years ago
--- Quote Start ---

DWH, are you saying that it is a waste of time to split the filter up? I agree. I don't understand why we are to split it up if we are working in an FPGA. Do you think there would be any performance decrease if we did just brute force it and implement the whole filter?
--- Quote End ---
I'm not saying its a waste of time, I'm just questioning whether it is the correct solution.

Take for example the following; lets say I have a signal that I sample at 1MHz and need to filter with a 100-tap filter and decimate the signal by 8 times. If you look in a multirate signal processing book, you will get lots of nice math that show how to take your filter taps, split them into 8 parallel filters and run them at 1/8th the input sample rate. Its very cool, but its a complete waste of time for this specific implementation (and possibly resources). The filter can be implemented using one MAC and a RAM and just run it at 100MHz (or whatever higher rate is appropriate for the filter and FPGA). Or you can get slightly trickier and implement the 8 parallel filters using that same MAC and RAM. The only difference then might be that you can run the logic slower - but if you were going to run the logic at 100MHz for your other logic, what difference does this 'savings' make?

'Optimal' is in the eye of the beholder.

Cheers,
Dave

Forum Discussion

DSP Senior Project Question

11 Replies

Recent Discussions

Request for REACH 253 Substance Certificate

Will serialization factor of 6 in LVDS serdes IP be supported in the future on Agilex5?

Cyclone 10 LP's Extended Industrial parts

Avalon-ST configuration with Agilex 3 fails

Agilex5 A5EB013BB23BE4S BSDL