The example that you gave was to use a 8k FFT to compute a 32k FFT. Let N be your desired FFT size, in this case 32768. Also let M be your block FFT size, which in this case will be 8192.
Lets suppose that your 32768 data points are stored in a contiguous piece of memory. Lets refer to this set of data as x(n) where n ranges from 0 to 32767.
Next, we perform the M-point FFTs on the decimated representations of x(n). Since M = 8192 and N = 32768, we know that we will need to perform 4 block FFT operation (note that the M does not need to be an integer multiple of M...). Let's call this number L, where L = ceil[N/M].
Follow the following steps:
- Perform an M-point FFT on the x(n) samples where n = 0, 4, 8, ..., ( ((N-1) -3). We'll call those FFT results x0(k).
- Store four copies of x0(k) in an array/memory.
- Next we compute an M-point FFT on the x(n) samples where n = 1, 5, 9, ..., ((N-1) -2). We call those FFT results x1(k).
- Store four copies of x1(k) in an array/memory.
- Next we compute an M-point FFT on the x(n) samples where n = 2, 6, 10, ..., ((N-1) -1). We call those FFT results x2(k).
- Store four copies of x2(k) in an array/memory.
- Next we compute an M-point FFT on the x(n) samples where n = 3, 7, 11, ..., ((N-1) - 0). We call those FFT results x3(k).
- Store four copies of x3(k) in an array/memory.
At this point you are almost finished....almost :-P. You should have 4 arrays each of size N. Now the question is how do I put all of this data back together to get the 32768-point FFT that we desire? The answer is that each contribution must be "scaled" by a coefficient matrix. Since N = 32768, we know that our coefficient matrix should have 32768 value in it. The coefficient matrix, L, is defined below:
L(k) =
e-
j2πk/N; for k -> 0 : (N - 1)
This is nothing more than a complete cycle of the unit circle with 32768 samples.
You can find a much better description of this at the following website :
http://www.dsprelated.com/showarticle/63.php Let me know if you have any questions.