Intel Floating Point FFT IP Gives Different Result to Python Example

Question

Hi there,

I am having some issues with a simulation of the Intel FFT IP core. I have set up the IP core with the following parameters:

I am trying a simulation with a very simple input stream of floating-point numbers. Each packet is 16 data points long (there are actually 16 packets, but to simplify we will just discuss the first packet here).
The fft_points_in and fft_points_out ports are both set to 12'd16. The inverse port is set to 1'd1 (we are actually doing an IFFT).
The contents of the first packet are as follows:

[2.476367e+00 -2.004069e+00j, -6.033607e+00-2.375978e+00j, 4.926060e+00+ 1.137245e+00j,-2.703433e+00+4.676073e-01j, -1.587575e+00-7.515646e-01j, 1.801851e+00-1.081441e+00j, -4.354355e+00-4.494305e-01j, 6.268810e+00+1.988309e+00j, -6.082956e+00-3.423565e+00j,	-1.739878e+00+3.414765e+00j,6.424708e+00-7.262015e+00j,	-1.318541e+00+4.476929e+00j, -1.925149e+00+1.762921e+00j, 4.511015e+00-3.556520e-01j, -4.396655e+00+1.443588e+00j, -3.116817e+00+2.427287e+00j]

And the output of the FFT module is:

[-6.850152e+00-5.850621e-01j ,
-8.229693e+00-6.425217e+00j ,
3.095238e+01-1.321568e+01j  ,
-5.182938e+00-4.233325e+00j ,
6.954326e+00+7.075030e+00j  ,
2.097159e+01+1.175418e+01j  ,
2.702558e+00-1.543010e+01j  ,
-6.319021e+00+1.134168e+01j ,
-1.252404e+00-1.055310e+01j ,
-1.200143e+01-1.787161e+01j ,
3.848109e+01+2.209911e+01j  ,
7.762767e+00-2.060383e+01j  ,
-4.190218e+00-7.037646e+00j ,
8.351004e-01-1.334330e+01j  ,
-7.578176e+00+1.071285e+01j ,
-1.743392e+01+1.425092e+01j ]

For comparison I also do the same operation in python:

import numpy as np

data =  [2.476367e+00 -2.004069e+00j, -6.033607e+00-2.375978e+00j, 4.926060e+00+ 1.137245e+00j,-2.703433e+00+4.676073e-01j, -1.587575e+00-7.515646e-01j, 1.801851e+00-1.081441e+00j, -4.354355e+00-4.494305e-01j, 6.268810e+00+1.988309e+00j, -6.082956e+00-3.423565e+00j,	-1.739878e+00+3.414765e+00j,6.424708e+00-7.262015e+00j,	-1.318541e+00+4.476929e+00j, -1.925149e+00+1.762921e+00j, 4.511015e+00-3.556520e-01j, -4.396655e+00+1.443588e+00j, -3.116817e+00+2.427287e+00j]

print(np.fft.ifft(data))

Which gives:

array([-4.28134688e-01-3.65664875e-02j,  6.85571121e-04-1.85955469e-03j,
        7.83755566e-04+3.85427278e-04j,  1.18608141e-03+1.93582093e-03j,
        2.46045625e-03+7.73105625e-03j,  1.69786405e+00+4.96621572e-02j,
       -2.78107982e-01-2.68442361e+00j,  1.77786332e+00+8.59785352e-01j,
       -1.36809688e-01-1.15679477e+00j,  6.72262721e-01+1.00248660e+00j,
        8.77349057e-01+1.70746302e+00j,  3.15544506e-01-9.05137254e-01j,
       -1.21734433e+00+8.15608063e-02j,  3.97639803e-01-6.11021704e-01j,
       -6.23491081e-01-6.33172444e-01j, -5.83384553e-01+3.13896581e-01j])

The two arrays are wildly different and I don't understand where the difference is coming from. If anyone can help that would be amazing!
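One thing that can be checked before looking at ordering is scaling. np.fft.ifft divides its result by N, whereas a hardware core may leave the result unscaled (that is an assumption here, to be confirmed against the FFT IP user guide). The minimal sketch below compares only the first sample of each listing above. Bin 0 is simply the sum of the inputs in either transform direction, so a match here says something about scaling but nothing about direction or ordering, and it does not by itself explain the other bins.

```python
# First sample of each result listing, copied from the post above.
fpga_bin0 = complex(-6.850152e+00, -5.850621e-01)       # FFT IP output, sample 0
numpy_bin0 = complex(-4.28134688e-01, -3.65664875e-02)  # np.fft.ifft output, sample 0

N = 16  # packet length

# If the core omits the 1/N factor, this ratio should be close to N.
ratio = fpga_bin0 / numpy_bin0
print(ratio)
print(abs(ratio - N))  # residual, expected to be small (single-precision rounding)
```

If the ratio comes out near 16, multiplying the Python result by N (or dividing the simulation output by N) puts both on the same scale before any further comparison.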

I know the Intel FFT IP core gives the output as "digit reversed". As far as I can tell this means the ordering of the outputs will be different, but I am not sure how exactly this works. Either way, as can be seen from the example above the two outputs contain different data, not just the same data in a different order.
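As a rough illustration of what "digit reversed" ordering means, the sketch below builds the index permutation for a given radix and uses it to put a digit-reversed output back into natural order. The function names are hypothetical, and which radix the core actually uses for a given size is an assumption that should be checked against the FFT IP user guide; radix 2 gives the familiar bit-reversed order.

```python
def digit_reverse(index, num_digits, radix):
    """Reverse the base-`radix` digits of `index` (num_digits digits wide)."""
    result = 0
    for _ in range(num_digits):
        result = result * radix + index % radix
        index //= radix
    return result


def digit_reversed_indices(n, radix=2):
    """For each output position, the natural-order bin assumed to be stored there."""
    num_digits = 0
    size = 1
    while size < n:
        size *= radix
        num_digits += 1
    if size != n:
        raise ValueError("n must be a power of radix")
    return [digit_reverse(i, num_digits, radix) for i in range(n)]


def to_natural_order(samples, radix=2):
    """Reorder digit-reversed samples into natural bin order."""
    order = digit_reversed_indices(len(samples), radix)
    natural = [None] * len(samples)
    for position, value in enumerate(samples):
        natural[order[position]] = value
    return natural


print(digit_reversed_indices(16, radix=2))
print(digit_reversed_indices(16, radix=4))
```

Applying to_natural_order to the simulation output (with the radix that matches the core) and then comparing against the Python result separates ordering differences from genuine data differences.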

Many thanks

alexbeasley · Answer

To expand upon this issue, I have made a simpler test case, which I will now describe. I am still trying to calculate an IFFT. The input to the FFT IP core is the following array:
[(1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j)]
A really simple sequence of 16 "ones". The expected IFFT is: 
[1.+1.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]
Very simply, we have data in bin 0 (ones) and nothing in any other bin (the rest is zeros). I have then taken two copies of the Intel FFT core, configured as in the post above, and set the "inverse" port of one instance to 0 and of the other to 1. So I am expecting that one instance will calculate the FFT and the other will calculate the IFFT.

For reference the FFT of this input array is: 
[16.+16.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,
 0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,
 0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j]

When I then run a simulation of the two cores I get the following outputs.

Instance 1 - "inverse" driven to 0

(16.000000 16.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)

Instance 2 - "inverse" driven to 1

(16.000000 16.000000) 
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)

(Please note that if we print these values in exponent form, "%e", some of the "zero" values turn out to be very small decimal values just greater than 0, but I can accept some variation from the expected values.)
However, the big problem here is that no matter whether I drive the "inverse" port to 1 or 0, I only get the forward FFT; the core never calculates an IFFT for me.
For clarity, the "inverse" port value is hard-coded, and a reset event happens at the beginning of the simulation for 5 clock cycles before data is sent to the core. Does anyone have any ideas on what I might be doing wrong in configuring the core to give me IFFTs?
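One possible reading of these results, assuming the core does not apply the 1/N scaling on the inverse transform (an assumption, not something confirmed in this thread): for a constant input, an unscaled inverse DFT and a forward DFT produce exactly the same output, so this particular stimulus cannot show which direction the core is running. The sketch below uses a naive pure-Python DFT (not the IP core's algorithm) to illustrate this, and shows a hypothetical alternative stimulus, a single impulse at index 1, for which the two directions differ.

```python
import cmath


def dft(samples, inverse=False):
    """Naive O(N^2) DFT without any 1/N scaling; inverse=True flips the exponent sign."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for i, x in enumerate(samples):
            acc += x * cmath.exp(sign * 2j * cmath.pi * k * i / n)
        out.append(acc)
    return out


def close(a, b, tol=1e-9):
    """True if two sequences agree element by element within tol."""
    return all(abs(x - y) < tol for x, y in zip(a, b))


N = 16

# Constant input, as in the test case above.
ones = [1 + 1j] * N
forward_ones = dft(ones)
inverse_ones = dft(ones, inverse=True)

# Hypothetical alternative stimulus: one impulse at index 1.
impulse = [0j] * N
impulse[1] = 1 + 0j
forward_impulse = dft(impulse)
inverse_impulse = dft(impulse, inverse=True)

print(close(forward_ones, inverse_ones))        # constant input: same in both directions
print(close(forward_impulse, inverse_impulse))  # impulse: outputs are complex conjugates
```

With the impulse input, the forward output rotates one way around the unit circle and the inverse output rotates the other way, so the sign of the imaginary parts reveals which transform was computed.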

Thanks!

alexbeasley · Answer

I have experimented more with this issue and found the intel design example located here: 
https://www.intel.com/content/www/us/en/design-example/714680/cyclone-10-gx-fft-to-ifft-with-natural-input-and-output-order-using-cosine-data-example-design-17-1.html?wapkw=fft%20ifft

I extracted the project archive, compiled it, and generated the simulation IP setup scripts (Tools > Generate Simulator Setup Script for IP) and the top-level Verilog representation of the block diagram design file (EDA Netlist Writer).

I then ran this through the simulation and achieved the following result:

The description of the example project claims that "When both the FFT and iFFT are operating as expected, Cosine data will be recovered and observed at the iFFT output." The output seen in the simulator is not the cosine wave as expected. I have attempted this in both:
Quartus 17.1 Prime Pro* + ModelSim - Intel FPGA Starter Edition 10.5c
Quartus 21.4 Prime Pro + QuestaSim - Intel FPGA Edition 2021.3
Both attempts yield an unexpected output: source_real and source_imag are not cosine waves.
(below is the output from the QuestaSim simulation)

*Quartus 17.1 Prime Pro fails a full compile; the log can be seen below. I have checked, and the file it claims to be unable to load does exist in the directory listed.

Problem Details
Error:
Internal Error: Sub-system: DCALC, File: /quartus/ddb/dcalc/dcalc_bcm_modules_cache.cpp, Line: 116
Could not load pdb file - c:/intelfpga_pro/17.1/quartus/common/devinfo/cyclone10gx/ddb_cyclone10gx_io_48_3v_tile-ff-3-0-hs_model_debug
Stack Trace:
    0x5d410: DCALC_TIMING_MODULES_CACHE::get_model + 0x21dbc (ddb_dcalc)
    0x2f2cb: DCALC_TIMING_NETLIST_MANAGER_IMPL::load_model + 0x43 (ddb_dcalc)
    0x41631: <lambda_27927aa62a013f38f2f5db62a47234ba>::operator() + 0x71 (ddb_dcalc)
    0x41446: tbb::interface6::internal::partition_type_base<tbb::interface6::internal::auto_partition_type>::execute<tbb::interface6::internal::start_for<tbb::blocked_range<int>,tbb::internal::parallel_for_body<<lambda_27927aa62a013f38f2f5db62a47234ba>,int>,tbb::auto_partitioner const >,tbb::blocked_range<int> > + 0x6e (ddb_dcalc)
    0x413d0: tbb::interface6::internal::start_for<tbb::blocked_range<int>,tbb::internal::parallel_for_body<<lambda_27927aa62a013f38f2f5db62a47234ba>,int>,tbb::auto_partitioner const >::execute + 0x20 (ddb_dcalc)
    0x1c1f3: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all + 0x193 (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\custom_scheduler.h:472
    0x19afe: tbb::internal::arena::process + 0x18e (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\arena.cpp:105
    0x16867: tbb::internal::market::process + 0xf7 (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\market.cpp:479
    0x10eac: tbb::internal::rml::private_worker::run + 0x6c (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\private_server.cpp:283
    0x1111a: tbb::internal::rml::private_worker::thread_routine + 0x5a (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\private_server.cpp:240
    0x24f7e: _beginthreadex + 0x106 (MSVCR120)
    0x25125: _endthreadex + 0x191 (MSVCR120)
    0x154df: BaseThreadInitThunk + 0xf (KERNEL32)
     0x485a: RtlUserThreadStart + 0x2a (ntdll)

End-trace

Executable: quartus_fit
Comment:
None

System Information
Platform: windows64
OS name: Windows 10
OS version: 10.0

Quartus Prime Information
Address bits: 64
Version: 17.1.0
Build: 240
Edition: Pro Edition
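For reference, the round trip that the design example description refers to (cosine in, FFT, then IFFT, cosine out) can be sketched in software as below. This is a naive pure-Python DFT under stated assumptions, not the IP core's implementation: the 128-point size is taken from the thread, while the number of cosine cycles per frame is a hypothetical choice, and the 1/N scaling is applied explicitly after the inverse step.

```python
import cmath
import math


def dft(samples, inverse=False):
    """Naive O(N^2) DFT without 1/N scaling; inverse=True flips the exponent sign."""
    n = len(samples)
    sign = 1 if inverse else -1
    return [
        sum(x * cmath.exp(sign * 2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


N = 128      # FFT size mentioned for the design example
CYCLES = 4   # hypothetical: cosine cycles per frame

cosine = [math.cos(2 * math.pi * CYCLES * i / N) for i in range(N)]

spectrum = dft(cosine)                                      # forward transform
recovered = [v / N for v in dft(spectrum, inverse=True)]    # inverse transform, scaled by 1/N

# Largest deviation between the recovered samples and the original cosine.
max_error = max(abs(r - c) for r, c in zip(recovered, cosine))
print(max_error)
```

A reference like this gives expected sample values for source_real and source_imag to compare the simulator waveform against, once any scaling and output ordering of the cores are accounted for.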

kshitij_intel · Answer

Hi,

What is your FFT_SIZE?

Thank you
Kshitij Goel

alexbeasley · Answer

Hi there,

I have tried many sizes. From my design examples I have tried anywhere between 16 and 2048 points.

For the design above, which uses the design example from Intel, I have not edited the code; I have only checked that the FFT points are set to 128.

Thanks

kshitij_intel · Answer

Hi,

I have checked: your Python output is FFT(fft_input)/16, with the output in reverse order.

To debug your Intel FPGA IP, please share your simple project.

Thank you
Kshitij Goel
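The relationship mentioned above can be illustrated with a short sketch: under the np.fft.ifft convention, the inverse transform equals the forward transform divided by N with the bin index reversed modulo N. The code uses a naive pure-Python DFT and arbitrary test data (neither is taken from the thread) and measures how closely the two sides agree.

```python
import cmath


def dft(samples, sign):
    """Naive O(N^2) DFT without scaling; sign=-1 is forward, sign=+1 is inverse."""
    n = len(samples)
    return [
        sum(x * cmath.exp(sign * 2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


# Arbitrary 16-point complex test data, for illustration only.
x = [complex(i, -2 * i + 1) for i in range(16)]
N = len(x)

forward = dft(x, -1)
inverse = [v / N for v in dft(x, +1)]  # same scaling convention as np.fft.ifft

# Inverse bin n equals forward bin (-n mod N), divided by N.
from_forward = [forward[(-n) % N] / N for n in range(N)]

max_diff = max(abs(a - b) for a, b in zip(inverse, from_forward))
print(max_diff)
```

In other words, a core that runs a forward FFT where an IFFT was requested would return the expected values scaled by N, with bins 1 to N-1 in reverse order and bin 0 unchanged in position.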
