Forum Discussion

AlexBeasley
Occasional Contributor
3 years ago

Intel Floating Point FFT IP Gives Different Result to Python Example

Hi there,

I am having some issues with a simulation of the Intel FFT IP core.

I have set up the IP core with the following parameters:


I am then running a simulation with a very simple input stream of floating-point numbers. Each packet is 16 data points long (there are actually 16 packets, but for simplicity only the first packet is discussed here).

The fft_points_in and fft_points_out ports are both set to 12'd16.
The inverse port is set to 1'd1 (we are actually doing an IFFT).


The contents of the first packet are as follows :

[2.476367e+00-2.004069e+00j, -6.033607e+00-2.375978e+00j, 4.926060e+00+1.137245e+00j, -2.703433e+00+4.676073e-01j,
-1.587575e+00-7.515646e-01j, 1.801851e+00-1.081441e+00j, -4.354355e+00-4.494305e-01j, 6.268810e+00+1.988309e+00j,
-6.082956e+00-3.423565e+00j, -1.739878e+00+3.414765e+00j, 6.424708e+00-7.262015e+00j, -1.318541e+00+4.476929e+00j,
-1.925149e+00+1.762921e+00j, 4.511015e+00-3.556520e-01j, -4.396655e+00+1.443588e+00j, -3.116817e+00+2.427287e+00j]


And the output of the FFT module is:

[-6.850152e+00-5.850621e-01j ,
-8.229693e+00-6.425217e+00j ,
3.095238e+01-1.321568e+01j  ,
-5.182938e+00-4.233325e+00j ,
6.954326e+00+7.075030e+00j  ,
2.097159e+01+1.175418e+01j  ,
2.702558e+00-1.543010e+01j  ,
-6.319021e+00+1.134168e+01j ,
-1.252404e+00-1.055310e+01j ,
-1.200143e+01-1.787161e+01j ,
3.848109e+01+2.209911e+01j  ,
7.762767e+00-2.060383e+01j  ,
-4.190218e+00-7.037646e+00j ,
8.351004e-01-1.334330e+01j  ,
-7.578176e+00+1.071285e+01j ,
-1.743392e+01+1.425092e+01j ]

For comparison, I also perform the same operation in Python:

import numpy as np

# First 16-point packet, identical to the values listed above
data = [
    2.476367e+00-2.004069e+00j, -6.033607e+00-2.375978e+00j, 4.926060e+00+1.137245e+00j, -2.703433e+00+4.676073e-01j,
    -1.587575e+00-7.515646e-01j, 1.801851e+00-1.081441e+00j, -4.354355e+00-4.494305e-01j, 6.268810e+00+1.988309e+00j,
    -6.082956e+00-3.423565e+00j, -1.739878e+00+3.414765e+00j, 6.424708e+00-7.262015e+00j, -1.318541e+00+4.476929e+00j,
    -1.925149e+00+1.762921e+00j, 4.511015e+00-3.556520e-01j, -4.396655e+00+1.443588e+00j, -3.116817e+00+2.427287e+00j,
]

print(np.fft.ifft(data))


Which gives:

array([-4.28134688e-01-3.65664875e-02j,  6.85571121e-04-1.85955469e-03j,
        7.83755566e-04+3.85427278e-04j,  1.18608141e-03+1.93582093e-03j,
        2.46045625e-03+7.73105625e-03j,  1.69786405e+00+4.96621572e-02j,
       -2.78107982e-01-2.68442361e+00j,  1.77786332e+00+8.59785352e-01j,
       -1.36809688e-01-1.15679477e+00j,  6.72262721e-01+1.00248660e+00j,
        8.77349057e-01+1.70746302e+00j,  3.15544506e-01-9.05137254e-01j,
       -1.21734433e+00+8.15608063e-02j,  3.97639803e-01-6.11021704e-01j,
       -6.23491081e-01-6.33172444e-01j, -5.83384553e-01+3.13896581e-01j])


The two arrays are wildly different and I don't understand where the difference is coming from. If anyone can help that would be amazing!

I know the Intel FFT IP core gives the output as "digit reversed". As far as I can tell this means the ordering of the outputs will be different, but I am not sure how exactly this works. Either way, as can be seen from the example above the two outputs contain different data, not just the same data in a different order.
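To make the reordering question concrete, here is a minimal sketch of two helpers. `bit_reversed_indices` produces the radix-2 bit-reversed index order for a power-of-two length, and `is_reordering` checks whether one array holds the same values as another in any order (optionally after a scale factor). Both function names are hypothetical, and radix-2 bit reversal is only one candidate for what the core calls "digit reversed"; the exact order depends on the core's configuration and should be checked against the IP user guide.

```python
import numpy as np

def bit_reversed_indices(n):
    """Indices 0..n-1 in radix-2 bit-reversed order (n must be a power of two)."""
    bits = n.bit_length() - 1
    if n <= 0 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    order = []
    for i in range(n):
        r = 0
        for _ in range(bits):
            # Shift the lowest bit of i into the reversed value
            r = (r << 1) | (i & 1)
            i >>= 1
        order.append(r)
    return order

def is_reordering(a, b, scale=1.0, tol=1e-3):
    """True if `a` contains the same complex values as `scale * b`, in any order."""
    a = np.asarray(a, dtype=complex)
    remaining = list(scale * np.asarray(b, dtype=complex))
    if len(a) != len(remaining):
        return False
    for value in a:
        # Greedy nearest-value match; adequate for short, well-separated data
        j = min(range(len(remaining)), key=lambda k: abs(remaining[k] - value))
        if abs(remaining[j] - value) > tol:
            return False
        remaining.pop(j)
    return True

# Example: a bit-reversed copy of a vector is a pure reordering of it
x = np.arange(16) + 1j * np.arange(16)
order = bit_reversed_indices(16)
print(order[:4])                    # [0, 8, 4, 12]
print(is_reordering(x[order], x))   # True
```

Calling `is_reordering(hw_output, np.fft.ifft(data), scale=16)` (with `hw_output` standing in for the simulated values) would test the "same data, different order" hypothesis, including a possible missing 1/N factor, without needing to know the exact output order.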

Many thanks

7 Replies

  • AlexBeasley
    Occasional Contributor

    To expand upon this issue, I have made a simpler test case which I will now describe. I am still trying to calculate an IFFT.

    The input to the FFT IP core is the following array:

    [(1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j)]

    A really simple sequence of 16 "ones".

    The expected IFFT is:

    [1.+1.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
     0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]

    Very simply, we have data in bin 0 (ones) and nothing in any other bin (the rest are zeros).

    I have then taken two copies of the Intel FFT core, configured as in the post above, and set the "inverse" port of one instance to 0 and of the other to 1. I therefore expect one instance to calculate the FFT and the other to calculate the IFFT.

    For reference the FFT of this input array is:

    [16.+16.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,
     0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,
     0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j]

    When I then run a simulation of the two cores, I get the following outputs:

    Instance 1 - "inverse" driven to 0

    (16.000000 16.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)

    Instance 2 - "inverse" driven to 1

    (16.000000 16.000000) 
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    (0.000000 0.000000)
    

    (Please note that if these values are printed in exponent form, "%e", some of the "zero" values turn out to be very small values just above 0, but I can accept some deviation from the expected values.)

    However, the big problem here is that whether I drive the "inverse" port to 1 or 0, I only get the forward FFT; the core never calculates an IFFT for me.
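One thing worth ruling out before concluding that the "inverse" port is ignored: a constant input cannot distinguish a forward FFT from an inverse transform that omits the 1/N scaling, because only bin 0 is non-zero and the sign of the twiddle exponent does not affect it. The sketch below illustrates this in NumPy. Whether the IP core's inverse omits the 1/N factor is an assumption here, not something established in this thread, and the helper name `unscaled_ifft` is hypothetical. A single complex tone is a more discriminating stimulus, since the two transforms place its peak in different bins.

```python
import numpy as np

N = 16
n = np.arange(N)

def unscaled_ifft(x):
    """Inverse DFT without the 1/N factor (assumed, unconfirmed hardware behaviour)."""
    return np.fft.ifft(x) * len(x)

# Constant input: forward and unscaled inverse both give 16+16j in bin 0
ones = np.full(N, 1 + 1j)
print(np.allclose(np.fft.fft(ones), unscaled_ifft(ones)))   # True

# One-cycle complex tone: the peak bin reveals the transform direction
tone = np.exp(2j * np.pi * n / N)
fwd_peak = int(np.argmax(np.abs(np.fft.fft(tone))))
inv_peak = int(np.argmax(np.abs(unscaled_ifft(tone))))
print(fwd_peak, inv_peak)                                   # 1 15
```

If the two hardware instances are fed the tone instead of the constant sequence, identical peak positions would point to the "inverse" setting not taking effect, while peaks in bins 1 and 15 (after undoing any output reordering) would indicate that both directions are working.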

    For clarity, the "inverse" port value is hard coded and a reset event happens at the beginning of the simulation for 5 clock cycles before data is sent to the core.

    Does anyone have any ideas on what I might be doing wrong to configure the core to give me IFFTs?

    Thanks!

  • AlexBeasley
    Occasional Contributor

    I have experimented more with this issue and found the Intel design example located here:

    https://www.intel.com/content/www/us/en/design-example/714680/cyclone-10-gx-fft-to-ifft-with-natural-input-and-output-order-using-cosine-data-example-design-17-1.html?wapkw=fft%20ifft

    I extracted the project archive, compiled it, and generated the simulation IP setup scripts (Tools > Generate Simulator Setup Script for IP) and a top-level Verilog representation of the block diagram design file (EDA Netlist Writer).

    I then ran this through the simulation and obtained the following result:


    The description of the example project claims that "When both the FFT and iFFT are operating as expected, Cosine data will be recovered and observed at the iFFT output."
    The output seen in the simulator is not the cosine wave as expected.
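As a software reference for what the design example is described as doing, the following sketch passes a cosine through a forward FFT and then an inverse FFT in NumPy. The 128-point size matches the setting reported later in this thread; the choice of 4 cycles per frame is an arbitrary assumption, not a value taken from the design example. NumPy's `ifft` includes the 1/N scaling, which the hardware may or may not apply, so amplitudes could differ by a factor of N.

```python
import numpy as np

N = 128        # FFT length reported for the design example
CYCLES = 4     # assumed number of cosine periods per frame
n = np.arange(N)
cosine = np.cos(2 * np.pi * CYCLES * n / N)

spectrum = np.fft.fft(cosine)        # forward transform
recovered = np.fft.ifft(spectrum)    # inverse transform (scaled by 1/N)

# The real part matches the input and the imaginary part is numerically zero
print(np.allclose(recovered.real, cosine))        # True
print(np.max(np.abs(recovered.imag)) < 1e-9)      # True
```

In a working FFT-to-IFFT chain, source_real would be expected to follow the input cosine (up to scaling and latency) while source_imag stays near zero.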

    I have attempted this in both:
    Quartus 17.1 Prime Pro* + ModelSim - Intel FPGA Starter Edition 10.5c

    Quartus 21.4 Prime Pro+ QuestaSim - Intel FPGA Edition 2021.3

    Both attempts yield an unexpected output - source_real and source_imag are not cosine waves.

    (below is the output from the QuestaSim simulation)


    *Quartus 17.1 Prime Pro fails a full compile; the log can be seen below. I have checked, and the file it claims to be unable to load does exist in the directory listed.

    Problem Details
    Error:
    Internal Error: Sub-system: DCALC, File: /quartus/ddb/dcalc/dcalc_bcm_modules_cache.cpp, Line: 116
    Could not load pdb file - c:/intelfpga_pro/17.1/quartus/common/devinfo/cyclone10gx/ddb_cyclone10gx_io_48_3v_tile-ff-3-0-hs_model_debug
    Stack Trace:
        0x5d410: DCALC_TIMING_MODULES_CACHE::get_model + 0x21dbc (ddb_dcalc)
        0x2f2cb: DCALC_TIMING_NETLIST_MANAGER_IMPL::load_model + 0x43 (ddb_dcalc)
        0x41631: <lambda_27927aa62a013f38f2f5db62a47234ba>::operator() + 0x71 (ddb_dcalc)
        0x41446: tbb::interface6::internal::partition_type_base<tbb::interface6::internal::auto_partition_type>::execute<tbb::interface6::internal::start_for<tbb::blocked_range<int>,tbb::internal::parallel_for_body<<lambda_27927aa62a013f38f2f5db62a47234ba>,int>,tbb::auto_partitioner const >,tbb::blocked_range<int> > + 0x6e (ddb_dcalc)
        0x413d0: tbb::interface6::internal::start_for<tbb::blocked_range<int>,tbb::internal::parallel_for_body<<lambda_27927aa62a013f38f2f5db62a47234ba>,int>,tbb::auto_partitioner const >::execute + 0x20 (ddb_dcalc)
        0x1c1f3: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all + 0x193 (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\custom_scheduler.h:472
        0x19afe: tbb::internal::arena::process + 0x18e (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\arena.cpp:105
        0x16867: tbb::internal::market::process + 0xf7 (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\market.cpp:479
        0x10eac: tbb::internal::rml::private_worker::run + 0x6c (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\private_server.cpp:283
        0x1111a: tbb::internal::rml::private_worker::thread_routine + 0x5a (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\private_server.cpp:240
        0x24f7e: _beginthreadex + 0x106 (MSVCR120)
        0x25125: _endthreadex + 0x191 (MSVCR120)
        0x154df: BaseThreadInitThunk + 0xf (KERNEL32)
         0x485a: RtlUserThreadStart + 0x2a (ntdll)
    
    End-trace
    
    
    Executable: quartus_fit
    Comment:
    None
    
    System Information
    Platform: windows64
    OS name: Windows 10
    OS version: 10.0
    
    Quartus Prime Information
    Address bits: 64
    Version: 17.1.0
    Build: 240
    Edition: Pro Edition
    
  • AlexBeasley
    Occasional Contributor

    Hi there,

    I have tried many sizes. In my own designs I have tried anywhere between 16 and 2048 points.

    For the design above, which uses the design example from Intel, I have not edited the code; I have only checked that the FFT points are set to 128.

    Thanks

  • Kshitij_Intel
    Frequent Contributor

    Hi,


    I have checked: your Python output is FFT(fft_input)/16, with the output in reverse order.


    To debug your Intel FPGA IP, please share your simple project.
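The relationship described here appears to be the standard identity between the two transforms: the inverse DFT equals the forward DFT divided by N, with the bin index reversed modulo N (bin 0 stays in place and bin k swaps with bin N-k). A minimal NumPy check, using an arbitrary random test vector rather than the packet from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # arbitrary test data

k = np.arange(N)
# Forward FFT, divided by N, read out at index (-k mod N)
reversed_fft = np.fft.fft(x)[(-k) % N] / N

print(np.allclose(np.fft.ifft(x), reversed_fft))   # True
```

Note that this index reversal is a property of the transform pair itself and is separate from the "digit reversed" output ordering of the IP core.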


    Thank you

    Kshitij Goel


  • Kshitij_Intel
    Frequent Contributor

    Hi,


    As we have not received any response from you to our previous reply, please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, and community users will be able to help you with your follow-up questions.


    Thank you

    Kshitij Goel