Broddo
6 months ago

NIOSV/g with FPU: inconsistent calculation results

I'm using a NIOSV/g with the FPU enabled in a MAX10 project. The project makes heavy use of floating-point calculations, hence the need for the FPU. I noticed occasional inconsistent results in this program and started debugging, assuming this was a bug in my code. However, I was able to run my code in a simulator and on a different RISCV microcontroller, and everything worked flawlessly. I also disabled the FPU in the NIOSV design, and again the code ran fine.

To recreate the problem, I created a basic project with just the NIOSV, some RAM and the JTAG-Uart. I also wrote a tiny C program to stress test the FPU. The results again show that the FPU is producing incorrect results.

I've attached a screenshot of the Platform Designer design. I'm running the design at 75 MHz and the design meets timing requirements.

Here is the code I ran. Note that I have interrupts disabled to be sure this isn't a context-switching issue. I also did not wrap the calculations in a function, so that I could more easily view the individual calculation results in the debugger. This code works as expected when using a soft-FPU. When using the NIOSV FPU, results are inconsistent. I've attached a screenshot of one failed cycle. You can see that a1 and b1 are not equal.

#include <stdint.h>
#include <math.h>
#include "sys/alt_stdio.h"

static void fpuTest(void) {
  int fail_count = 0;
  int iteration = 0;

  while (1) {
    float a0 = (float)iteration * 0.001f;
    float a1 = 1.1f * sinf((float)iteration * 0.1f);
    float a2 = 2.2f / (1.0f + (float)iteration * 0.0001f);
    float a3 = sqrtf(3.3f + (float)iteration);
    float a4 = powf(4.4f + (float)iteration, 1.1f);
    float a5 = logf(5.5f + (float)iteration + 1.0f);
    float a6 = 6.6f * cosf((float)iteration * 0.05f);
    float a7 = 7.7f + tanf((float)iteration * 0.02f);

    float result_a = a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7;

    float b0 = (float)iteration * 0.001f;
    float b1 = 1.1f * sinf((float)iteration * 0.1f);
    float b2 = 2.2f / (1.0f + (float)iteration * 0.0001f);
    float b3 = sqrtf(3.3f + (float)iteration);
    float b4 = powf(4.4f + (float)iteration, 1.1f);
    float b5 = logf(5.5f + (float)iteration + 1.0f);
    float b6 = 6.6f * cosf((float)iteration * 0.05f);
    float b7 = 7.7f + tanf((float)iteration * 0.02f);

    float result_b = b0 + b1 + b2 + b3 + b4 + b5 + b6 + b7;

    // Check if result is consistent (should be identical)
    if (fabsf(result_a - result_b) > 1e-6f) {
      alt_printf("FPU test failed at iteration %x\n", iteration);
      fail_count++;
    }

    iteration++;
  }
}

int main(void) {
  // Make sure interrupts are disabled
  __asm volatile ( "csrc mstatus, 8" );

  fpuTest();

  while (1);

  return 0;
}

Can someone help me investigate what could be wrong here? Could there be an issue in the FPU itself?
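
Editor's note: a stricter variant of the check above compares raw bit patterns rather than using a tolerance, and reads its input through a volatile so an optimizing compiler cannot merge the two evaluations into one. The sketch below is illustrative only. The names float_bits, compute and fpu_consistency_check are hypothetical, printf stands in for alt_printf, and the expression uses only +, * and / so that it builds without libm; the original sinf/sqrtf/powf expressions could be substituted on the target.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a float (memcpy avoids aliasing issues). */
static uint32_t float_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}

/* The input is read through a volatile pointer, so every call reloads it and
   the compiler cannot assume two calls see the same value and fold them. */
static float compute(volatile const float *in) {
  float x = *in;
  return 1.1f * (x * 0.1f) + 2.2f / (1.0f + x * 0.0001f) + (x * 0.001f) * (3.3f + x);
}

/* Evaluate the same expression twice per iteration; return the number of
   iterations whose two results differ in any bit. */
static int fpu_consistency_check(int n) {
  int mismatches = 0;
  for (int i = 0; i < n; i++) {
    volatile float in = (float)i;
    uint32_t a = float_bits(compute(&in));
    uint32_t b = float_bits(compute(&in));
    if (a != b) {
      printf("mismatch at %d: %08lx vs %08lx\n", i,
             (unsigned long)a, (unsigned long)b);
      mismatches++;
    }
  }
  return mismatches;
}
```

On a correctly working FPU, fpu_consistency_check(n) should return 0 for any n, since the same instructions run on the same inputs both times.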

11 Replies

  • Hi @Broddo

    Our patch has passed internal testing, which included a check against your software and Nios V parameterization.

    It should become available on our website in roughly a week's time - we will update the thread when it is available.

    Thanks

    Mark

  • LYGOOI

    Hi @Broddo

    Can you share your processor IP Parameter Editor settings?

    And which Quartus version is your design based on?


    Or, you can zip the design & attach it in your next reply.

    Regards,

    Liang Yu

    • Broddo

      Thanks for the reply @LYGOOI and apologies for the delay in getting back to you - I was on vacation.

      To answer your questions: I'm using Quartus Lite 24.1 and I've pasted the CPU parameters below.

      I've attached the test project that builds all of this. I'm running on a custom board (that was previously running a NIOS2 application with no issues). If you want to run it yourself, the only change you'll have to make is the location of the source clock.

      For convenience, I've added a Makefile that will build the project and the software - you'll see for yourself.

      Here are the CPU parameters

      <module
         name="intel_niosv_g_0"
         kind="intel_niosv_g"
         version="4.0.0"
         enabled="1">
        <parameter name="AUTO_CLK_CLOCK_DOMAIN" value="3" />
        <parameter name="AUTO_CLK_RESET_DOMAIN" value="3" />
        <parameter name="AUTO_DEVICE" value="10M50DAF484C8G" />
        <parameter name="AUTO_DEVICE_SPEEDGRADE" value="8" />
        <parameter name="Blind_Window_Period" value="1000" />
        <parameter name="CLICenabledInterruptMode" value="0" />
        <parameter name="CLICenabledShadowRegisterFiles" value="1" />
        <parameter name="CUSTOM_OP" value="" />
        <parameter name="Default_Timeout_Period" value="255" />
        <parameter name="SUB_OP" value="" />
        <parameter name="alignCLICVectorTable" value="8" />
        <parameter name="basicInterruptMode" value="0" />
        <parameter name="basicShadowRegisterFiles" value="0" />
        <parameter name="clockFrequency" value="75000000" />
        <parameter name="dataCacheSize" value="4096" />
        <parameter name="dataSlaveMapParam"><![CDATA[<address-map><slave name='onchip_flash.data' start='0x0' end='0x160000' type='altera_onchip_flash.data' /><slave name='onchip_memory.s1' start='0x200000' end='0x214000' type='altera_avalon_onchip_memory2.s1' /><slave name='intel_niosv_g_0.dm_agent' start='0x220000' end='0x230000' type='intel_niosv_g.dm_agent' /><slave name='intel_niosv_g_0.timer_sw_agent' start='0x230000' end='0x230040' type='intel_niosv_g.timer_sw_agent' /><slave name='jtag_uart_0.avalon_jtag_slave' start='0x230040' end='0x230048' type='altera_avalon_jtag_uart.avalon_jtag_slave' /><slave name='onchip_flash.csr' start='0x230048' end='0x230050' type='altera_onchip_flash.csr' /></address-map>]]></parameter>
        <parameter name="deviceFamily" value="MAX 10" />
        <parameter name="disableFsqrtFdiv" value="false" />
        <parameter name="dtcm1Base" value="18874368" />
        <parameter name="dtcm1InitFile" value="" />
        <parameter name="dtcm1Size" value="0" />
        <parameter name="dtcm2Base" value="0" />
        <parameter name="dtcm2InitFile" value="" />
        <parameter name="dtcm2Size" value="0" />
        <parameter name="enableBranchPrediction" value="true" />
        <parameter name="enableCLICInterruptEdgeTriggerConfig" value="false" />
        <parameter name="enableCLICInterruptPolarityConfig" value="false" />
        <parameter name="enableCLICSelectiveHardwareVectoring" value="false" />
        <parameter name="enableCoreLevelInterruptController" value="false" />
        <parameter name="enableDebug" value="true" />
        <parameter name="enableDebugReset" value="true" />
        <parameter name="enableECCFull" value="false" />
        <parameter name="enableECCLite" value="false" />
        <parameter name="enableFPU" value="true" />
        <parameter name="enableLockstep" value="false" />
        <parameter name="enableLockstepExtRst" value="false" />
        <parameter name="enableMulDiv" value="true" />
        <parameter name="funct3" value="" />
        <parameter name="funct7_l" value="" />
        <parameter name="funct7_u" value="" />
        <parameter name="hartId" value="0" />
        <parameter name="instCacheSize" value="4096" />
        <parameter name="instSlaveMapParam"><![CDATA[<address-map><slave name='onchip_flash.data' start='0x0' end='0x160000' type='altera_onchip_flash.data' /><slave name='onchip_memory.s1' start='0x200000' end='0x214000' type='altera_avalon_onchip_memory2.s1' /><slave name='intel_niosv_g_0.dm_agent' start='0x220000' end='0x230000' type='intel_niosv_g.dm_agent' /></address-map>]]></parameter>
        <parameter name="itcm1Base" value="19922944" />
        <parameter name="itcm1InitFile" value="" />
        <parameter name="itcm1Size" value="0" />
        <parameter name="itcm2Base" value="0" />
        <parameter name="itcm2InitFile" value="" />
        <parameter name="itcm2Size" value="0" />
        <parameter name="mnemonic" value="" />
        <parameter name="numCLICDebugTriggers" value="0" />
        <parameter name="numCLICLevels" value="2" />
        <parameter name="numCLICPlatformInterrupts" value="16" />
        <parameter name="numCLICPriorities" value="8" />
        <parameter name="opcode" value="" />
        <parameter name="peripheralRegionABase" value="2293760" />
        <parameter name="peripheralRegionASize" value="65536" />
        <parameter name="peripheralRegionBBase" value="67108864" />
        <parameter name="peripheralRegionBSize" value="2097152" />
        <parameter name="resetOffset" value="0" />
        <parameter name="resetSlave" value="onchip_flash.data" />
        <parameter name="useResetReq" value="false" />
       </module>
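
Editor's note: several base and size parameters above are stored in decimal, while the address maps use hexadecimal. The small sketch below converts a base/size pair into a hex range so the two can be compared. The helper names region_last, region_contains and print_region are hypothetical; the numeric values in the usage note are copied from the parameter list.

```c
#include <stdint.h>
#include <stdio.h>

/* Inclusive last address of a region, given its base and size in bytes. */
static uint32_t region_last(uint32_t base, uint32_t size) {
  return base + size - 1u;
}

/* Nonzero if the half-open range [start, end) lies entirely inside the region. */
static int region_contains(uint32_t base, uint32_t size,
                           uint32_t start, uint32_t end) {
  return start >= base && end <= base + size;
}

/* Print a region as a hexadecimal address range. */
static void print_region(const char *name, uint32_t base, uint32_t size) {
  printf("%s: 0x%08lx..0x%08lx\n", name,
         (unsigned long)base, (unsigned long)region_last(base, size));
}
```

For example, print_region("peripheralRegionA", 2293760u, 65536u) reports the range 0x00230000..0x0023ffff, which covers the timer_sw_agent, jtag_uart_0 and onchip_flash.csr entries in dataSlaveMapParam but not dm_agent at 0x220000.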
  • LYGOOI

    Hi @Broddo,

    Thanks for the design.
    We are able to replicate the same issue. We have found the cause and are currently investigating further.

    As a temporary workaround, please perform the following steps:

    1. Disable the caches in Platform Designer
      (Select No Cache for both Instruction & Data cache)
    2. Enable C++ in BSP Editor
      (Check the enable_c_plus_plus checkbox)

    Regards,
    Liang Yu

  • Hi @Broddo,


    Good day. I'm just following up on the previous clarification.

    By any chance, did you manage to try out the workaround?

    Hope to hear from you soon.


    Best Wishes

    BB


  • Broddo

    @BoonBengT_Altera @LYGOOI

    My apologies again for the delay in responding. Yes, I can confirm that the workaround suggested by @LYGOOI does address the problem. The FPU is producing consistently correct results now, so thanks for this!

    However, disabling caching is a big trade-off for my project, as I'm executing in place from external flash. I'll need to profile this to determine whether FPU without cache is more performant than soft-float with cache. For now, I'm working with the latter and will wait for the fix in a coming release.

    Thanks once again,

    Broddo
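
Editor's note: the profiling comparison mentioned above (FPU without cache versus soft-float with cache) could be approached by timing the same float-heavy loop under each build. The sketch below is a minimal illustration under stated assumptions: workload is a hypothetical stand-in for the real application code, and clock() from <time.h> is assumed to be backed by a timer on the target. A platform cycle counter may be a better choice where one is available.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical workload: a short float-heavy loop standing in for the real
   code. The volatile accumulator keeps the loop from being optimized away. */
static float workload(int n) {
  volatile float acc = 0.0f;
  for (int i = 0; i < n; i++) {
    float x = (float)i;
    acc = acc + (x * 0.001f) / (1.0f + x * 0.0001f);
  }
  return acc;
}

/* Run workload(n) once and return the elapsed time in clock() ticks.
   The computed value is stored through result. */
static clock_t time_workload(int n, float *result) {
  clock_t start = clock();
  *result = workload(n);
  return clock() - start;
}
```

Building this once per configuration and comparing the tick counts for the same n gives a rough first-order comparison. Dividing the tick count by CLOCKS_PER_SEC converts it to seconds.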

  • Hi @Broddo,


    Great! Thanks for confirming that the workaround is working and for filling us in on your next steps. If there are no further questions on this thread, it will be transitioned to community support.


    Please log in to https://supporttickets.intel.com/s/?language=en_US, view the details of the desired request, and post a response within the next 15 days so that I can continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help with your follow-up questions.

    Thank you for the questions. As always, it is a pleasure having you here.


    Best Wishes

    BB


  • Hi @Broddo

    Just to let you know, we have found and fixed the issue. A patch is being prepared and is currently in internal testing.

    Thank you very much for the example code, which helped us find the root cause very quickly.

    Mark