I have implemented a divider as a custom VHDL module to replace the instances of the LPM_DIVIDE IP in my project. I checked my module against LPM_DIVIDE and verified in simulation that, apart from a different initial latency, my divider behaves exactly like the LPM_DIVIDE IP over the full dynamic range of the input operands. The problem is that on hardware I get slightly different results. Debugging with SignalTap, I found that the LPM_DIVIDE IP itself behaves differently on hardware than in simulation, and that the difference occurs when the result of the division is negative.

I have attached the simulation results and the SignalTap acquisition for a division of a negative numerator by a positive denominator. The correct result should be the one from the simulation (0xFF..FFF4FF), but for a reason I still don't understand, the hardware implementation returns 0xFF..FFF4FE. Do you have any suggestions? Is there some kind of rounding in the implementation that is not present in the simulation model?

LPM DIVIDE behaves differently between simulation and implementation

21 Replies

ShengN_altera
Super Contributor
1 year ago
Hi,

In the simulation, the 0xFF..FFF4FF result appears under 010139F1A9h.
In SignalTap, however, you are looking at the result under 010131CEB9h.
Maybe use a different trigger condition to capture 0xFF..FFF4FF under 010139F1A9h, for example a transitional one.

Thanks,
Regards,
Sheng
marcorig
New Contributor
1 year ago
Hi Sheng, thank you for the answer. In the simulation, the 0xFF..FFF4FF (quotient) output appears under 010139F1A9h (denominator) and 7FFFFA7872h (numerator) because I simulated the divider standalone, feeding the IP only with those operands to check its behavior. In hardware the divider is part of a more complex design and is fed with a pipelined stream of operands. That's why you also see the subsequent operands in SignalTap, but 0xFF..FFF4FE is the result of the first two operands of the stream, which are the same as in the simulation:
ShengN_altera
Super Contributor
1 year ago
Hi,

Pipelining adds latency. Could you simulate the whole design with the pipeline instead of just the standalone divider? What do you see then?

Thanks,
Regards,
Sheng
marcorig
New Contributor
11 months ago
Hi Sheng.
The number of pipeline stages is a parameter of the divider, and I use the same value in both the simulation and the implementation, so there is no difference there. The entire design is impossible to simulate due to its size. What I can do is simulate and implement a simple design containing only the divider and an operand generator, monitoring the outputs both in simulation and in SignalTap. I think I'll see the same behavior. I'll post the results here in the next few days.
ShengN_altera
Super Contributor
11 months ago
Hi,

Understood. Please let me know whether the simple design with only the divider and an operand generator gives the same results in simulation and in SignalTap.

Thanks,
Regards,
Sheng
- marcorig
  New Contributor
  11 months ago
  Hi Sheng,
  I've implemented this simple self-checking module, which instantiates only the LPM_DIVIDE and stimulates it with 1000 pairs of operands. I've simulated the module, synthesized it, and checked its behavior with SignalTap. As I thought, the strange behaviour previously observed is also present in this version.
  Considering the first operation:
  Operands:
  Numerator = FFFFFFEC82000000h = -83718307840d
  Denominator = 000000000Bh = 11d
  The exact result is -83718307840/11 = -7610755258.181818, that is, -7610755258 with a remainder of -2. These are the outputs provided by the module in simulation: Quotient: FFFFFFFE3A5D1746h and Remainder: 7FFFFFFFFEh.
  The implemented divider, on the other hand, produces these results:
  Quotient: FFFFFFFE3A5D1745h and Remainder: 0000000009h, which differ from those seen in simulation.
  I've tested the module using Quartus Prime Version 20.1.1 Build 720 11/11/2020 Patches 1.02std SJ Standard Edition, implementing it on an Arria 10 10AX066K4F35I3SG.
  lpm_divide_self_checking.vhd (5 KB)
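The two result pairs reported for the first operation can be reproduced with plain integer arithmetic. The sketch below is in Python rather than VHDL and only checks the numbers: the simulation values match a quotient truncated toward zero, while the SignalTap values match a quotient rounded toward minus infinity with a non-negative remainder. The function names are hypothetical, and the 64-bit quotient width is an assumption based on the 16 hex digits shown in the post. It does not say which convention the IP is configured for.

```python
# Operands as stated in the post (decimal values).
numerator = -83718307840
denominator = 11

def truncated_divmod(n, d):
    """Quotient rounded toward zero; the remainder takes the sign of n."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q, n - q * d

def floored_divmod(n, d):
    """Quotient rounded toward minus infinity (Python's built-in divmod)."""
    return divmod(n, d)

def to_hex(value, bits):
    """Two's complement hex string of value on the given number of bits."""
    digits = (bits + 3) // 4
    return f"{value & ((1 << bits) - 1):0{digits}X}"

q_trunc, r_trunc = truncated_divmod(numerator, denominator)
q_floor, r_floor = floored_divmod(numerator, denominator)

# Assumed 64-bit quotient, inferred from the 16 hex digits in the post.
print("truncated:", q_trunc, r_trunc, to_hex(q_trunc, 64))
print("floored:  ", q_floor, r_floor, to_hex(q_floor, 64))
```

The truncated pair is (-7610755258, -2) with quotient FFFFFFFE3A5D1746h, as in simulation. The floored pair is (-7610755259, 9) with quotient FFFFFFFE3A5D1745h, as captured in SignalTap.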
marcorig
New Contributor
11 months ago
Hi Sheng, I've implemented a simple module which instantiates only the LPM_DIVIDE and a process that feeds the divider with 1000 pairs of operands. As I expected, the implemented module shows the same strange behavior, differing from the simulation when the quotient is negative. The following figure shows this difference:
I've considered only the first division of the sequence, since the problem is the same in the subsequent divisions. Operands:
Numerator = FFFFFFEC82000000h = -83718307840d; Denominator = 000000000Bh = 11d.
Quotient (SignalTap) = FFFFFFFE3A5D1745h = -7610755259d; Remainder (SignalTap) = 0000000009h = 9d.
Quotient (Simulation) = FFFFFFFE3A5D1746h = -7610755258d; Remainder (Simulation) = 7FFFFFFFFEh = -2d.
In case it helps, I've implemented the module on an Arria 10 FPGA (10AX066K4F35I3SG) using the following version of Quartus Prime: Quartus Prime Version 20.1.1 Build 720 11/11/2020 Patches 1.02std SJ Standard Edition.
Thank you,
Marco
lpm_divide_self_operands_gen.vhd (5 KB)
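The hex-to-decimal conversions listed in this post can be cross-checked by decoding each value as two's complement and testing the division identity numerator = quotient * denominator + remainder. This is a minimal sketch in Python with hypothetical names. The widths are assumptions: 64 bits for numerator and quotient (16 hex digits), and 39 bits for denominator and remainder, inferred from the fact that 7FFFFFFFFEh only decodes to -2 when bit 38 is the sign bit.

```python
def from_twos_complement(hex_text, bits):
    """Decode a hex string as a signed two's complement value on `bits` bits."""
    raw = int(hex_text, 16)
    if raw >> bits:
        raise ValueError(f"{hex_text} does not fit in {bits} bits")
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw

WIDE = 64    # assumed width of numerator and quotient
NARROW = 39  # assumed width of denominator and remainder

numerator = from_twos_complement("FFFFFFEC82000000", WIDE)
denominator = from_twos_complement("000000000B", NARROW)

# (quotient, remainder) hex values as reported in the post.
results = {
    "SignalTap": ("FFFFFFFE3A5D1745", "0000000009"),
    "Simulation": ("FFFFFFFE3A5D1746", "7FFFFFFFFE"),
}

decoded = {}
for source, (q_hex, r_hex) in results.items():
    quotient = from_twos_complement(q_hex, WIDE)
    remainder = from_twos_complement(r_hex, NARROW)
    # Both conventions must satisfy n = q * d + r to be a valid division.
    consistent = quotient * denominator + remainder == numerator
    decoded[source] = (quotient, remainder, consistent)
    print(source, quotient, remainder, consistent)
```

Under these assumed widths both pairs satisfy the identity, so neither output is arithmetically inconsistent. They differ only in the sign of the remainder and, as a consequence, by one in the quotient.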
ShengN_altera
Super Contributor
11 months ago
Hi,

Could you provide the project files so I can take a look?

Thanks,
Regards,
Sheng
marcorig
New Contributor
11 months ago
Hi Sheng,
since the Arria 10 board is a custom board, I've created another project for the Intel MAX10 FPGA Development Kit and simply instantiated lpm_divide_self_operands_gen.vhd in the top module. I checked the divider outputs again with SignalTap and confirmed that I see the same behavior on that device too. I've attached the whole project (I deleted the ./db subfolder, otherwise the compressed file would have exceeded the 23 MB allowed for attachments). The project also includes the SignalTap file, which shows the following when triggering on operands_counter = 1:
Considering this, I don't think it's an issue of the device.
LPMDIV.7z (4.5 MB)
ShengN_altera
Super Contributor
11 months ago
Hi,

Could you also provide the testbench for simulation?

Thanks,
Regards,
Sheng
marcorig
New Contributor
11 months ago
Yes, of course. Here it is.
lpm_divide_self_operands_gen_tb.vhd (2 KB)