LPM_DIVIDE behaves differently between simulation and hardware implementation
I have implemented a divider in a custom VHDL module to replace the instances of the LPM_DIVIDE IP in my project. I checked my module against LPM_DIVIDE in simulation and verified that, apart from a different initial latency, my divider behaves exactly the same way as the LPM_DIVIDE IP across the full dynamic range of the input operands.

The problem is that when I test it on the hardware, I get slightly different results. Debugging with SignalTap, I found that the mismatch comes from the LPM_DIVIDE IP itself: it behaves differently on the hardware than it does in simulation. The difference appears when the result of the division is negative.

I have attached the simulation results and the SignalTap acquisition for a division with a negative numerator and a positive denominator. The correct result should be the one from the simulation (0xFF..FFF4FF), but for a reason I still don't understand, the hardware implementation returns 0xFF..FFF4FE, which is one LSB lower.

Do you have any suggestions? Is there some kind of rounding in the implementation that is not present in the simulation model?
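To make the discrepancy concrete: the two values differ by exactly one, which is the pattern one would expect if one result is rounded toward zero and the other toward negative infinity. The Python sketch below only illustrates that arithmetic. The 32-bit width and the operands `numer` and `denom` are hypothetical, chosen so that the quotients match the low bits of the values above; they are not the actual operands from the attachments, and the sketch does not confirm what LPM_DIVIDE does internally.

```python
def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def to_hex(value, width):
    """Format a signed integer as a width-bit two's-complement hex string."""
    digits = width // 4
    return "0x{:0{}X}".format(value & ((1 << width) - 1), digits)


def div_trunc(n, d):
    """Quotient rounded toward zero (remainder takes the numerator's sign)."""
    q = abs(n) // abs(d)
    return -q if (n < 0) != (d < 0) else q


def div_floor(n, d):
    """Quotient rounded toward negative infinity (remainder >= 0 when d > 0)."""
    return n // d


WIDTH = 32                 # assumption: the real operand width is not stated
numer, denom = -28175, 10  # hypothetical operands, not the real test vector

q_trunc = div_trunc(numer, denom)
q_floor = div_floor(numer, denom)

print(to_hex(q_trunc, WIDTH), "remainder", numer - q_trunc * denom)
print(to_hex(q_floor, WIDTH), "remainder", numer - q_floor * denom)
```

With these hypothetical operands, the truncating quotient ends in `F4FF` with a negative remainder, and the flooring quotient ends in `F4FE` with a non-negative remainder. The two conventions agree whenever the division is exact or the quotient is positive, which would be consistent with the mismatch appearing only for negative results.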