Altera_Forum
Honored Contributor
19 years ago
performance degradation
We are running heterogeneous multiprocessor NIOS2 cores on our Stratix2 FPGA. We have written a benchmark in which each processor contends for a shared memory guarded by a hardware mutex, performs a multiply-accumulate on the shared data, and stores the result. We are collecting performance data for this benchmark across various configurations of fast and standard NIOS2 processors.
When two standard processors and one fast processor contend for the shared memory, performance degrades drastically, but only on the standard processors. We then ran the same benchmark on three standard processors and found that the execution time per processor doubled compared with the two-standard-processor configuration. Using performance counters, we saw that the time spent in the load-multiply-accumulate-store sequence on the locked data had increased. This seems very strange: once a processor holds the lock, it is the only one that can operate on the data, so the load-multiply-accumulate-store time should be the same regardless of the number of processors on the FPGA. Can anyone explain why operations such as multiplies and adds would take longer when the number of processors on the FPGA goes from 2 to 3? Or are there known problems with running performance counters on several processors simultaneously?
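For reference, the measured loop has roughly the shape below. This is only a portable sketch, not our actual source: the hardware mutex is replaced by a C11 `atomic_flag`, the performance counter by `clock()`, and the names `shared_t`, `timing_t`, `hw_mutex_lock`, `hw_mutex_unlock` and `benchmark` are placeholders. It shows how the time spent waiting for the lock is kept separate from the time spent in the load-multiply-accumulate-store section.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for the hardware mutex: a C11 atomic flag (assumption). */
static atomic_flag hw_mutex = ATOMIC_FLAG_INIT;

/* Shared data guarded by the mutex (placeholder layout). */
typedef struct {
    volatile int32_t a, b, acc;
} shared_t;

/* Per-processor timing totals, in clock() ticks. */
typedef struct {
    clock_t lock_wait;  /* time spent spinning until the mutex is acquired */
    clock_t critical;   /* time spent in load, multiply-accumulate, store */
} timing_t;

/* Spin until the flag is acquired. */
static void hw_mutex_lock(void)
{
    while (atomic_flag_test_and_set(&hw_mutex)) { }
}

/* Release the flag so another processor can acquire it. */
static void hw_mutex_unlock(void)
{
    atomic_flag_clear(&hw_mutex);
}

/* One processor's benchmark loop: lock, MAC on the shared data, unlock. */
static void benchmark(shared_t *s, int iterations, timing_t *t)
{
    for (int i = 0; i < iterations; i++) {
        clock_t t0 = clock();
        hw_mutex_lock();
        clock_t t1 = clock();

        /* Critical section: load, multiply-accumulate, store. */
        int32_t a = s->a;
        int32_t b = s->b;
        int32_t acc = s->acc;
        acc += a * b;
        s->acc = acc;

        clock_t t2 = clock();
        hw_mutex_unlock();

        t->lock_wait += t1 - t0;  /* contention for the mutex */
        t->critical  += t2 - t1;  /* the part that unexpectedly grows */
    }
}
```

Each processor would call `benchmark()` on the same `shared_t` with its own `timing_t`. It is the `critical` total, not `lock_wait`, that we see growing when the third processor is added.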