Ok, now I understand a little better. Which university are you from?
A few comments:
- are you sure the code you are going to execute fits in RAM?
- since the code should be small, try to put the three programs in three tightly coupled or on-chip RAMs, or simply use different memories (ext_ram, sdram, and onchip) for the three processors, to remove possible contention on a shared memory
- are you measuring just one processor, or all three? Altera mutexes implement simple spin locks, so there is no upper bound on the time a CPU may wait to acquire an Altera mutex (and that's the reason why we implemented a queuing spin lock in ERIKA Enterprise)
- it could be that having fast and standard cores access the same memory leads to more contention than having three standard cores, because their memory access patterns are in general different
- I do not understand the performance counter setup...
why not
PERF_BEGIN(PERFORMANCE_CPU1_BASE, 3);   // section 3: time spent waiting for the mutex
altera_avalon_mutex_lock(mutex, 1);
PERF_END(PERFORMANCE_CPU1_BASE, 3);
// --- moved
PERF_BEGIN(PERFORMANCE_CPU1_BASE, 2);   // section 2: critical section plus unlock
// --- moved
temp = IORD(&macarray, 0);
IOWR(&macarray, 0, temp*i + temp);
altera_avalon_mutex_unlock(mutex);
PERF_END(PERFORMANCE_CPU1_BASE, 2);
??
- are you -really- sure the various CPUs are competing to access the mutexes? I do not see any code that synchronizes the CPUs at the start of the cycle... That is a common problem in general, because each CPU has its own boot time. If you want to be sure, you probably need something like the startup barrier we implemented in ERIKA Enterprise.
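About the queuing spin lock: just to give the idea of why queuing bounds the wait, here is a minimal sketch of a FIFO "ticket" lock in C11. It is not the ERIKA Enterprise code, and all the names (ticket_lock_t, ticket_lock_acquire, ...) are hypothetical. It also assumes an atomic fetch-and-add visible to all CPUs; as far as I know the Nios II core has no such instruction (that is what the mutex core is for), so on the board a real queuing lock needs a different construction.

```c
#include <stdatomic.h>

/* Hypothetical FIFO ("ticket") spin lock, for illustration only.
 * Not the ERIKA Enterprise implementation. Assumes C11 atomics
 * shared by all contenders. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to enter */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->next_ticket, 0u);
    atomic_init(&l->now_serving, 0u);
}

static void ticket_lock_acquire(ticket_lock_t *l)
{
    /* Take a ticket: requesters are served strictly in arrival order. */
    unsigned my_ticket = atomic_fetch_add_explicit(&l->next_ticket, 1u,
                                                   memory_order_relaxed);
    /* Spin until it is our turn. With N contenders we wait for at most
     * N-1 other critical sections, so the wait is bounded. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire)
           != my_ticket) {
        /* busy wait */
    }
}

static void ticket_lock_release(ticket_lock_t *l)
{
    /* Only the lock holder writes now_serving, so load + store is enough. */
    unsigned next =
        atomic_load_explicit(&l->now_serving, memory_order_relaxed) + 1u;
    atomic_store_explicit(&l->now_serving, next, memory_order_release);
}
```

Each CPU takes a ticket and is served in arrival order, so with three CPUs a waiter sits behind at most two critical sections; with a simple spin lock like the Altera mutex, a CPU can in principle lose the race every time.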
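And for the startup barrier, a minimal sketch of the idea (again hypothetical names, not the ERIKA code): each CPU calls startup_barrier_wait() once after its own initialization and spins until all of them have arrived, so the measured loops start together. It assumes the barrier lives in a memory shared by all CPUs and accessed bypassing the data cache, that it is initialized before any CPU reaches it, and that the increment is atomic; on Nios II you would probably protect the increment with the hardware mutex instead of relying on a C11 atomic.

```c
#include <stdatomic.h>

/* Hypothetical one-shot startup barrier, for illustration only.
 * Every CPU calls startup_barrier_wait() once at the end of its own
 * initialization; nobody leaves until all n_cpus have arrived. */
typedef struct {
    atomic_uint arrived;  /* how many CPUs reached the barrier so far */
    unsigned    n_cpus;   /* how many CPUs we expect */
} startup_barrier_t;

static void startup_barrier_init(startup_barrier_t *b, unsigned n_cpus)
{
    atomic_init(&b->arrived, 0u);
    b->n_cpus = n_cpus;
}

static void startup_barrier_wait(startup_barrier_t *b)
{
    /* Announce our arrival... */
    atomic_fetch_add_explicit(&b->arrived, 1u, memory_order_acq_rel);
    /* ...then spin until the slowest-booting CPU has arrived too. */
    while (atomic_load_explicit(&b->arrived, memory_order_acquire)
           < b->n_cpus) {
        /* busy wait */
    }
}
```

It is one-shot: reusing it for several rounds would need something like a sense-reversing barrier.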
bye
PJ