I've no idea what those altera_avalon mutex functions do, but at best they look like bloat.
For synchronisation you need a memory block that both processors can access uncached; this could be tightly coupled memory on both CPUs. Any cached shared memory would need its caches flushed very carefully, so it is simpler to use that same uncached block for all inter-CPU communication. On the Linux side you probably have to go through several layers of code to access specific physical addresses, but I assume you've sorted that out already.
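In case it helps, here is a rough sketch of one generic way to reach a physical address from Linux user space, by mapping it through /dev/mem. The names map_shared and split_phys are made up for illustration. Whether O_SYNC really gives you an uncached mapping depends on the architecture and kernel configuration, so treat that as an assumption to check on your system; a small kernel driver may turn out to be the more robust route.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: split a physical address into the page-aligned base
 * that mmap() requires and the byte offset inside that page. */
static void split_phys(uint64_t phys, long page, uint64_t *base, size_t *off)
{
    *base = phys & ~((uint64_t)page - 1);
    *off = (size_t)(phys - *base);
}

/* Hypothetical helper: map len bytes of physical memory starting at phys.
 * Returns NULL on failure (typically: not root, or /dev/mem restricted). */
volatile void *map_shared(uint64_t phys, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    uint64_t base;
    size_t off;
    void *p;
    int fd;

    split_phys(phys, page, &base, &off);

    /* O_SYNC requests an uncached mapping on many platforms (assumption). */
    fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    p = mmap(NULL, len + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, (off_t)base);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    return (volatile uint8_t *)p + off;
}
```

The returned pointer should only ever be dereferenced through volatile types, so the compiler does not cache or reorder the accesses.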
I would use a memory section for the shared area, and get the linker to assign variables to that area. This tends to make the coding much less error-prone.
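A minimal sketch of what that could look like with gcc's section attribute. The SHARED macro, the mailbox structure and its field names are all invented for illustration, and it assumes your linker script has an output section that places .data.shared inside the uncached shared block.

```c
#include <stdint.h>

/* Hypothetical helper macro: the section name must match an output section
 * that the linker script places in the uncached shared memory block. */
#define SHARED __attribute__((section(".data.shared")))

/* Hypothetical message slot. Fixed-width types keep the layout identical
 * for the compilers on both sides. */
struct mailbox {
    volatile uint32_t seq;       /* bumped by the writer after the payload is stored */
    volatile uint32_t len;       /* number of valid payload bytes */
    volatile uint8_t  data[56];  /* payload */
};

SHARED struct mailbox to_linux   = {0};  /* written by the non-Linux CPU (assumed) */
SHARED struct mailbox from_linux = {0};  /* written by the Linux CPU (assumed) */

/* Writer side: store the payload first, then publish it by bumping seq.
 * The volatile qualifiers stop the compiler reordering these stores. */
void mailbox_put(struct mailbox *mb, const uint8_t *src, uint32_t len)
{
    uint32_t i;

    if (len > sizeof mb->data)
        len = sizeof mb->data;
    for (i = 0; i < len; i++)
        mb->data[i] = src[i];
    mb->len = len;
    mb->seq = mb->seq + 1;
}
```

The linker only places these variables on the side that is linked against that memory map. The Linux side would more likely overlay the same struct definitions on whatever pointer its mapping gives it.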
For a single mutex both sides could then run something like:
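/* One request flag per CPU (cpu is 0 or 1), in the uncached shared section. */
volatile unsigned int interlock[2] __attribute__((section(".data.shared"))) = {0, 0};

void get_interlock(int cpu)
{
    volatile int i;

    for (;;) {
        /* Wait until the other CPU is neither holding nor requesting the lock. */
        while (interlock[cpu ^ 1] != 0)
            continue;
        /* Raise our flag, then check the other CPU didn't raise its own meanwhile. */
        interlock[cpu] = 1;
        if (interlock[cpu ^ 1] == 0)
            return;
        /* Collision: drop our flag and back off for a CPU-specific delay. */
        interlock[cpu] = 0;
        for (i = (cpu + 1) * 8; i != 0; i--)
            continue;
    }
}

void release_interlock(int cpu)
{
    interlock[cpu] = 0;
}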
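
For what it's worth, a usage sketch, assuming one flag per CPU with cpu being 0 or 1. The names shared_counter and bump_shared_counter are invented for illustration. The lock code is repeated so the fragment builds on its own, and plain statics stand in for the uncached shared section so it can be compiled on a PC; on the real hardware both objects would have to live in the shared area.

```c
/* Stand-ins for objects that would live in the uncached shared section. */
static volatile unsigned int interlock[2];    /* one request flag per CPU (assumed) */
static volatile unsigned int shared_counter;  /* hypothetical data guarded by the lock */

static void get_interlock(int cpu)
{
    volatile int i;

    for (;;) {
        while (interlock[cpu ^ 1] != 0)       /* other side busy: wait */
            continue;
        interlock[cpu] = 1;                   /* raise our request */
        if (interlock[cpu ^ 1] == 0)
            return;                           /* no collision: lock is ours */
        interlock[cpu] = 0;                   /* collision: back off and retry */
        for (i = (cpu + 1) * 8; i != 0; i--)
            continue;
    }
}

static void release_interlock(int cpu)
{
    interlock[cpu] = 0;
}

/* Every read-modify-write of shared data is bracketed by the lock pair. */
unsigned int bump_shared_counter(int cpu)
{
    unsigned int value;

    get_interlock(cpu);
    value = shared_counter + 1;
    shared_counter = value;
    release_interlock(cpu);
    return value;
}
```

Keep the locked region as short as possible, since the other CPU just spins while it waits.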