You'll need a locking primitive that both pieces of code understand. In practice this means writing something for the standalone C code and using the same algorithm from within uClinux.
Locking is usually based on a locked read-modify-write bus cycle. However, the Nios II CPU and the Avalon bus don't support them, so you have to do something else instead.
The usual solution is a variant of Dekker's algorithm, which works provided the memory cycles aren't re-ordered.
Basically you have one shared memory location per CPU. To acquire the mutex you:
a) Loop until all the locations are zero
b) Write a 1 onto your own location
c) Check all the other locations are zero - if so you have the lock, if not write 0 back to your own location, delay a bit and repeat from (a).
The delays during (c) need to be different for the two CPUs, otherwise they can collide forever [1].
Depending on what you are doing, the 'lock collision' might be treatable as 'nothing to do'.
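Steps (a) to (c) can be sketched in C roughly as follows. This is only a sketch under some assumptions: the names (lock_flag, shared_try_lock, shared_lock, shared_unlock, backoff) are made up for illustration, the array stands in for memory that both CPUs can really see, and accesses to it are assumed to bypass the data cache and reach memory in program order. The backoff loop counts are arbitrary; they only need to differ per CPU.

```c
#include <stdint.h>

#define NUM_CPUS 2

/* Hypothetical shared flags, one per CPU. On real hardware these would sit
 * at an agreed address in uncached memory visible to both CPUs; a plain
 * array is used here only so the sketch is self-contained. */
static volatile uint32_t lock_flag[NUM_CPUS];

/* Returns 1 if every flag is zero. */
static int all_clear(void)
{
    for (unsigned i = 0; i < NUM_CPUS; i++)
        if (lock_flag[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if every flag except our own is zero. */
static int others_clear(unsigned me)
{
    for (unsigned i = 0; i < NUM_CPUS; i++)
        if (i != me && lock_flag[i] != 0)
            return 0;
    return 1;
}

/* Busy-wait whose length depends on the CPU number, so that two CPUs that
 * collide do not retry in lock-step. The counts are arbitrary. */
static void backoff(unsigned me)
{
    for (volatile unsigned n = 0; n < 100u * (me + 1u); n++)
        ;
}

/* One attempt at the lock: returns 1 if acquired, 0 on a collision. */
int shared_try_lock(unsigned me)
{
    if (!all_clear())        /* (a) someone already holds or wants it */
        return 0;
    lock_flag[me] = 1;       /* (b) announce our claim */
    if (others_clear(me))    /* (c) nobody else claimed meanwhile */
        return 1;
    lock_flag[me] = 0;       /* collision: withdraw the claim */
    return 0;
}

/* Blocking acquire: retry with a per-CPU delay until we get the lock. */
void shared_lock(unsigned me)
{
    while (!shared_try_lock(me))
        backoff(me);
}

/* Release: clear our own flag. */
void shared_unlock(unsigned me)
{
    lock_flag[me] = 0;
}
```

The single-attempt function is kept separate so that code which can treat a collision as 'nothing to do' can call shared_try_lock() once and carry on, while everything else uses shared_lock().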
[1] Ethernet CSMA/CD (on coax) uses random backoffs; two chips have been known to have synchronised random numbers and back off forever!