Hi,
--- Quote Start ---
How are you getting around the problem that the nios has no locked bus cycles - so you can't implement any mutex or other atomic operations into normal memory?
Typically these need a minimum of a locked 'compare and exchange' instruction - which the avalon bus doesn't support.
--- Quote End ---
I implemented an instruction which atomically swaps the values between a register and a memory operand, like

swap ra, imm16

Spinlocks are built on this instruction. The 'compare and exchange' operation is then implemented on top of spinlocks, like
static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
{
        int ret;
        unsigned long flags;

        _atomic_spin_lock_irqsave(v, flags);
        ret = v->counter;
        if (likely(ret == old))
                v->counter = new;
        _atomic_spin_unlock_irqrestore(v, flags);

        return ret;
}
in the Linux kernel.
--- Quote Start ---
If you've re-implemented nios, you might notice that nios is basically a reimplementation of MIPS :-)
--- Quote End ---
Of course, I know that Nios2 is a copy of ....:D
--- Quote Start ---
I understand that you created a NIOS compatible CPU yourself. Of course here you can in fact implement such instructions, but I feel that a NIOS clone (done in Verilog or whatever) will be much slower than an Altera branded thingy that uses low-level optimizations that Verilog and friends don't provide.
--- Quote End ---
Of course, the clone core's fmax is a big problem if you want to achieve better performance than the single-core case. I compiled the source for my DE2-115 with the 'less optimization' and 'Fast fit' switches, and got an fmax of around 60 MHz. By tuning the details and the compilation switches, maybe I can get 75~80 MHz, but over 100 MHz is impossible.
--- Quote Start ---
I agree that implementing a MIPS clone seems more appropriate than implementing a NIOS clone. There are some free 32-bit CPUs in Verilog code available on the Net.
--- Quote End ---
Why Nios2? Because I love Nios2:) and Altera:D.
--- Quote Start ---
Of course cache synchronization is a huge task to do.
--- Quote End ---
To achieve cache coherency, I implemented the L1 data cache as a 'write-through' one, and added cache-flush signals that are sent to the other CPUs' data caches whenever a write is done. If the target data is cached in another cache, it is simply flushed (invalidated) from that cache; the new data is filled in on the next memory access.
Kazu