Forum Discussion
Altera_Forum
Honored Contributor
15 years ago

--- Quote Start ---
I measured 2 clocks for both read and write on a /f without a data cache.
--- Quote End ---

On a /f without a data cache I measured 3 clocks per operation for both read and write. So a read is one clock faster than with dcache-no-burst, but a write is one clock slower. Anyway, when the bulk of the data is in SDRAM or any other external memory, working without a data cache doesn't sound like a feasible option.

--- Quote Start ---
I think the slave is generating a wait state - which is very difficult to avoid on the read cycle!
--- Quote End ---

I don't understand this comment. The slave is my own, and I know for sure that it doesn't generate wait states (i.e. it can accept a new address every clock); it has just one clock of pipeline latency. So all wait states come either from SOPC (unlikely) or from the Nios itself.

--- Quote Start ---
My thoughts in this area are that the nios need not stall on MM transfers. For writes simply using a 'posted write' would be enough to allow most writes to complete in a single cycle (a second transfer would have to stall). For reads it ought, somehow, be possible to cause a 'D' stage stall when the required value is needed instead of an 'A' stage stall. Both these would need to be options - since they will increase the processor size.
--- Quote End ---

Interesting thoughts, but IMHO purely theoretical. Unfortunately, there is very little chance that Altera is going to significantly redesign the /f core. On the other hand, allowing custom components to be connected to the tightly coupled data port would take just a very small change in SOPC Builder and would achieve the same or better improvement for a significant class of custom components.

--- Quote Start ---
It is worth noting that the 'late result' is actually a 'normal result' - and happens when the resultant value has to go via the register file. I think non 'late result' instructions use special logic to forward the output of the ALU back to its inputs.
--- Quote End ---

Of course, they had to build forwarding logic for single-cycle instructions. However, I don't think that 'late result' = 'normal result without forwarding'. Even without pipeline bypass, the results of 'combinatorial' instructions would be available one clock earlier than a 'late result'.
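
To put rough numbers on the 'posted write' suggestion quoted above, here is a toy cycle-count model in C. It is only a sketch under stated assumptions, not a description of the real /f pipeline: the function names, the one-entry write buffer, and the program encoding ('W' for a store to the slave, any other character for a one-clock instruction) are all made up for illustration. The only figure taken from this thread is the 3 clocks per store measured on a /f without a data cache.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only (hypothetical names, not Altera code).
 * prog:       string of instructions, 'W' = store to the slave,
 *             anything else = an instruction that takes 1 clock.
 * bus_clocks: clocks the bus is occupied by one store. */

/* Blocking stores: the CPU waits for the whole bus transfer. */
static unsigned cycles_blocking(const char *prog, unsigned bus_clocks)
{
    assert(bus_clocks >= 1);
    unsigned t = 0;
    for (size_t i = 0; prog[i] != '\0'; i++)
        t += (prog[i] == 'W') ? bus_clocks : 1;
    return t;
}

/* Posted stores with an assumed one-entry write buffer: a store issues
 * in 1 clock, but a second store stalls until the previous one has
 * left the bus. Clocks needed to drain the final store are not counted. */
static unsigned cycles_posted(const char *prog, unsigned bus_clocks)
{
    assert(bus_clocks >= 1);
    unsigned t = 0;        /* current clock                         */
    unsigned bus_free = 0; /* first clock at which the bus is idle  */
    for (size_t i = 0; prog[i] != '\0'; i++) {
        if (prog[i] == 'W') {
            if (t < bus_free)
                t = bus_free;          /* stall behind the earlier store */
            bus_free = t + bus_clocks; /* bus is busy from now on        */
        }
        t += 1;                        /* the instruction itself issues  */
    }
    return t;
}
```

With `bus_clocks = 3`, the sequence `"WNNW"` costs 8 clocks in the blocking model and 4 in the posted model, while back-to-back stores `"WW"` cost 6 and 4 respectively, which is the "a second transfer would have to stall" case.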