Forum Discussion
8 Replies
- Altera_Forum
Honored Contributor
I use a DE2, so I'm not sure about the DE2-70, but if it has a Cyclone II device, I would say probably not. It depends on which other peripherals you want, but even with minimal peripherals, I think it would be hard to fit 16 Nios II cores.
When you compile your project in Quartus, you get a resource utilization summary, which should give you an idea of how many resources you have left. And I don't think the Cyclone II has 16*4K of on-chip memory.
- Altera_Forum
Honored Contributor
1,125 kbits on a 2C70
Memory: 16 CPUs x 4 kByte x 8 bit/Byte = 512 kbit.
LE-wise: 70 kLE / 16 CPUs = ~4.4 kLE/CPU.
Sounds reasonable, go try it out. :)
- Altera_Forum
Honored Contributor
Oh, ok! I was obviously misinformed about the number of M4K blocks on the device. This document has that info: http://www.altera.com/literature/hb/cyc2/cyc2_cii51008.pdf Thanks, thepankcake.
I would be interested to know whether you can fit 16 cores on your device when you try it.
- Altera_Forum
Honored Contributor
--- Quote Start ---
sounds reasonable, go try it out. :)
--- Quote End ---
I will, thank you! I've just ordered my DE2-70 so it will be a while ...
- Altera_Forum
Honored Contributor
If I remember correctly, the processor core uses two M4K blocks for the register file. The debugger module uses one as well, so you might need to go light on dedicated on-chip memories in your design. You have external memory interfaces on that board so I recommend dropping some pipeline bridges into the system if you attempt to hook up 16 processor cores to the memory otherwise the Fmax of your system will take a pretty hefty hit.
See "Increasing System Frequency" in this doc for more information about where to place the bridges in your design: http://www.altera.com/literature/hb/nios2/edh_ed51007.pdf
- Altera_Forum
Honored Contributor
--- Quote Start ---
You have external memory interfaces on that board so I recommend dropping some pipeline bridges into the system if you attempt to hook up 16 processor cores to the memory otherwise the Fmax of your system will take a pretty hefty hit.
--- Quote End ---
I wouldn't expect to connect 16 processors to the same external memory. What I have in mind is a systolic array pattern where data flows between neighbouring processors, and only one or two CPUs at the edge need to see the outside world. But thanks for the pointer to the system frequency paper, which looks very useful.
- Altera_Forum
Honored Contributor
I've got my DE2-70 now and have tried some experiments.
The hardware does have plenty of capacity for multiprocessors: a ring of 16 Nios2/e cpus, each with 4KB on-chip RAM and a FIFO connecting to its neighbour, takes 17035 LEs and 540672 bits of memory, which is 25% and 47% respectively of EP2C70 resources. Using the Nios2/s increases resource usage to 47% of LEs and 54% of memory (the extra RAM goes into instruction caches).
The software, on the other hand, seems to have trouble coping with a design of this size: SOPC builder takes 2 hours 45 minutes to generate the system from a total of 81 modules. This is with Quartus 9.1 web edition on Linux. Is this version deliberately crippled to encourage sales of the subscription edition, or is SOPC builder just a toy for making systems of just a few components?
For a simple regular design like this, it seems the workaround would be to use SOPC builder to make a single unit consisting of one cpu with local memory and a FIFO with an exported avalon slave connection at the top level, and then bolt together multiple instantiated copies of the unit in Verilog. Is that the usual procedure for large designs?
- Altera_Forum
Honored Contributor
--- Quote Start ---
For a simple regular design like this, it seems the workaround would be to use SOPC builder to make a single unit consisting of one cpu with local memory and a FIFO with an exported avalon slave connection at the top level, and then bolt together multiple instantiated copies of the unit in Verilog.
--- Quote End ---
That worked well. To see how far the resources would stretch, I built a system of 83 Nios2/e processors with 512 bytes of local RAM each. SOPC builder needed only about 3 minutes to generate one unit. Synthesis took another 4.5 hours, but fitting that much logic into a chip is probably quite challenging.
Why 83 cpus? It seems the important limit is not memory bits but memory blocks. A Nios2/e (without JTAG debug) needs two M4Ks. Allowing another M4K per cpu for local RAM, 83 cpus will use 249 of the available 250 memory blocks.
Curiously, the final report says 424,960/1,152,000 (37%) of memory bits used, which suggests that the two M4Ks used internally by the Nios2/e are not very full. Is there a Nios2 architecture document anywhere which might explain this further?
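
The block and bit figures above can be cross-checked with a few lines of arithmetic. This is only a sketch in Python using the numbers quoted in the thread; the function names are made up for illustration, and it assumes that every memory bit not belonging to local RAM is used by the CPU core (for the 16-cpu ring, that means the FIFOs are assumed to contribute no memory bits).

```python
# Device figures as quoted in the thread's fitter reports (EP2C70).
M4K_BLOCKS = 250
TOTAL_MEM_BITS = 1_152_000
BITS_PER_M4K = TOTAL_MEM_BITS // M4K_BLOCKS  # 4608 bits per block


def max_units(blocks_per_cpu=2, blocks_local_ram=1):
    """Largest number of cpu + local RAM units that fits in the M4K block count."""
    return M4K_BLOCKS // (blocks_per_cpu + blocks_local_ram)


def cpu_bits_per_unit(reported_bits, units, local_ram_bytes):
    """Memory bits per unit left over after subtracting the local RAM.

    Assumption: everything that is not local RAM belongs to the CPU core.
    """
    return (reported_bits - units * local_ram_bytes * 8) / units


units = max_units()
print(units, units * 3)                           # 83 249
print(round(100 * 424_960 / TOTAL_MEM_BITS))      # 37
print(cpu_bits_per_unit(424_960, 83, 512))        # 1024.0 (83-cpu system)
print(cpu_bits_per_unit(540_672, 16, 4096))       # 1024.0 (16-cpu ring)
print(round(100 * 1024 / (2 * BITS_PER_M4K), 1))  # 11.1 (% of two M4Ks in use)
```

Under those assumptions both experiments give 1024 bits per Nios2/e, so the two internal M4Ks would be only about 11% full. 1024 bits is also the size of 32 registers of 32 bits each, which would be consistent with the earlier remark that the M4Ks hold the register file, but that is an inference from the numbers, not something the reports state.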