I have a lot of trouble understanding how a designer can ever comprehend the complexities being created. Multicore and parallel programming are going full speed ahead with very little explanation of how well they will work. Here's my take on multicore: Intel saw that maybe half the processor cycles were spent waiting, so dual core came along and improved performance (by how much?). Well, if 2 is good, many, many will be better. But what if two used up all the cycles? Did 3 and 4 steal cycles from 1 and 2, causing them to slow down? I have read in some forum that multicore actually does run slower for unknown reasons.
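To put rough numbers on that question, here is a toy model, not a measurement of any real chip. It assumes one shared resource (say, memory cycles) with a fixed capacity, and that each core running alone uses half of it, matching my "half the cycles" guess. The names per_core_rate, total_speedup, demand and capacity are made up for illustration.

```python
def per_core_rate(n_cores, demand=0.5, capacity=1.0):
    """Fraction of its standalone speed that each core reaches.

    Assumption: every core wants `demand` of a shared resource whose
    total supply is `capacity`, and the supply is split evenly once
    the cores together ask for more than is available.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    total_demand = n_cores * demand
    # Below saturation each core runs at full speed; above it,
    # every core is throttled by the same factor.
    return min(1.0, capacity / total_demand)


def total_speedup(n_cores, demand=0.5, capacity=1.0):
    """Combined throughput relative to a single core running alone."""
    return n_cores * per_core_rate(n_cores, demand, capacity)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8):
        print(n, "cores:",
              "per-core rate", round(per_core_rate(n), 3),
              "total speedup", round(total_speedup(n), 3))
```

Under these assumptions the total stops growing once the shared resource is used up, and every core added after that just slows the others down. Real processors have caches and other effects that this sketch ignores, so it only illustrates the question, it does not answer it.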
Sure, parallel programming is part of supercomputers, but look at the number of processors.
In the case you mentioned, if something doesn't work quite right, does the designer have to use timing simulation to debug logic never seen before?
While I wish them luck, I fear it may just be an academic exercise. There was a Microsoft research study into why FPGAs are so fast compared to a processor, and it found that the processor spent a lot of time fetching instructions. They missed the point that you mentioned: parallelism in the FPGA. My design relies heavily on that parallelism, because several RAMs, counters, etc. are used in parallel. Please think about the demo code running in 48 cycles and maybe just guess at what it would take in HDL. That is the margin of performance to be gained at whatever cost.
My design could be put to use quickly while that research continues in parallel.
Thanks again, I enjoy a good discussion.