Why does this design not fit, and how should I reason about routing congestion?
Hi all, I'm trying to optimize a simple program that computes a correlation matrix. To promote data reuse and parallelism, the design processes the matrix in blocks. However, I cannot get the design to route for la...