Altera_Forum
Honored Contributor
8 years ago
I did more tests, and you are right about the compiler combining the stores in the if/else branches. That is why it has two layers, and cascaded write units have large latency.
If both branches have 9 writes, it would use one 1024-bit write. That uses more bandwidth, but the latency is small, and that is what I care about. But in my case the 9th write in the else branch is a "void write" and may cause wrong output if kernel execution order is not sequential. (I am not sure whether it is. I think I read somewhere that it is not guaranteed, but it has been sequential in my experience.) I wish the programmer could just specify "don't try to share a write unit between branches" or "use the largest bandwidth necessary in any one branch".
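To illustrate the "both branches have 9 writes" case, here is a minimal sketch in plain C (not OpenCL, and not built with the FPGA toolchain). The branch only selects the nine values, and the stores are then issued unconditionally, so both paths touch the same nine consecutive slots. The names `store_block`, `SLOTS`, and `pad` are hypothetical. Whether the compiler actually infers a single wide write from this shape is an assumption, not something verified here. The sketch also assumes the 9th slot is a location that no other work-item depends on, which is exactly what the "void write" concern is about.

```c
#include <stddef.h>

#define SLOTS 9  /* writes per block, taken from the post */

/* Hypothetical stand-in for the kernel body.
 * if_vals must hold SLOTS values; else_vals holds SLOTS - 1 values,
 * and the last slot is filled with `pad` (the "void write"). */
void store_block(float *out, size_t base, int take_if,
                 const float *if_vals, const float *else_vals, float pad)
{
    float v[SLOTS];

    /* Step 1: the branch only chooses values, it does not store. */
    for (int i = 0; i < SLOTS; ++i) {
        if (take_if)
            v[i] = if_vals[i];
        else
            v[i] = (i < SLOTS - 1) ? else_vals[i] : pad;
    }

    /* Step 2: the same SLOTS consecutive stores on both paths. */
    for (int i = 0; i < SLOTS; ++i)
        out[base + i] = v[i];
}
```

If the 9th slot can hold live data from another work-item, the padded store is only safe when execution order is known, so this pattern does not remove the ordering question raised above.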