Altera_Forum
Honored Contributor
11 years ago
I'm basically searching for 16-byte elements in a hash set of about 8 GB. What I've done so far to speed up collision resolution is to cap the number of slots checked per element at 7. The 7 slot addresses are all computed right at the start, and the memory accesses are independent of one another, like this:
Elements = hashset
Elements = hashset
...
Elements = hashset
Then I check which of the loaded elements is the right answer. Since the memory accesses are independent and issued in parallel, I expect them to cause only one pipeline stall overall rather than seven separate ones. According to the aocl report, the pipeline stall percentages differ between the individual memory accesses, but they all hover around 75%. Do you think the stalls are indeed combined?
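
For reference, here is a minimal sketch of the lookup pattern described above, written as plain C so it can be compiled on a host. The names `element_t`, `probe_slot`, `lookup`, and `NUM_PROBES`, and the hash mixing itself, are assumptions made up for illustration and are not taken from the actual kernel. The point is only the shape: all slot indices are computed first, then all loads are issued with no dependence on each other, and the comparison happens last.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_PROBES 7  /* maximum number of slots checked per key */

/* 16-byte element (hypothetical layout). */
typedef struct { uint64_t lo, hi; } element_t;

/* Hypothetical probe function: the slot for probe i depends only on the
   key and i, never on the result of an earlier load. */
static size_t probe_slot(element_t key, unsigned i, size_t num_slots) {
    uint64_t h = key.lo ^ (key.hi * 0x9E3779B97F4A7C15ULL);
    h += (uint64_t)i * (key.hi | 1ULL);
    return (size_t)(h % num_slots);
}

/* Returns the index of the first matching probe (0..NUM_PROBES-1),
   or -1 if none of the probed slots holds the key. */
static int lookup(const element_t *hashset, size_t num_slots, element_t key) {
    size_t slots[NUM_PROBES];
    element_t elements[NUM_PROBES];
    int found = -1;

    /* Step 1: compute every slot index up front. */
    for (unsigned i = 0; i < NUM_PROBES; ++i)
        slots[i] = probe_slot(key, i, num_slots);

    /* Step 2: issue all loads; none depends on another load's result. */
    for (unsigned i = 0; i < NUM_PROBES; ++i)
        elements[i] = hashset[slots[i]];

    /* Step 3: only now decide which loaded element is the right one. */
    for (unsigned i = 0; i < NUM_PROBES; ++i)
        if (found < 0 && elements[i].lo == key.lo && elements[i].hi == key.hi)
            found = (int)i;

    return found;
}
```

In an OpenCL kernel the three loops would presumably be fully unrolled (for example with `#pragma unroll`) so that each of the 7 loads becomes its own load unit. This sketch says nothing about how the profiler attributes stalls to those units.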