I know it's wrong because I assign the function's output to a register that serves as a mask for comparison purposes. However, the comparisons were showing too many true results when I ran the hardware, and I realized that the mask had some zeros where there should only be 1's (to the right of the left-most 1). When I change index = index * 2 to index = index + 1, though, I get the correct result.
This tells me that the synthesis engine is unrolling the loop incorrectly. I think that it's unrolling the loop as:
A: tap | (tap>>1) | (tap>>2) | (tap>>4) | ... // notice tap>>3 is missing
whereas it should unroll the loop recursively as in:
B: tap | (tap>>1) | ((tap | (tap>>1))>>2) | ((tap | (tap>>1) | ((tap | (tap>>1))>>2))>>4) ... // tap>>3 covered here
Now notice that A and B are equivalent when the loop definition uses index = index + 1 instead of index = index * 2, because every shift amount then appears explicitly in A.
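To check the reasoning about A and B outside the synthesis tool, here is a small software model of the two unrollings. It is only a sketch under assumptions: the 32-bit width, the loop bound (index < WIDTH), and the names unroll_flat and unroll_blocking are all hypothetical and not taken from my actual function.

```python
WIDTH = 32                    # assumed register width (hypothetical)
MASK = (1 << WIDTH) - 1


def shift_amounts(step):
    """Collect the loop index values produced by a given index update."""
    index, amounts = 1, []
    while index < WIDTH:      # assumed loop bound
        amounts.append(index)
        index = step(index)
    return amounts


def unroll_flat(tap, amounts):
    """Form A: every shift is applied to the original tap."""
    result = tap
    for n in amounts:
        result |= tap >> n
    return result & MASK


def unroll_blocking(tap, amounts):
    """Form B: each iteration reads the value written by the previous one."""
    result = tap
    for n in amounts:
        result |= result >> n
    return result & MASK


doubling = shift_amounts(lambda i: i * 2)   # 1, 2, 4, 8, 16
stepping = shift_amounts(lambda i: i + 1)   # 1, 2, 3, ..., 31

tap = 0x80000000
print(hex(unroll_flat(tap, doubling)))      # 0xe8808000 (holes in the mask)
print(hex(unroll_blocking(tap, doubling)))  # 0xffffffff (solid mask)
print(unroll_flat(tap, stepping) == unroll_blocking(tap, stepping))  # True
```

In this model, form A with the doubling index leaves zeros to the right of the left-most 1, which matches the symptom I see in hardware, while both forms agree when the index is incremented by 1.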
So basically the question is: are blocking assignments supposed to be truly blocking in for loops? If so, then Quartus has a bug.