Good question. Both should work: index * 2 needs log_2(n) iterations, while index + 1 needs n iterations.
For example, using index * 2 and an input of 32'h00800000, the successive iterations should give:
32'b00000000110000000000000000000000 // index == 1
32'b00000000111100000000000000000000 // index == 2
32'b00000000111111110000000000000000 // index == 4
32'b00000000111111111111111100000000 // index == 8
32'b00000000111111111111111111111111 // index == 16
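The progression above can be checked with a small software model. This is only a sketch in Python, not the original loop: it assumes each iteration computes x = x | (x >> index) on a 32-bit value and that the loop runs while index < 32. The helper name smear_right and its step parameter are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits, like a 32-bit register


def smear_right(value, step):
    """Model the assumed update x = x | (x >> index) for index = 1, step(1), ... < 32."""
    x = value & MASK32
    trace = []
    index = 1
    while index < 32:
        x = (x | (x >> index)) & MASK32
        trace.append((index, x))
        index = step(index)
    return x, trace


# Doubling the index: 5 iterations, i.e. log_2(32)
doubling_result, doubling_trace = smear_right(0x00800000, lambda i: i * 2)
# Incrementing the index: 31 iterations
linear_result, linear_trace = smear_right(0x00800000, lambda i: i + 1)

for index, x in doubling_trace:
    print(f"32'b{x:032b} // index == {index}")
```

Under these assumptions both stepping rules end at the same value with every bit below the leading one set; the doubling version simply gets there in far fewer iterations.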
However, in the end the loop actually produces:
32'b00000000110100010000000100000000