It might even be something as simple as some of your code ending up sharing cache lines.
You won't get fully deterministic behaviour unless you put all the code/data in tightly coupled memory and disable the dynamic branch predictor.
Once you've done that, the execution time of the code is independent of any external state and can be worked out by counting cycles.
You could try using:
if (__builtin_expect(some_condition,1))
That will make the 'true' part of the code the fall-through path.
If 'some_condition' is non-trivial (a compound expression, say) you'll need to apply the hint to the correct part of it.
Get gcc to generate a .s file (with -S -fverbose-asm) and look at the generated assembly to check which path ended up as the fall-through.
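
As a rough sketch of how the hint is usually wrapped up, something like the following might do. The macro names likely/unlikely and the two functions are made up for illustration (they are a common convention, not anything from your code), and the non-GCC fallback just drops the hint.

```c
#include <stddef.h>

/* Hypothetical convenience macros (names are an assumption, not from the post).
 * The !! turns any non-zero value into exactly 1, so the hint applies to the
 * truth value of the whole expression. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Count the non-zero bytes in buf. Non-zero is assumed to be the common
 * case here, so the increment is hinted as the expected path. */
size_t count_nonzero(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (likely(buf[i] != 0))
            n++;
    }
    return n;
}

/* Compound condition: the hint wraps the whole test, not just one half. */
int in_range(int v, int lo, int hi)
{
    if (likely(v >= lo && v <= hi))
        return 1;
    return 0;
}
```

The hint is only a request to the compiler, so whether the expected branch really becomes the fall-through depends on the gcc version and optimisation level; checking the .s output is the only way to be sure.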