I am actually not sure what happens when each branch contains a for loop. In your case, since the loop trip counts are known and identical, the compiler probably only needs to make sure that one iteration of each loop has the same latency; the total latency of the two loops will then also be the same. However, when the trip counts differ or are not known at compile time, this certainly cannot be done. I would assume that in such cases the branch is handled the same way as when one side of the branch contains stallable accesses. Most likely the compiler balances the minimum latency of the two paths (that is, for a loop trip count of one), and if the two paths do not finish at the same time, the faster side is stalled until the slower side finishes.
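
To make the arithmetic concrete, here is a toy Python model of the behavior I am guessing at. It is only a sketch of my assumption, not a description of what the compiler actually does: the function name `balance_branches` is made up for illustration, and it assumes that the total latency of a loop is simply the per-iteration latency times the trip count, ignoring any overlap from pipelining.

```python
def balance_branches(iter_lat_a, trips_a, iter_lat_b, trips_b):
    """Toy model: return (total_a, total_b, stall_cycles) for two branch paths.

    Assumption (hypothetical, not taken from compiler documentation): the
    faster loop body is padded so both loops have the same per-iteration
    latency, and any remaining difference is absorbed by stalling the
    faster path at run time.
    """
    # Static balancing: both loop bodies are padded to the slower latency.
    padded = max(iter_lat_a, iter_lat_b)
    total_a = padded * trips_a
    total_b = padded * trips_b
    # Run-time balancing: the faster path waits for the slower one.
    stall_cycles = abs(total_a - total_b)
    return total_a, total_b, stall_cycles


if __name__ == "__main__":
    # Equal, known trip counts: padding alone balances the two paths.
    print(balance_branches(3, 10, 5, 10))
    # Different trip counts: the shorter path stalls for the difference.
    print(balance_branches(3, 10, 5, 4))
```

With equal trip counts the stall term comes out as zero, which is the case where static balancing alone would be enough. With trip counts of 10 and 4, the model leaves a 30-cycle gap that could only be covered by stalling the faster side.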