Without seeing the kernel, I can only think of a few possible reasons for the performance degradation. They involve very low-level details that I don't think you'd benefit from, so I'll just say that you do not need this particular optimization when targeting the Altera device.
I'm not sure why that is an optimization on a GPU, but I'm guessing it is branch related (for example, avoiding divergent branches). GPU optimizations are often unnecessary when targeting an FPGA, because the hardware is generated to fit the kernel. On GPUs and other ASICs it is the other way around: you are trying to make your kernel fit the underlying architecture. In general, my recommendation is this: whenever you work with a kernel that has been optimized for a GPU and you recognize those optimizations, try undoing them when you target the FPGA. Optimizations for one device will not necessarily help on another (the same is true if you port a kernel from an FPGA to a GPU).