Do you still get incorrect output after removing the ivdep pragmas? Also, as I mentioned before, there is really no point in fully unrolling your memory reads and writes: the memory bandwidth is already saturated at an unroll factor of 16, so anything larger just wastes FPGA area without making the accesses any faster.
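To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in plain C (host-side arithmetic, not kernel code). The function names are made up for illustration, and the numbers in the note below (4-byte elements, a 250 MHz kernel clock, a 17000 MB/s peak memory bandwidth) are assumptions I picked for the example, not values from your board or your kernel; substitute your own.

```c
/* Illustrative helpers only: names and all numbers fed to them are
   assumptions for this example, not taken from any specific board. */

/* Bandwidth requested by a fully pipelined loop, in MB/s (10^6 bytes/s):
   each clock cycle moves unroll * elem_bytes bytes. */
unsigned long requested_mbps(unsigned long unroll,
                             unsigned long elem_bytes,
                             unsigned long kernel_mhz)
{
    return unroll * elem_bytes * kernel_mhz;
}

/* The memory interface caps what the loop can actually achieve,
   no matter how much is requested. */
unsigned long achievable_mbps(unsigned long requested,
                              unsigned long peak_mbps)
{
    return requested < peak_mbps ? requested : peak_mbps;
}
```

With those assumed numbers, requested_mbps(16, 4, 250) is 16000 MB/s, which is already close to the assumed 17000 MB/s peak. requested_mbps(64, 4, 250) asks for 64000 MB/s, but achievable_mbps caps it at 17000, so the extra unrolling costs area and buys essentially no throughput.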
It is unlikely that your problem is caused by a compiler bug. If it is, though, there is nothing any of us can do about it other than report it to Intel and hope they fix it in a later version. In some cases, a compiler bug can also be worked around by changing the design strategy.