An iteration scheme in HDL means parallel instantiation of the respective logic: the loop is unrolled, so the logic below is built 652 times for each function call.
begin
J=i;
n=(2*n)%p;
end
Of course, part of the calculation reduces to constant expressions, but n, for example, depends on the input data, if I understand correctly. I don't expect that the present code can be synthesized for any existing FPGA.
All working examples that deal with similar numeric problems use at least partially sequential processing. Also note that integer generally means 32 bits. In many cases the compiler can recognize the register size actually required and reduce it accordingly, but not always.
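To illustrate what sequential processing could look like here, below is a minimal sketch that performs one n = (2*n) % p step per clock cycle instead of building 652 copies of the logic. It is not derived from the original design: the module name, port names, WIDTH parameter and start/done handshake are all hypothetical, and it assumes 0 <= n_in < p, so that the modulo reduces to a single conditional subtraction. The registers are sized explicitly rather than declared as integer.

```verilog
// Hypothetical sketch: one "n = (2*n) % p" step per clock cycle.
// Assumes 0 <= n_in < p, so 2*n < 2*p and one conditional
// subtraction is enough to perform the modulo.
module mod_double_seq #(
    parameter WIDTH = 16,   // assumed operand width, not taken from the original code
    parameter STEPS = 652   // iteration count from the loop discussed above
) (
    input  wire             clk,
    input  wire             rst,    // synchronous reset, active high
    input  wire             start,  // pulse to begin a new calculation
    input  wire [WIDTH-1:0] n_in,
    input  wire [WIDTH-1:0] p,      // must stay stable while busy
    output reg  [WIDTH-1:0] n_out,  // valid when done is high
    output reg              done    // one-cycle pulse after the last step
);
    reg [9:0] count;  // 10 bits count up to 1023, so STEPS must be <= 1024
    reg       busy;

    // 2*n needs one extra bit; compare and subtract at that width.
    wire [WIDTH:0] doubled = {n_out, 1'b0};
    wire [WIDTH:0] p_ext   = {1'b0, p};
    wire [WIDTH:0] reduced = (doubled >= p_ext) ? (doubled - p_ext) : doubled;

    always @(posedge clk) begin
        if (rst) begin
            busy  <= 1'b0;
            done  <= 1'b0;
            count <= 10'd0;
            n_out <= {WIDTH{1'b0}};
        end else begin
            done <= 1'b0;
            if (start && !busy) begin
                // Load the input and start counting steps.
                n_out <= n_in;
                count <= 10'd0;
                busy  <= 1'b1;
            end else if (busy) begin
                // The result is below p, so it fits in WIDTH bits.
                n_out <= reduced[WIDTH-1:0];
                count <= count + 10'd1;
                if (count == STEPS - 1) begin
                    busy <= 1'b0;
                    done <= 1'b1;
                end
            end
        end
    end
endmodule
```

With this structure the design needs a single shifter, comparator and subtractor plus a small counter, at the cost of 652 clock cycles per result. Whether that latency is acceptable depends on the application.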