According to my calendar, the update release should be available now.
I am still assuming you are talking about "aoc -c" compile time and that your parallel load is from global memory. If so, make sure your access pattern is simple. If you access consecutive elements of a global buffer and unroll the loop, the compiler will merge your accesses into wider accesses, which reduces the number of load units.
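As a minimal sketch of what that looks like (kernel and buffer names are made up; `#pragma unroll` is the FPGA SDK for OpenCL unroll pragma, and the coalescing itself is done by the offline compiler, not anything in the source):

```c
// Sketch: consecutive global-memory accesses inside an unrolled loop.
// Because the four loads hit adjacent addresses, the offline compiler
// can merge them into a single wider (e.g. 128-bit) load unit.
__kernel void sum4(__global const float * restrict src,
                   __global float * restrict dst)
{
    float acc = 0.0f;
    #pragma unroll 4
    for (int i = 0; i < 4; i++)
        acc += src[i];   // consecutive addresses -> coalescable
    dst[0] = acc;
}
```

If the indexing were strided or data-dependent instead of consecutive, the compiler would have to instantiate separate load units, which is where the load/store count (and compile time) blows up.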
You can also try sending the data to the kernel via a channel, as you suggested. This may increase resource usage because of the channels, but it may also speed up compilation because you are splitting your memory accesses across two kernels.
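A rough sketch of that split, assuming the Altera channels extension (kernel names and the channel are made up; the extension pragma and `read/write_channel_altera` calls follow the Altera SDK for OpenCL of that era, so check the names against your SDK version):

```c
#pragma OPENCL EXTENSION cl_altera_channels : enable

channel float data_ch;

// I/O kernel: the only kernel that touches global memory for 'src'.
__kernel void reader(__global const float * restrict src, int n)
{
    for (int i = 0; i < n; i++)
        write_channel_altera(data_ch, src[i]);
}

// Compute kernel: consumes the stream and never loads 'src' itself,
// so its loop body contains no global loads to replicate when unrolled.
__kernel void worker(__global float * restrict dst, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += read_channel_altera(data_ch);
    dst[0] = acc;
}
```

Both kernels are enqueued as tasks; the channel decouples the memory interface from the compute pipeline, at the cost of the channel's FIFO resources.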
--- Quote Start ---
I think I'm running into the "long compile times due to large loops being unrolled" issue. I have a task kernel, and synthesis does not complete in a reasonable amount of time if I unroll by more than 320. When can we expect 14.0 update 1?
"However, in general, users should try to avoid having excessive number of load/store instructions for performance reasons."
If I'm doing a large parallel load, what are my other options? Should I create an I/O kernel that writes to a channel instead?
--- Quote End ---