Thanks for the helpful information. I checked the Altera Handbook, which describes two images: the Core Image (logic that is programmed through configuration RAM (CRAM)) and the Periphery Image (general-purpose I/Os (GPIOs), I/O registers, the GCLK, QCLK, and RCLK clock networks, and logic implemented in hard IP, such as the Hard IP for PCI Express IP Core).
So I think loading an .aocx file might reconfigure only the Core Image, in which case the data in device memory would stay safe during the hardware switchover.
--- Quote Start ---
While I was asking around for the best method to measure this, I received some of the information you are looking for. Using a Linux host, a Stratix V - A7 device takes approximately 750 ms to be reconfigured by the runtime. Note that this number does not include the time needed to move any buffers that are active in the FPGA, so whenever possible it's recommended to free any buffers allocated in the FPGA before the kernel hardware switchover occurs. Active buffers must be copied up to the host before the hardware is swapped out and restored after the hardware has been replaced. There is an overhead associated with this, but I can't give you a number for it because it's heavily dependent on your software implementation.
If this reconfiguration time is significant compared with the kernel execution time, then you should look at amortizing the cost. Let's say you have a billion data points that move between kernels "A" and "B", and you handle them a million points (work-items) at a time. Instead of calling kernel A followed by B for each million points, you would call kernel A many times to finish off all billion points, followed by kernel B to do the same. That way there is only one hardware swap instead of around two thousand (A --> B --> A --> B --> etc.). In situations like these I also try to combine the kernels if possible, since not only do you eliminate the hardware swapping, but you often end up with a more efficient hardware implementation because the same compute unit will encapsulate both kernels.
--- Quote End ---
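To put rough numbers on the amortization argument in the quote, here is a small Python sketch that counts hardware swaps for the two call orders. It is only a model, not OpenCL host code: the function names (`count_swaps`, `interleaved`, `batched`) are made up for illustration, the 0.75 s figure is the approximate Stratix V A7 number quoted above, and buffer save/restore time is left out because, as the quote says, it depends on the software implementation. It also assumes the intermediate results of kernel A for all batches can be kept somewhere until kernel B runs.

```python
# Toy model of kernel hardware swaps; not real OpenCL host code.
RECONFIG_SECONDS = 0.75          # ~750 ms per reconfiguration (figure from the quote)
TOTAL_POINTS = 1_000_000_000     # a billion data points
BATCH_POINTS = 1_000_000         # handled a million work-items at a time
NUM_BATCHES = TOTAL_POINTS // BATCH_POINTS


def count_swaps(schedule):
    """Count how often the loaded kernel hardware must be replaced.

    The very first load is not counted as a swap, since there is no
    previous kernel to swap out.
    """
    swaps = 0
    loaded = None
    for kernel in schedule:
        if loaded is not None and kernel != loaded:
            swaps += 1
        loaded = kernel
    return swaps


def interleaved(num_batches):
    """A, B, A, B, ...: run both kernels on each batch in turn."""
    return [kernel for _ in range(num_batches) for kernel in ("A", "B")]


def batched(num_batches):
    """A, A, ..., B, B, ...: finish every batch with A, then with B."""
    return ["A"] * num_batches + ["B"] * num_batches


interleaved_swaps = count_swaps(interleaved(NUM_BATCHES))
batched_swaps = count_swaps(batched(NUM_BATCHES))

print("interleaved swaps:", interleaved_swaps)
print("batched swaps:", batched_swaps)
print("interleaved overhead (s):", interleaved_swaps * RECONFIG_SECONDS)
print("batched overhead (s):", batched_swaps * RECONFIG_SECONDS)
```

With these assumptions the interleaved order needs 1999 swaps and the batched order needs 1, which matches the "around two thousand" versus "only one" in the quote. At roughly 750 ms each, that is about 25 minutes of reconfiguration versus under a second, before counting any buffer copies.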