3 hours is indeed excessive; for typical kernels, the first stage of compilation should take only a few minutes. Judging by the line numbers in the log, your kernel is relatively large. Furthermore, the compiler is auto-unrolling a lot of loops, which may not be what you want (especially from a resource-usage point of view). It is also removing some out-of-bounds accesses to some of your buffers, which indicates logical errors in your code. I suspect your kernel is too large and complex for the compiler to handle: it is probably hitting a memory leak somewhere, gradually filling your memory, and finally crashing when it runs out.
My recommendation is to first modify your code to eliminate all the warnings in the log, and then try to simplify your kernel. As it stands, even if your kernel passes the first stage of compilation, it will probably be too big to fit on the device.
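As a rough sketch of what addressing those two warnings might look like, assuming you are using the Intel FPGA SDK for OpenCL (an assumption, since the tool is not named here): the hypothetical single work-item kernel below keeps its loop bound at `i < n` so every buffer access stays in range, and marks the loop with `#pragma unroll 1`, which that SDK documents as the way to prevent a loop from being unrolled. The kernel name `scale` and its arguments are made up for illustration, and the `#ifndef __OPENCL_VERSION__` guard exists only so the same source also builds as plain C for a quick test.

```c
/* Hypothetical example kernel. Under an OpenCL C compiler,
   __OPENCL_VERSION__ is predefined and __kernel/__global are built in;
   when building as plain C for testing, define them away. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

__kernel void scale(__global const float *restrict in,
                    __global float *restrict out,
                    const int n)
{
    /* Assumption: Intel FPGA SDK for OpenCL, where `#pragma unroll 1`
       keeps this loop rolled instead of replicating its body in
       hardware. Other compilers may ignore this pragma. */
    #pragma unroll 1
    for (int i = 0; i < n; i++) {   /* i < n (not i <= n): stays in bounds */
        out[i] = 2.0f * in[i];
    }
}
```

Keeping a loop rolled generally trades throughput for area, so it is most useful on loops whose auto-unrolled copies are what pushes the design past the device's resources.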