Are you trying to perform part of your computation on a GPU using CUDA, and then pass the output to an FPGA using OpenCL? If so, I wouldn't expect it to work at all, since you are mixing two libraries with completely different characteristics. Since OpenCL works just fine on GPUs, I recommend porting everything to OpenCL on a GPU first, and only then trying to port it to FPGAs.
Also, if you are using "clEnqueueWriteBuffer" to write your host buffer to the device, you shouldn't use "CL_MEM_USE_HOST_PTR" when creating the device buffer. That flag is for when you do not want to copy the buffer from host to device explicitly, and would rather let the OpenCL runtime decide when and how to do the transfer. This is mostly useful when targeting CPUs, where it avoids allocating two copies of the same buffer in host memory (which is the same as device memory in this case).