I'm not sure whether resources are freed immediately or lazily. One thing to keep in mind is that a shadow buffer gets allocated on the host, so you might be running out of host memory. This is in addition to the memory you allocate yourself on the host side: the shadow buffer reserves space on the host to store and restore buffers from the FPGA in case the hardware needs to be swapped out. So if you create a 512MB block of data for the kernel to operate on, you'll take up 512MB on the target, 512MB allocated by the host program, and another 512MB on the host reserved for the shadow buffer.
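To put numbers on that, here is a rough sketch of the arithmetic in C. It assumes the shadow buffer is the same size as the device buffer, as in the 512MB example above. The names `buffer_footprint`, `estimate_footprint` and `host_total_bytes` are made up for illustration and are not part of the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping struct, not an SDK type. */
typedef struct {
    uint64_t device_bytes;      /* buffer on the FPGA target */
    uint64_t host_app_bytes;    /* allocation made by the host program */
    uint64_t host_shadow_bytes; /* shadow copy reserved on the host */
} buffer_footprint;

/* Assumption: each of the three copies is as large as the buffer itself. */
buffer_footprint estimate_footprint(uint64_t buffer_bytes)
{
    buffer_footprint f;
    f.device_bytes = buffer_bytes;
    f.host_app_bytes = buffer_bytes;
    f.host_shadow_bytes = buffer_bytes;
    return f;
}

/* Host-side total: the application's allocation plus the shadow buffer. */
uint64_t host_total_bytes(buffer_footprint f)
{
    /* Guard against wrap-around before adding. */
    assert(f.host_app_bytes <= UINT64_MAX - f.host_shadow_bytes);
    return f.host_app_bytes + f.host_shadow_bytes;
}
```

For a 512MB buffer, `host_total_bytes(estimate_footprint(512ull * 1024 * 1024))` comes to 1GB of host memory, on top of the 512MB on the target.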
It's possible that some path through the host code misses the release calls, so it might be worth setting a few breakpoints to make sure every resource you allocate is eventually freed. Since the SDK adheres to the 1.0 standard, the error codes are sometimes not very descriptive of the actual problem. Which calls are returning the out-of-resources/out-of-host-memory error codes?
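If breakpoints get tedious, another option is to count create and release calls and check the balance at the end of each pass through the host code. The sketch below is plain C with made-up names (`alloc_tracker`, `tracker_on_create`, and so on); none of it comes from the SDK. I'm assuming your buffers are created and freed with OpenCL calls such as `clCreateBuffer` and `clReleaseMemObject`, so the hooks would go right after those.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of OpenCL or any vendor SDK:
   counts create and release calls so an imbalance is easy to spot. */
typedef struct {
    long created;
    long released;
} alloc_tracker;

/* Call right after a create call succeeds (assumed: clCreateBuffer). */
void tracker_on_create(alloc_tracker *t)
{
    t->created++;
}

/* Call right after the matching release (assumed: clReleaseMemObject). */
void tracker_on_release(alloc_tracker *t)
{
    /* Releasing more objects than were created is itself a bug. */
    assert(t->released < t->created);
    t->released++;
}

/* Number of objects created but not yet released. */
long tracker_outstanding(const alloc_tracker *t)
{
    return t->created - t->released;
}

/* Print the counters, e.g. at the end of each pass through the host code. */
void tracker_report(const alloc_tracker *t, const char *label)
{
    fprintf(stderr, "%s: created=%ld released=%ld outstanding=%ld\n",
            label, t->created, t->released, tracker_outstanding(t));
}
```

If `tracker_outstanding` keeps growing from one pass to the next, some path is skipping a release, and the label passed to `tracker_report` tells you which pass to look at.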