--- Quote Start ---
Based on your log, 309 RAMs are being used by the BSP, 103 are being used by the channel, and 643 and 83 RAMs for two memory loads.
You cannot change or reduce the amount used by the BSP.
The channel depth you have requested is zero, but the compiler has decided that a depth of 4096 is better for you, hence the high RAM requirement. Channel depth is one of the things that the compiler regularly overestimates, yet there is no way for the user to override it.
The RAMs used for the external memory loads are mostly used for the private cache. You can reduce this amount by adding the "volatile" tag to your __global "coef" buffer. The cache can help a lot if your code does a lot of repeated accesses, but if it doesn't, the cache will be useless and just waste RAMs. There will still be some RAMs used for the access even with the volatile tag, because the compiler tries to hide the latency of the memory accesses by putting buffers between the kernel and the memory interface.
--- Quote End ---
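#pragma OPENCL EXTENSION cl_altera_channels : enable
typedef struct {
float data;
} vector_line;
typedef struct {
vector_line lane;
} channel_vec;
// The channel is declared after the type it carries.
channel channel_vec coef_ch __attribute__((depth(0)));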
__kernel
void ReadBlock(
uchar dim1,
uchar dim2,
// Data
__global volatile channel_vec *restrict coef)
{
int loc_x = get_local_id(0);
int loc_y = get_local_id(1);
int loc_z = get_local_id(2);
int block_x = get_group_id(0);
int block_y = get_group_id(1);
int block_z = get_group_id(2);
channel_vec coef_ch_vec;
coef_ch_vec = *coef;
write_channel_altera(coef_ch, coef_ch_vec);
}
I added 'volatile' to this code, but it has no effect on RAM usage. Is that possible?