Altera_Forum
Honored Contributor
10 years ago

--- Quote Start ---
The width of the arithmetic operations impacts the DSP usage, as seen in your experiments. The difference in ALM usage (i.e., logic) is not in the kernel datapath; it is in the load/store units that access the memory.

The alignment of loads/stores impacts the ALM usage. With char* pointers, each load/store access is only 1-byte aligned, and this does not allow much optimization. With short* pointers, each address is 2-byte aligned (i.e., the least significant address bit is zero), and this allows Quartus to perform some optimizations.

The difference for each load/store unit is a few hundred ALMs (depending on the alignment). With 3 load/store units * 50 copies, this overhead becomes big, considering there is nothing else in the kernel.
--- Quote End ---

That would also explain why the same algorithm with the boolean operator exploded in size.

Would it be better to optimize fixed-point kernels by loading/storing the data as 32-bit integers (4 chars packed together) and separating them only for the internal arithmetic of the kernel, so that the alignment stays at 4 bytes? Or would a char4 vector data type accomplish the same task?
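To make the packing idea concrete, here is a rough sketch in plain C of what I have in mind. It is only an illustration under my own assumptions: the function names (`unpack_lane`, `pack_lane`, `add_packed_u8`) are made up, the lanes are treated as unsigned 8-bit values with wrap-around addition, and in an actual OpenCL kernel the pointers would presumably be `__global uint *` with one work-item per word. Whether this really reduces the ALM count of the load/store units is exactly what I am asking; the sketch does not show that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract 8-bit lane `lane` (0 = least significant byte) from a packed word. */
static inline uint8_t unpack_lane(uint32_t word, unsigned lane) {
    return (uint8_t)(word >> (8u * lane));
}

/* Place an 8-bit value into lane `lane` of an otherwise zero word. */
static inline uint32_t pack_lane(uint8_t value, unsigned lane) {
    return (uint32_t)value << (8u * lane);
}

/*
 * Element-wise c = a + b on 8-bit values stored four per 32-bit word.
 * Memory is touched only through 4-byte-aligned word loads and stores;
 * the 8-bit arithmetic happens on values already held in registers.
 * Each lane wraps modulo 256 independently (no carry between lanes).
 */
void add_packed_u8(const uint32_t *a, const uint32_t *b, uint32_t *c,
                   size_t n_words) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n_words; ++i) {
        uint32_t wa = a[i];   /* one aligned 32-bit load */
        uint32_t wb = b[i];   /* one aligned 32-bit load */
        uint32_t out = 0;
        for (unsigned lane = 0; lane < 4; ++lane) {
            uint8_t sum = (uint8_t)(unpack_lane(wa, lane) + unpack_lane(wb, lane));
            out |= pack_lane(sum, lane);
        }
        c[i] = out;           /* one aligned 32-bit store */
    }
}
```

The point of the layout is that each iteration issues three word-sized accesses instead of twelve byte-sized ones, and every address is a multiple of 4. The inner loop over lanes has a fixed trip count of 4, so I would expect a compiler to unroll it fully, but that is an assumption on my part.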