Short vs Int vs Floating Point usage in Kernels
I'm wondering whether anyone else has successfully used smaller fixed-point representations for kernel data instead of floating point. I'm experimenting with converting my algorithm from floating point to fixed-point integer. As expected, this lowered resource usage, since integer arithmetic is simpler for the FPGA. The problem appears when I try to move below 32-bit integers. I only need about 8 bits of resolution for my data, so I was hoping to save logic blocks by changing my kernel data type from 'int' to 'short' and finally to 'char', but I'm seeing the exact opposite: every time I move to a smaller data type, resource usage goes up.

To make sure the overhead wasn't coming from my algorithm, I isolated the issue with a few simple tests and got some odd results. I created a simple kernel to do matrix addition and/or matrix dot product:
```c
#define N_VALUES 100
#define TYPE float

typedef TYPE DATATYPE;

__kernel
__attribute__((reqd_work_group_size(N_VALUES, 1, 1)))
__attribute__((num_compute_units(50)))
__attribute__((num_simd_work_items(1)))
void DotProductKernel(__global DATATYPE * restrict a,
                      __global DATATYPE * restrict b,
                      __global DATATYPE * restrict result)
{
    int id = get_global_id(0);
    // Element-wise multiply for the dot product
    result[id] = a[id] * b[id];
}
```

I then increased the number of compute units so that the hardware usage would be large enough to see, and changed '#define TYPE float' to 'int', 'short', and 'char'. Here are the results reported during compilation:

| Data Type | Logic Elements | Flip Flops | RAMs | DSPs | Logic Utilization % | Dedicated Logic Register % | Memory Block % | DSP % |
|---|---|---|---|---|---|---|---|---|
| Floating Point Matrix Addition | 168561 | 221098 | 2200 | 0 | 88% | 39% | 101% | 0% |
| Floating Point Matrix Dot Product | 155611 | 206598 | 2100 | 50 | 84% | 37% | 97% | 20% |
| Integer Matrix Addition | 141311 | 181498 | 2000 | 0 | 79% | 34% | 93% | 0% |
| Integer Matrix Dot Product | 140661 | 192398 | 2000 | 100 | 80% | 35% | 93% | 39% |
| Short Matrix Addition | 176911 | 208448 | 2000 | 0 | 89% | 37% | 93% | 0% |
| Short Matrix Dot Product | 176511 | 209248 | 2000 | 50 | 89% | 37% | 93% | 20% |
| Char Matrix Addition | 252311 | 265398 | 2000 | 0 | 109% | 43% | 93% | 0% |
| Char Matrix Dot Product | 252111 | 265798 | 2000 | 50 | 109% | 43% | 93% | 20% |

Is the compiler introducing overhead by computing with default 32-bit integers and then masking away the excess bits? Or can anyone explain why the resource usage is so much higher for the smaller bit-width operations?