Low bit-width fixed-point FMA on DSPs in OpenCL
Hi,
I'm developing a design in which all variables are 8-bit or 16-bit fixed-point values, stored as `char` or `short`. Each iteration performs multiple FMAs, but the compiled design does not use any DSPs.
I suspect there is a way to offload these computations onto the DSPs, which would free up logic and let the design scale further. Unfortunately, I have no idea how to do this in OpenCL, and the OpenCL documentation does not provide any information on it.
My first question: is this possible at all? My assumption is that it could even allow multiple FMAs to be done on a single DSP.
Second, if it is possible, is there any specific documentation on how this can be done in OpenCL?
Thanks
The appropriate IP Core is either used directly by the OpenCL compiler or eventually employed by the mapper, depending on the width of your variables. I remember there were some threads on the forum about this subject before, and the compiler's behavior was kinda buggy, though. You can also take a look at the variable-width integer extension and the instructions for correctly inferring fixed-point arithmetic in the Best Practices Guide.
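To make the arithmetic concrete, here is a minimal sketch in plain C of an 8-bit fixed-point multiply-accumulate with a wider accumulator. It only illustrates the pattern that the compiler has to recognize. Whether it actually ends up on a DSP depends on your tool version and device, so check the resource report after compiling. The Q1.7 format and the helper names `mac_q7` and `q14_to_q7` are my own assumptions for illustration and do not come from the SDK. In a kernel you would write `char` and `int` instead of `int8_t` and `int32_t`.

```c
#include <stdint.h>

/* Hypothetical helper: dot product of two Q1.7 fixed-point vectors.
 * Operands are 8 bits wide; each product is Q2.14 and fits in 16 bits,
 * so a 32-bit accumulator leaves plenty of headroom for the sum. */
static int32_t mac_q7(const int8_t *a, const int8_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; ++i) {
        /* Widen explicitly before multiplying so the product is not truncated. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Hypothetical helper: convert a Q2.14 accumulator back to Q1.7.
 * Adds half an LSB before shifting, then saturates to the 8-bit range.
 * Assumes an arithmetic right shift for negative values, which OpenCL C
 * defines for signed types and which common C compilers also implement. */
static int8_t q14_to_q7(int32_t acc)
{
    int32_t r = (acc + (1 << 6)) >> 7;
    if (r > 127)  r = 127;
    if (r < -128) r = -128;
    return (int8_t)r;
}
```

For example, with both inputs set to {64, 64} (that is, 0.5 in Q1.7), `mac_q7` returns 8192 and `q14_to_q7` maps that back to 64, which is 0.5 again. Keeping the operands narrow and widening only the accumulator is what gives the compiler a chance to pick a small multiplier instead of a full 32-bit one.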