Short vs Int vs Floating Point usage in Kernels

Honored Contributor

11 years ago

--- Quote Start ---

The boolean operator case is different. Because of the logical dependence, the second load operation has a control dependence on the first one. This uses a different (and more expensive) type of load/store unit.

Yes, loading/storing larger types (int, or char4) would solve the alignment problem at the expense of wasted memory.

--- Quote End ---

Why would there be wasted memory if you're packing 4 char values into an integer? Or do you mean wasted memory in terms of logic elements used to convert (mask) from the 32 bits down to the chars and back?

I was curious so I expanded my experiment to the vector data types (char4) as well as the solution of packing 4 chars into a 32-bit integer. The vector solution is attached as matrixmult_char4.txt (I couldn't upload a .cl file for some reason). The packing into 'int' solution is attached as matrixmult_int.txt.

Compiling these for the above tests (dot product and simple addition) I get the following:

| Data Type | Logic Elements | Flip Flops | RAMs | DSPs | Logic Utilization % | Dedicated Logic Register % | Memory Block % | DSP % |
|---|---|---|---|---|---|---|---|---|
| Char Matrix Addition Compact | 143611 | 181098 | 2000 | | 79% | 34% | 93% | |
| Char Matrix Dot Product Compact | 164211 | 271998 | 2000 | 200 | 94% | 44% | 93% | 78% |
| Char4 Vector Matrix Addition | 141311 | 181498 | 2000 | | 79% | 34% | 93% | |
| Char4 Vector Matrix Dot Product | 141461 | 192698 | 2000 | 200 | 80% | 36% | 93% | 78% |

I can't fully explain why DSP usage increased for the dot product, other than that there are 4 more multiplications in each kernel compared to the int version. However, the results do favor Outku's explanation of the load/store alignments.
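For anyone reading without the attachments, here is a rough sketch of what "packing 4 chars into a 32-bit integer" means. This is plain C, not the attached kernel, and the helper names (pack4, lane, add4, dot4) are made up for illustration. It only shows the shift-and-mask arithmetic involved, which is roughly the extra work the hardware has to do when one 32-bit load carries four 8-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the attached kernels. */

/* Pack four signed 8-bit values into one 32-bit word, lane 0 in the low byte. */
static uint32_t pack4(int8_t a, int8_t b, int8_t c, int8_t d)
{
    return  (uint32_t)(uint8_t)a
         | ((uint32_t)(uint8_t)b << 8)
         | ((uint32_t)(uint8_t)c << 16)
         | ((uint32_t)(uint8_t)d << 24);
}

/* Extract lane i (0..3) as a signed 8-bit value by shifting and masking. */
static int8_t lane(uint32_t w, unsigned i)
{
    assert(i < 4);
    int v = (int)((w >> (8u * i)) & 0xFFu);
    if (v >= 128)
        v -= 256;               /* undo two's-complement encoding */
    return (int8_t)v;
}

/* Lane-wise 8-bit addition; each lane wraps on its own, no carry between lanes. */
static uint32_t add4(uint32_t x, uint32_t y)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint8_t s = (uint8_t)((x >> (8u * i)) + (y >> (8u * i)));
        out |= (uint32_t)s << (8u * i);
    }
    return out;
}

/* Dot product of two packed words: four multiplies, accumulated in 32 bits. */
static int32_t dot4(uint32_t x, uint32_t y)
{
    int32_t acc = 0;
    for (unsigned i = 0; i < 4; ++i)
        acc += (int32_t)lane(x, i) * (int32_t)lane(y, i);
    return acc;
}
```

Each packed word needs four multiplies in dot4, which is at least consistent with the guess above about where the DSP usage comes from. With the char4 vector type the compiler handles the lanes for you, so none of the manual shifting is needed in the kernel source.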

--- Quote Start ---

In the optimization guide there is a section on fixed point operation, page 14, which suggests statically masking your 32-bit integers to the desired precision. If I understood this right, the AOC would be able to disregard the extra bits during hardware generation and thus reduce the amount of logic (minimal in this case, but still a reduction). You may want to look into that and see if it helps you out any. I was curious so I ran the example listed in the guide for 17-bit precision. I had an increase in logic for the fixed point version over the straight 32-bit version and I don't think this should be the case. I wouldn't think the load/store units that access the memory would be an issue here, as Outku suggested in the case of the original poster. Any insight would be appreciated.

Thanks,

Rudy

--- Quote End ---

Thanks for the suggestion. I did attempt to use static masks on larger data types to get the 8-bit (char) and 16-bit (short) examples, but I ended up with the same result. It fixed the alignment issue, but there was a lot of wasted space from loading/storing all 32 bits and only using 8 of them. Using the vector data types, though, seems to have solved the issue.
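To make "static masking" concrete, here is a minimal sketch in plain C. It is not the listing from the optimization guide, and the names MASK17, MASK18, add17 and low8 are my own. The idea is that the masks are compile-time constants, so a compiler can see that the upper bits are always zero and, in principle, build narrower arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants and names, assumed for this sketch. */
#define MASK17 0x1FFFFu   /* keeps the low 17 bits */
#define MASK18 0x3FFFFu   /* a sum of two 17-bit values fits in 18 bits */

/* Add two values at 17-bit precision; the result is at most 18 bits wide. */
static uint32_t add17(uint32_t a, uint32_t b)
{
    uint32_t sum = (a & MASK17) + (b & MASK17);
    assert(sum <= MASK18);      /* the final mask never drops a set bit */
    return sum & MASK18;
}

/* The 8-bit case: only the low byte of a 32-bit word is used. */
static uint32_t low8(uint32_t w)
{
    return w & 0xFFu;
}
```

Note that low8 still takes a full 32-bit word as input. The masking narrows the arithmetic, not the load/store, which is the wasted space I was referring to.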

Thanks everyone for your help!

multiple-attachments.zip (1 KB)
