Forum Discussion
Altera_Forum
Honored Contributor
10 years ago

--- Quote Start ---
The boolean operator case is different. Because of the logical dependence, the second load operation has a control dependence on the first one. This uses a different (and more expensive) type of load/store unit. Yes, loading/storing larger types (int, or char4) would solve the alignment problem at the expense of wasted memory.
--- Quote End ---

Why would there be wasted memory if you're packing 4 char values into an integer? Or do you mean wasted memory in terms of the logic elements used to convert (mask) from the 32 bits down to the chars and back?

I was curious, so I expanded my experiment to the vector data types (char4) as well as the solution of packing 4 chars into a 32-bit integer. The vector solution is attached as matrixmult_char4.txt (I couldn't upload a .cl file for some reason). The packing into 'int' solution is attached as matrixmult_int.txt. Compiling these for the above tests (dot product and simple addition), I get the following:

Data Type                         Logic Elements  Flip Flops  RAMs  DSPs  Logic Utilization %  Dedicated Logic Register %  Memory Block %  DSP %
Char Matrix Addition Compact      143611          181098      2000  0     79%                  34%                         93%             0%
Char Matrix Dot Product Compact   164211          271998      2000  200   94%                  44%                         93%             78%
Char4 Vector Matrix Addition      141311          181498      2000  0     79%                  34%                         93%             0%
Char4 Vector Matrix Dot Product   141461          192698      2000  200   80%                  36%                         93%             78%

I can't exactly explain why the DSP usage increased for the dot product, other than the idea that there are 4 more multiplications in each kernel compared to the int version. However, the results do favor Outku's explanation of the load/store alignments.

--- Quote Start ---
In the optimization guide there is a section on fixed point operation, page 14, which suggests statically masking your 32-bit integers to the desired precision. If I understood this right, the AOC would be able to disregard the extra bits during hardware generation and thus reduce the amount of logic (minimal in this case, but still a reduction).
You may want to look into that and see if it helps you out any. I was curious, so I ran the example listed in the guide for 17-bit precision. I had an increase in logic for the fixed point version over the straight 32-bit version, and I don't think this should be the case. I wouldn't think the load/store units that access the memory would be an issue, as Outku suggested they were in the case of the original poster. Any insight would be appreciated.

Thanks,
Rudy
--- Quote End ---

Thanks for the suggestion. I did attempt to use static masks on larger data types to get the 8-bit (char) and 16-bit (short) examples; however, I ended up with the same result. It fixed the alignment issue, but there was a lot of wasted space with loading/storing all 32 bits and only using 8 bits. Using the vector data types, though, seems to have solved the issue.

Thanks everyone for your help!
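
For readers following along, here is a minimal plain-C sketch of the two ideas discussed in this thread: packing four char values into one 32-bit integer (with shift-and-mask extraction), and statically masking operands to a fixed precision such as 17 bits. This is not the attached matrixmult_int.txt or the guide's example. The function names (pack4, unpack_lane, add4, add17) and the low-byte-first lane order are assumptions made purely for illustration, and whether the compiler actually trims the unused bits in hardware is not something this sketch can show.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 8-bit values into one 32-bit word.
   Lane 0 is the low byte; this ordering is an assumption of the sketch. */
static uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint32_t)a
         | ((uint32_t)b << 8)
         | ((uint32_t)c << 16)
         | ((uint32_t)d << 24);
}

/* Extract one 8-bit lane (0..3) by shifting it down and masking. */
static uint8_t unpack_lane(uint32_t word, int lane)
{
    assert(lane >= 0 && lane < 4);
    return (uint8_t)((word >> (8 * lane)) & 0xFFu);
}

/* Lane-wise addition of two packed words.
   Each lane wraps modulo 256, so a carry never spills into its neighbour. */
static uint32_t add4(uint32_t x, uint32_t y)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint32_t sum = (uint32_t)unpack_lane(x, lane) + unpack_lane(y, lane);
        result |= (sum & 0xFFu) << (8 * lane);
    }
    return result;
}

/* Statically masked addition for 17-bit operands.
   Two 17-bit values sum to at most 18 bits, hence the 0x3FFFF result mask. */
static uint32_t add17(uint32_t a, uint32_t b)
{
    return ((a & 0x1FFFFu) + (b & 0x1FFFFu)) & 0x3FFFFu;
}
```

With packing, all 32 bits of every load/store carry useful data, which is the difference from masking a full int down to a single 8-bit value, where 24 of the 32 bits transferred are discarded.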