--- Quote Start ---
originally posted by kenland@Oct 18 2004, 09:25 AM
while checking altera's website for the niosii update i noticed the footnote in the core summary table that shifts on cyclone are one clock cycle *per bit*. so our code that strips out individual bytes from long words using shift and mask is running 24x slower than on coldfire. (ouch)
--- Quote End ---
The optimizer doesn't recognize multiple-of-8-bit shift-and-mask operations as byte accesses? Oi!
If all else fails, sounds like a plausible application for a custom instruction. You lose portability, though.
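Another thing you can do is cast a long* to a char* if you have an array, or use a union if you have a scalar. Both read the bytes straight from memory instead of shifting, but both also expose byte-order issues: which index holds the low byte depends on the CPU's endianness.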
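Here is a rough sketch of the three approaches side by side, in case it helps. All the names (byte_by_shift, byte_by_union, byte_by_cast, host_is_little_endian) are made up for illustration, and I'm assuming a 32-bit word via uint32_t rather than whatever "long" is on your toolchain. I haven't measured any of this on Nios II, so check the generated assembly before relying on it.

```c
#include <stdint.h>

/* Portable version: byte n of the value, n = 0 is the least significant byte.
 * This is the shift-and-mask form that is slow when shifts cost one cycle per bit. */
static uint8_t byte_by_shift(uint32_t word, unsigned n)
{
    return (uint8_t)((word >> (8u * n)) & 0xFFu);
}

/* Scalar case: overlay the word with a byte array.
 * idx is a MEMORY index (0 = lowest address), not a significance index. */
union word_bytes {
    uint32_t word;
    uint8_t  bytes[4];
};

static uint8_t byte_by_union(uint32_t word, unsigned idx)
{
    union word_bytes u;
    u.word = word;
    return u.bytes[idx];
}

/* Array case: view an array of words as a flat array of bytes.
 * Accessing any object through an unsigned char pointer is allowed in C. */
static uint8_t byte_by_cast(const uint32_t *words, unsigned word_idx, unsigned idx)
{
    const unsigned char *p = (const unsigned char *)words;
    return p[4u * word_idx + idx];
}

/* Run-time byte-order probe: returns 1 if the low byte sits at the lowest address. */
static int host_is_little_endian(void)
{
    union word_bytes u;
    u.word = 1u;
    return u.bytes[0] == 1u;
}
```

On a little-endian CPU, memory index 0 is the least significant byte; on a big-endian one it is the most significant. So to get the same byte as byte_by_shift(w, n) you would pass n on the former and 3 - n on the latter, which is exactly the portability cost mentioned above.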