Well, conceptually you can do something like this:
alt_u8 array[8] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };
Then, on a little-endian CPU such as Nios II:
*((alt_u32*)array) returns 0x04030201
*((alt_u32*)(array+1)) returns 0x05040302
*((alt_u32*)(array+2)) returns 0x06050403
...
and so on
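To make the listing above concrete, here is a small C sketch that computes the same values in two ways. The first builds the value from four byte reads. The second does a single 32-bit read. `alt_u8` and `alt_u32` normally come from the Nios II HAL header `alt_types.h`. Here they are typedef'd to the `<stdint.h>` types so the sketch compiles anywhere. `pack_le32` and `load_u32` are hypothetical helper names. `load_u32` uses `memcpy` instead of the pointer cast, because the cast technically breaks C's strict-aliasing rule. `memcpy` is the portable way to write the same single read.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the Nios II HAL types (normally from alt_types.h). */
typedef uint8_t  alt_u8;
typedef uint32_t alt_u32;

alt_u8 array[8] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };

/* Byte by byte: builds the little-endian 32-bit value starting at p[n]
   from four byte loads, shifts and ORs. Same result on any CPU. */
alt_u32 pack_le32(const alt_u8 *p, unsigned n)
{
    return (alt_u32)p[n]
         | (alt_u32)p[n + 1] << 8
         | (alt_u32)p[n + 2] << 16
         | (alt_u32)p[n + 3] << 24;
}

/* Portable form of *((alt_u32*)(p+n)): copies 4 bytes into a word, which
   compilers usually turn into a single load when the target allows it.
   The result is in host byte order, e.g. 0x04030201 for n = 0 on a
   little-endian CPU such as Nios II. */
alt_u32 load_u32(const alt_u8 *p, unsigned n)
{
    alt_u32 v;
    memcpy(&v, p + n, sizeof v);
    return v;
}
```

For example, `pack_le32(array, 2)` gives 0x06050403, matching the third line of the listing.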
For clarity, since you are not used to handling pointers, note that
*((alt_u32*)(array+n)) is the same as *((alt_u32*)&(array[n]))
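The equivalence holds because C defines `array[n]` as `*(array + n)`. So `&array[n]` is `&*(array + n)`, which is simply `array + n`. A tiny sketch that checks this for every index follows. `same_address_for_all` is a hypothetical name, and `alt_u8` is again a `<stdint.h>` stand-in for the HAL type.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t alt_u8;  /* stand-in for the HAL type from alt_types.h */

/* Returns 1 if array + n and &array[n] are the same address for every
   index n in [0, len), which C guarantees; 0 otherwise. */
int same_address_for_all(const alt_u8 *array, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        if (array + n != &array[n])
            return 0;
    }
    return 1;
}
```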
In practice, I can't remember whether Nios requires 32-bit alignment for this to work as expected. That depends on the data bus architecture.
Depending on which is the case, the performance will be very different. In one case the processor can pack the bytes into a 32-bit word with a single memory access. In the other it still has to handle the individual bytes, as you do now.
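If the hardware does turn out to need aligned 32-bit accesses, one option is to check the address and take the single-read path only when it is on a 4-byte boundary. The sketch below does this under two assumptions: the CPU is little-endian (as Nios II is), so both paths return the same value, and `is_aligned4` and `read_le32` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t  alt_u8;   /* stand-ins for the HAL types */
typedef uint32_t alt_u32;

/* Nonzero if p sits on a 4-byte boundary, i.e. a 32-bit read from p
   is naturally aligned. */
int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Reads the little-endian 32-bit value at p. Assumes a little-endian CPU
   so that both branches agree. */
alt_u32 read_le32(const alt_u8 *p)
{
    if (is_aligned4(p)) {
        alt_u32 v;
        memcpy(&v, p, sizeof v);   /* 4-byte copy; may compile to one load */
        return v;
    }
    /* Unaligned: fall back to packing the four bytes one at a time. */
    return (alt_u32)p[0]
         | (alt_u32)p[1] << 8
         | (alt_u32)p[2] << 16
         | (alt_u32)p[3] << 24;
}
```

Keeping data buffers 4-byte aligned, for instance by declaring them as `alt_u32` arrays, keeps you on the fast path.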
Since you already measure CPU time, you can simply put this in your code and test whether you get an improvement.
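For that test, a rough harness might look like the sketch below. It times a byte-by-byte loop against a one-read-per-word loop over the same buffer. It uses standard C `clock()`. On the board you would substitute whatever timer you already use for your CPU-time measurements. `sum_bytewise`, `sum_wordwise` and `time_it` are hypothetical names, and summing the words is only a stand-in for your real processing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef uint8_t  alt_u8;   /* stand-ins for the HAL types */
typedef uint32_t alt_u32;

/* Current approach (assumed): pack each word from four byte reads. */
alt_u32 sum_bytewise(const alt_u8 *buf, size_t words)
{
    alt_u32 sum = 0;
    for (size_t i = 0; i < words; i++) {
        const alt_u8 *p = buf + 4 * i;
        sum += (alt_u32)p[0] | (alt_u32)p[1] << 8
             | (alt_u32)p[2] << 16 | (alt_u32)p[3] << 24;
    }
    return sum;
}

/* Candidate: one 32-bit read per word (host byte order). */
alt_u32 sum_wordwise(const alt_u8 *buf, size_t words)
{
    alt_u32 sum = 0;
    for (size_t i = 0; i < words; i++) {
        alt_u32 v;
        memcpy(&v, buf + 4 * i, sizeof v);
        sum += v;
    }
    return sum;
}

/* Returns the CPU seconds spent running f over buf `reps` times. The
   accumulated result goes to *out so the work is not optimized away. */
double time_it(alt_u32 (*f)(const alt_u8 *, size_t),
               const alt_u8 *buf, size_t words, int reps, alt_u32 *out)
{
    clock_t start = clock();
    alt_u32 acc = 0;
    for (int r = 0; r < reps; r++)
        acc += f(buf, words);
    *out = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

On a little-endian CPU the two sums must match, which is a quick sanity check before comparing the times.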