OK, I understand the IOWR inefficiency, and thanks for the new code. I don't think I would have been able to do that on my own so quickly... pointers and I tend not to go well together.
Now suppose I have a 2-D array of alt_u8 of size 4 x 10, which represents an image. I want to apply a convolution filter to it by 'sliding' a 3x3 window over it, collecting 8 values each time and sending them to hardware for the calculation. Is there a smart way to use pointers here too, given that the data are already packed in memory in order along each row? I could not apply your suggested way and ended up doing it your first way (see code below).
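alt_u8 array_image[4][10]; // fill in values
// The 8 values are assumed to be the 3x3 window minus its centre, taken row by row.
for(Y=0; Y<(4-2); Y++) {
    for(X=0; X<(10-2); X++) {
        // hardware calculation: top row + middle-left, then middle-right + bottom row
        IOWR_ALTERA_AVALON_PIO_DATA(DATAA_OUT1_32BITS_BASE,((alt_u32)array_image[Y][X]<<24)|((alt_u32)array_image[Y][X+1]<<16)|((alt_u32)array_image[Y][X+2]<<8)|array_image[Y+1][X]);
        IOWR_ALTERA_AVALON_PIO_DATA(DATAA_OUT2_32BITS_BASE,((alt_u32)array_image[Y+1][X+2]<<24)|((alt_u32)array_image[Y+2][X]<<16)|((alt_u32)array_image[Y+2][X+1]<<8)|array_image[Y+2][X+2]);
        // results back from hardware
        store_array[Y][X]= IORD_ALTERA_AVALON_PIO_DATA(RESULTA_IN_8BITS_BASE);
    }
}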
I tried using the pointer method, but am I right to say that I would need to re-assign 8 values to the alt_u8 array8b inside the two for-loops on every iteration, and hence require more CPU work?
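To make the question above concrete, here is a minimal sketch of the copy-then-pack approach, not a tested Nios II implementation. It assumes the 8 values are the 3x3 window minus its centre, taken row by row. `gather_window` and `pack4` are hypothetical helper names, and the two `typedef`s only stand in for `alt_types.h` so that the sketch compiles off-target.

```c
#include <stdint.h>
#include <string.h>

#define ROWS 4
#define COLS 10

/* Stand-ins for the Altera types; on the target, include "alt_types.h" instead. */
typedef uint8_t  alt_u8;
typedef uint32_t alt_u32;

/* Copy the 8 outer values of the 3x3 window whose top-left corner is (y, x).
 * Only the three bytes within one row are contiguous in memory, so the
 * window still has to be gathered row by row (assumed ordering). */
static void gather_window(alt_u8 img[ROWS][COLS], int y, int x, alt_u8 out[8])
{
    const alt_u8 *top = &img[y][x];
    const alt_u8 *mid = &img[y + 1][x];
    const alt_u8 *bot = &img[y + 2][x];

    memcpy(&out[0], top, 3);  /* top row: 3 contiguous bytes      */
    out[3] = mid[0];          /* middle row, left neighbour       */
    out[4] = mid[2];          /* middle row, right neighbour      */
    memcpy(&out[5], bot, 3);  /* bottom row: 3 contiguous bytes   */
}

/* Pack 4 bytes into one 32-bit word, first byte in the top 8 bits.
 * Explicit shifts keep the result independent of CPU endianness. */
static alt_u32 pack4(const alt_u8 b[4])
{
    return ((alt_u32)b[0] << 24) | ((alt_u32)b[1] << 16) |
           ((alt_u32)b[2] << 8)  |  (alt_u32)b[3];
}
```

On the target, `pack4(w)` and `pack4(w + 4)` would be the two values passed to the two IOWR calls. The gather step is the extra per-iteration work the question refers to: the rows of a window are COLS bytes apart, so a single pointer cast cannot cover all 8 values.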
I have new questions concerning IOWR.
- When I have two IOWRs sequentially, as in my code above, do these two 'writes' happen at the same time, i.e. does the hardware module that is waiting for the data get them all at once?
- To increase calculation speed, my next step will be to instantiate 2 calculation modules in my top-level .v file and modify the for-loops so that my image is divided into 2 (i.e. 4x5 and 4x5), with data from each sub-image going to its own hardware module instance. To do this I will need another set of PIOs (2 x 32 bits out and 1 x 8 bits in). Is this the right way to do it? If I keep dividing my big image into more sub-images, e.g. 4, do I add another 2 sets of PIOs? I will end up with many sets of PIOs if my image is bigger and I want to sub-divide further!
Is there a better way to transfer the data from the Nios to the different hardware modules? Another forum member has suggested creating a wrapper module for all my instantiations and giving it an Avalon-MM interface to the array, but I am completely lost! Can somebody please guide me a bit through this?
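One detail worth checking before splitting: a 3x3 window over 10 columns gives 10 - 3 + 1 = 8 window positions per row, but two disjoint 4x5 halves would give only 3 + 3 = 6, because the windows that straddle the boundary are lost. The sketch below shows the arithmetic for strips that overlap by 2 columns. `strip_t` and `strip_for_module` are hypothetical names used only for illustration.

```c
/* Column range handled by one hardware module (hypothetical helper type). */
typedef struct {
    int first_col;  /* first input column the module needs         */
    int num_cols;   /* number of input columns, including overlap  */
    int num_out;    /* number of window positions (output columns) */
} strip_t;

/* Split the output columns of a `cols`-wide image as evenly as possible
 * among `modules` units, for a window `window` columns wide. */
static strip_t strip_for_module(int cols, int window, int modules, int m)
{
    int total_out = cols - window + 1;   /* 8 for cols = 10, window = 3 */
    int base  = total_out / modules;
    int extra = total_out % modules;     /* first `extra` strips get one more */
    strip_t s;

    s.num_out   = base + (m < extra ? 1 : 0);
    s.first_col = m * base + (m < extra ? m : extra);
    s.num_cols  = s.num_out + window - 1;
    return s;
}
```

For the 4 x 10 image and 2 modules this gives columns 0..5 and 4..9, i.e. two 4x6 sub-images with 4 window positions each.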
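For anyone following the wrapper suggestion, here is one possible shape of the software side, as a sketch only. The register layout is entirely hypothetical (a 16-byte block per module instance holding two data words and a result), as are the names `MODULE_STRIDE`, `REG_DATA_LO`, `REG_DATA_HI`, `REG_RESULT`, `module_reg_offset` and `WRAPPER_BASE`. The real layout would be whatever the wrapper's Avalon-MM slave decodes.

```c
#include <stdint.h>

/* Hypothetical register map: one 16-byte block per module instance. */
#define MODULE_STRIDE 0x10u  /* byte distance between consecutive modules */
#define REG_DATA_LO   0x0u   /* first 4 window values                     */
#define REG_DATA_HI   0x4u   /* last 4 window values                      */
#define REG_RESULT    0x8u   /* result read back from the module          */

/* Byte offset of register `reg` of module number `module` in the wrapper. */
static uint32_t module_reg_offset(unsigned module, uint32_t reg)
{
    return (uint32_t)module * MODULE_STRIDE + reg;
}

/* On the target, with io.h included and WRAPPER_BASE taken from system.h
 * (name assumed), the accesses might look like:
 *
 *   IOWR_32DIRECT(WRAPPER_BASE, module_reg_offset(m, REG_DATA_LO), word0);
 *   IOWR_32DIRECT(WRAPPER_BASE, module_reg_offset(m, REG_DATA_HI), word1);
 *   result = IORD_8DIRECT(WRAPPER_BASE, module_reg_offset(m, REG_RESULT));
 */
```

With a layout like this, adding more modules only widens the address range of one slave instead of adding a new set of PIOs per module.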