Forum Discussion

Altera_Forum
Honored Contributor
11 years ago

Excessive logic utilization from memmove

The following code is similar to a memmove() operation, and it results in a significant increase in logic utilization: my logic utilization jumps by 25% simply from including this function, regardless of the reqd_work_group_size() value. The 'rep' member of the struct is defined as an unsigned char. Is there a technique that avoids this excessive logic utilization?

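void expand( struct number *h ){
    int i;

    h->size++;
    /* shift every existing byte up by one position, last byte first */
    for (i=h->size-1; i>0; i--) {
        h->rep[i] = h->rep[i-1];
    }
    /* the freed first position gets a zero byte */
    h->rep[0] = 0x00;
}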
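
The thread never shows the declaration of struct number, so the following is only a host-side C sketch under assumed names: MAX_DIGITS and the field layout are hypothetical, and expand_counted is an illustrative variant, not the posted function. It repeats the same loop with a capacity guard and counts the copies, which makes the cost visible: each call moves every byte that was already stored.

```c
#define MAX_DIGITS 64  /* hypothetical capacity, not from the original post */

/* Assumed layout: the post only says that 'rep' holds unsigned char. */
struct number {
    int size;                       /* bytes currently in use */
    unsigned char rep[MAX_DIGITS];  /* rep[0] is the first byte */
};

/* Same shifting logic as the posted expand(), plus a capacity guard.
   Returns the number of byte copies performed, or -1 if the array is full. */
int expand_counted(struct number *h) {
    int i;
    int copies = 0;

    if (h->size >= MAX_DIGITS) {
        return -1;
    }
    h->size++;
    for (i = h->size - 1; i > 0; i--) {
        h->rep[i] = h->rep[i - 1];  /* one copy per existing byte */
        copies++;
    }
    h->rep[0] = 0x00;
    return copies;
}
```

With three bytes stored, one call performs three copies, so the work grows linearly with the length of the number.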

5 Replies

  • Altera_Forum
    Honored Contributor

    It seems that the code operates on private memory rather than local or global memory, so each work item has its own copy. That is why "h->rep" consumes a lot of logic.

    BTW, this coding style will not work efficiently on an FPGA.
  • Altera_Forum
    Honored Contributor

    That looks like an operation that should be avoided.

    I'd guess it will always take (at least) 'size' clocks.

    Why not just treat the rep[] array as circular?

    If the array length is 2^n, masking the index with (2^n-1) will be cheap.
  • Altera_Forum
    Honored Contributor

    Good idea. However, a circular buffer will not work, since I'm storing bignums in the array and need to do left/right shift operations over the entire bignum. I've changed the logic to the following by allocating additional memory on both sides of the bignum. This code requires far fewer resources on the FPGA, but the approach requires changes to my left/right shift operations, which are now consuming more resources. Joy.

    void expand( struct number *h ){
        h->size++;
        h->msb--;
        h->rep[h->msb] = 0x01;
    }
  • Altera_Forum
    Honored Contributor

    Can you explain why dereferencing "h->rep" consumes a lot of logic? I've tried putting the required work items down to zero and making the kernel a task kernel. Neither approach has any effect on the resource utilization in my code. Admittedly, I'm a newbie to OpenCL and FPGA development. I've read through the Altera optimization guide, but haven't found much in there that explains the magic under the covers.

  • Altera_Forum
    Honored Contributor

    You might get an answer if you showed the structure declaration. Also where does the structure reside in memory? (global, local, private?)
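
Two of the ideas raised in the replies, the circular index masked with (2^n-1) and the layout with spare room in front of the bignum, can be sketched in plain host-side C as follows. Every name and size here (RING_SIZE, PAD_CAPACITY, the struct layouts, the helper functions) is hypothetical, since the thread shows no declarations; the stored byte is passed in as a parameter rather than fixed. This is a sketch of the indexing only, not a statement about how either variant maps to FPGA resources.

```c
/* All names and sizes below are hypothetical; the thread shows no declarations. */
#define RING_SIZE 64u               /* must be a power of two (2^n) */
#define RING_MASK (RING_SIZE - 1u)  /* the (2^n - 1) mask from the reply */

/* Variant 1: circular buffer. Logical byte i lives at (head + i) & RING_MASK. */
struct ring_number {
    unsigned head;                  /* physical index of logical byte 0 */
    unsigned size;                  /* bytes in use */
    unsigned char rep[RING_SIZE];
};

/* Prepend one byte without moving the others; 0 on success, -1 if full. */
int ring_prepend(struct ring_number *r, unsigned char v) {
    if (r->size >= RING_SIZE) {
        return -1;
    }
    r->head = (r->head - 1u) & RING_MASK;  /* wraps from 0 to RING_SIZE - 1 */
    r->rep[r->head] = v;
    r->size++;
    return 0;
}

/* Read logical byte i; the mask replaces any range check or modulo. */
unsigned char ring_get(const struct ring_number *r, unsigned i) {
    return r->rep[(r->head + i) & RING_MASK];
}

/* Variant 2: linear storage with spare room in front of the number. */
#define PAD_CAPACITY 64

struct padded_number {
    int msb;                        /* index of the first byte in use */
    int size;                       /* bytes in use */
    unsigned char rep[PAD_CAPACITY];
};

/* Prepend by moving the start index; 0 on success, -1 if no room is left. */
int padded_prepend(struct padded_number *p, unsigned char v) {
    if (p->msb == 0) {
        return -1;                  /* headroom exhausted */
    }
    p->msb--;
    p->size++;
    p->rep[p->msb] = v;
    return 0;
}

/* Read logical byte i; all other operations must also offset by msb. */
unsigned char padded_get(const struct padded_number *p, int i) {
    return p->rep[p->msb + i];
}
```

In both variants a prepend is a constant amount of work, at the price that every other operation has to translate logical positions into physical ones, which matches the poster's remark that the shift routines had to change.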