Forum Discussion

Altera_Forum (Honored Contributor)
16 years ago

C2H fixed point mpy

I have an array of fixed-point numbers (log values, actually) with a 10-bit integer part and a 22-bit fraction. I need the sum of squares to calculate the standard deviation.

In regular C this works okay ...

 
long long VkSDV(int mean, alt_u16 NumPoints)
{
    int * QXP = QXvector;
    long long SumSquare = 0;
    int val;
    do {
        val = *QXP++ - mean ;
        SumSquare += ((long long)val * (long long)val)>>22;
    } while (--NumPoints > 0);
    return SumSquare;
}

But that is not supported in C2H.

--- Quote Start ---

Actually, it is supported. Some other problem tricked the compiler into an apparent complaint about the long long cast, and I drew a false correlation. Never mind.

--- Quote End ---

A C2H workaround that seems to work is ...

 
    do {
        val = *QXP++ - mean ;
        SumSquare += val * val;
    } while (--NumPoints > 0);

However, it produces overflowed results, because the product is kept at 32 bits.
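
As a minimal sketch of why the 32-bit product overflows, assuming the 10.22 fixed-point format described above: the square of a 10.22 value has 44 fraction bits, so the useful part of the result sits above bit 22 of a 64-bit product. The function names square_low32 and square_q22 are hypothetical and only serve the illustration. Unsigned arithmetic is used in the 32-bit case so that the wrap-around is well defined in C.

```c
#include <stdint.h>

/* Square with the product kept at 32 bits, like val * val on a 32-bit int.
   Only the lowest 32 bits of the true product survive. */
uint32_t square_low32(int32_t val)
{
    return (uint32_t)val * (uint32_t)val;
}

/* Square with the full 64-bit product, rescaled back to 22 fraction bits.
   The product of a value with itself is never negative, so the shift is safe. */
int64_t square_q22(int32_t val)
{
    return ((int64_t)val * (int64_t)val) >> 22;
}
```

For val = 1 << 22 (1.0 in this format), square_q22 returns 0x400000 (again 1.0), while square_low32 returns 0 because the true product 2^44 has no bits set in its lowest 32.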

Then I opened accelerator_Kit2C70_VkSDV.v and located the point where the multiplier result is assigned:

--- Quote Start ---

assign accelerator_Kit2C70_VkSDV_multiplier_resource0_res0 = lpm_multiply_result0[31 : 0];

--- Quote End ---

As you can see, it limits the 64-bit result to the lowest 32 bits.

I modified it like this:

--- Quote Start ---

assign accelerator_Kit2C70_VkSDV_multiplier_resource0_res0 = lpm_multiply_result0[53 : 22];

--- Quote End ---

This implements the >> 22 shift.

I compiled the Quartus project and tried it, and it produced the same result as the regular C function.
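
The equivalence between the edited bit slice and the C expression can be checked in software. The sketch below is an illustration only, with hypothetical function names: one function picks bits [53:22] out of the 64-bit product the way the edited assign does, the other computes the shifted product as in the original C and keeps its low 32 bits. It assumes the compiler implements >> on negative signed values as an arithmetic shift, which is common but implementation-defined.

```c
#include <stdint.h>

/* Bits [53:22] of the 64-bit product, as selected by the edited assign. */
uint32_t product_bits_53_22(int32_t a, int32_t b)
{
    uint64_t product = (uint64_t)((int64_t)a * (int64_t)b);
    return (uint32_t)(product >> 22);   /* logical shift, keep 32 bits */
}

/* The software expression (a * b) >> 22, truncated to 32 bits.
   Assumes an arithmetic right shift for negative products. */
uint32_t shifted_product_low32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (uint32_t)(product >> 22);
}
```

Both functions return the same 32 bits for any pair of inputs. The slice is only a full substitute for the 64-bit expression when each shifted product fits in 32 bits.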

5 Replies

  • Altera_Forum (Honored Contributor)

    Try this:

    Square = (long long)val * (long long)val;

    SumSquare += Square >> 22;

    By the looks of it, the CPLI of your loop is 1 both before and after this change, so your performance shouldn't differ.

    Can you open a mysupport service request and attach your design so that this can be tracked and resolved? The result of your multiplication should have been a long long, with the upper bits shifted down as you intended.
  • Altera_Forum (Honored Contributor)

    I tried your modification and it worked okay, so I retested the original version and that worked also. As noted in my edit above, some other problem generated a false complaint about the long long cast.

    Does the compiler optimize away the unnecessary multipliers, or does it generate a 64 x 64 -> 128 multiplier?
  • Altera_Forum (Honored Contributor)

    C2H infers multipliers starting in 8.1, I believe, so it is up to Quartus II synthesis to determine whether additional bits can be stripped away. To find out whether the additional bits were removed, look at the resource utilization and check whether a full 64x64 multiplier was used.

    Sometimes you can force the bit stripping to happen using masks. For example if you knew you only need a 9x9 multiplier you could do this:

    long result;

    short a, b;

    result = ((long)(a & 0x1FF)) * ((long)(b & 0x1FF));

    Even though the inputs are 16 bits cast up to 32 bits, all but the lower 9 bits of the data are masked away. Synthesis should detect this, create a 9x9 multiplier, and pad the upper bits with 0 before assigning the 18-bit result to 'result'. The same trick should work if you mask the output of the multiplication as well.
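
    The arithmetic behind the 18-bit claim can be checked in plain C. This is a minimal sketch with a hypothetical function name, and it uses unsigned types instead of the short and long above to keep the example free of sign issues. It says nothing about what synthesis actually infers; it only shows that a masked 9x9 product never needs more than 18 bits.

    ```c
    #include <stdint.h>

    /* Multiply only the low 9 bits of each input.
       The largest possible product is 511 * 511 = 261121, below 2^18. */
    uint32_t mul9x9(uint16_t a, uint16_t b)
    {
        return (uint32_t)(a & 0x1FF) * (uint32_t)(b & 0x1FF);
    }
    ```

    With both inputs at 0xFFFF the result is 261121, and an input of 0x200 contributes 0 because bit 9 is masked away.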
  • Altera_Forum (Honored Contributor)

    To consolidate accelerated functions I attempted to make the VkSDV function more general by using a smaller shift ... 12 instead of 22 ...

     
    long long VkSDV(int mean, alt_u8 NumPoints)
    {
        int * QXP = QXvector;
        long long SumSquare = 0;
        int val;
        do {
            val = *QXP++ - mean ;
            SumSquare += ( (long long)val * (long long)val )>>12;
        } while (--NumPoints > 0);
        return SumSquare;
    }
    

    Since the number of points to square and sum is smaller than 256, any further scaling can be done with the returned long long result.

    Unfortunately the C2H version overflows. Using my test data the result using shift 22 is 0x01234945 ... which fits in 32 bits.

    The result using shift 12 is 0x48D251535 ... which overflows 32 bits.

    Since the data path is 64 bits wide, the software implementation works, but the C2H implementation fails. I checked the accelerator_VkSDV.v file and tried to find a place where the 64-bit data is restricted. No luck: all variables used were 64 or 128 bits wide. I also tried your separated Square and SumSquare version.

    I also verified this result on two different dev kits. One of them is a simplified SSRAM and Onchip build of the 3C25 starter kit. I could submit that one to mysupport.
  • Altera_Forum (Honored Contributor)

    Don't know if this is a real bug. An older VHDL based design required a speedup in the data resampling code. Since I have the C2H license I decided to implement the function that way. One way to fit a large number of polyphase resampling filters into onchip_mem is to read half of them in reverse order, which reduces the number of filters required by one half.

    My first attempt, which works fine using software, compiled okay but produced invalid results.

     
    // Resample data vector with polyphase filter
    // r is input data pointer, p is polyphase filter pointer
    int ReSample(int* __restrict__ r, int* __restrict__ p, alt_u16 PolySel)
    {
      # pragma altera_accelerate connect_variable ReSample/r to sdram
      # pragma altera_accelerate connect_variable ReSample/p to onchip_mem
        long long SumVal = 0;
        char flag = (PolySel & 0x100)>>8; // MS bit determines coeff direction
        int k = 20;
        do {
            SumVal += (long long)*r++ * (long long)*p ;
            (flag) ? p-- : p++ ;  // scan reverse or forward
         } while (--k);
         return (int)(SumVal >> 22);
    }
    

    After some testing I determined that the filter pointer p was not being incremented or decremented. Experimenting with workaround code finally resulted in a version that operates properly and generates the same result as software.

     
        do {
            SumVal += (long long)*r++ * (long long)*p ;
            if (flag) p-- ;  // scan reverse
            else p++ ;        // scan forward     
         } while (--k);
    

    I have not had a chance to check this with a verilog design or any other hardware so I don't know how solid this is.

    --- Quote Start ---

    (edit) This is explained in http://www.altera.com/literature/rn/rn_nios2eds.pdf

    "The C2H Compiler always evaluates both operands of logical (&&, ||) and conditional (?:) operators."

    So with the ?: operator both the increment and the decrement are executed, and p is left unchanged.

    --- Quote End ---
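
The difference between the two loop versions can be modelled in ordinary C. The sketch below is only a software model of the behaviour quoted from the release notes, not C2H output. The function names are hypothetical, and an index stands in for the filter pointer p so that the example never forms an out-of-range pointer.

```c
#include <stddef.h>

/* Standard C semantics, as in the if/else workaround:
   exactly one of the two side effects runs per iteration. */
ptrdiff_t step_if_else(ptrdiff_t idx, int reverse)
{
    if (reverse) idx--;   /* scan reverse */
    else         idx++;   /* scan forward */
    return idx;
}

/* Hypothetical model of "both operands of ?: are evaluated":
   the decrement and the increment both happen, whatever the flag is. */
ptrdiff_t step_both_operands(ptrdiff_t idx, int reverse)
{
    (void)reverse;        /* the selector does not gate the side effects */
    idx--;
    idx++;
    return idx;
}
```

Starting from index 10, step_if_else gives 9 or 11 depending on the flag, while step_both_operands gives 10 in both cases, which matches the observation that p never moved.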