To consolidate the accelerated functions, I tried to make the VkSDV function more general by using a smaller shift (12 instead of 22):
long long VkSDV(int mean, alt_u8 NumPoints)
{
    int * QXP = QXvector;
    long long SumSquare = 0;
    int val;
    do {
        val = *QXP++ - mean ;
        SumSquare += ( (long long)val * (long long)val )>>12;
    } while (--NumPoints > 0);
    return SumSquare;
}
Since the number of points to square and sum is smaller than 256, any further scaling can be done with the returned long long result.
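As a rough illustration of that later scaling step, here is a minimal sketch. The helper name scale_sum and its extra_shift parameter are hypothetical and not part of the original code; it assumes the caller holds the 64-bit sum returned by the shift-12 version of VkSDV.

```c
#include <stdint.h>

/* Hypothetical helper (not in the original code): applies an additional
   right shift to the 64-bit sum returned by VkSDV, which has already
   shifted each squared term right by 12. extra_shift = 10 gives a total
   shift of 22. The sum of squares is never negative, so the right shift
   behaves as a plain division by a power of two, rounded down. */
static int64_t scale_sum(int64_t sum_shift12, unsigned extra_shift)
{
    return sum_shift12 >> extra_shift;
}
```

Note that shifting the final sum is not guaranteed to be bit-identical to shifting every term by 22 before summing, because the truncation happens at a different point; the difference is bounded by the number of points.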
Unfortunately, the C2H version overflows. With my test data, the result using shift 22 is 0x01234945, which fits in 32 bits.
The result using shift 12 is 0x48D251535, which needs 35 bits and therefore overflows 32 bits.
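The two quoted values can be cross-checked on the host with a small sketch like the one below. The helper names fits_in_u32 and low32 are hypothetical. Whether the accelerator actually returns the low 32 bits is an assumption made only for illustration; low32 just shows what a 32-bit truncation of the shift-12 result would look like, so that pattern can be recognised in the hardware output.

```c
#include <stdint.h>

/* Hypothetical helper: true if a non-negative 64-bit sum fits in 32 bits. */
static int fits_in_u32(int64_t v)
{
    return v >= 0 && v <= (int64_t)UINT32_MAX;
}

/* Hypothetical helper: the value that would remain if only the low
   32 bits of the 64-bit result survived (an assumed failure mode). */
static uint32_t low32(int64_t v)
{
    return (uint32_t)((uint64_t)v & 0xFFFFFFFFu);
}
```

For the values above, fits_in_u32 is true for 0x01234945 and false for 0x48D251535, and low32(0x48D251535) is 0x8D251535. Shifting 0x48D251535 right by a further 10 bits gives 0x01234945, which matches the shift-22 result, so the software values are consistent with each other.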
Since the data path is 64 bits wide, the software implementation works, but the C2H implementation fails. I checked the accelerator_VkSDV.v file and tried to find a place where the 64-bit data is restricted. No luck: all of the variables used were 64 or 128 bits wide. I also tried your separated Square and SumSquare version.
I also verified this result on two different dev kits. One of them is a simplified SSRAM and on-chip build of the 3C25 starter kit. I could submit that one to mysupport.