Forum Discussion
Altera_Forum
Honored Contributor
7 years ago

--- Quote Start ---
Please attach the *kernel_name*.log and quartus_sh_compile.log files from the compilation folder.
--- Quote End ---

It seems I ran out of DSPs :(. DSP utilization came out at 162%. Apparently the example was written specifically for the D8 chip rather than the AB chip. Can I remove the loop unrolling to reduce DSP utilization? Here is a snippet of the code:

__kernel __attribute__((reqd_work_group_size(NUM_THREADS,1,1)))
void black_scholes( int m, int n, float drift, float vol, float S_0, float K)
{
    // running statistics -- use double precision for the accumulator
    double sum = 0.0;

    // loop over all simulations
    for (int path=0; path<m; path++) {
        float S = S_0;
        float arithmetic_average = 0.0f;

        // We're not including the initial price in the average
        for (int t_i=0; t_i<n/VECTOR; t_i++) {
            float U[VECTOR], Z[VECTOR];
            vec_float_ty U0 = read_channel_intel(RANDOM_STREAM_0);
            vec_float_ty U1 = read_channel_intel(RANDOM_STREAM_1);
            vec_float_ty U2 = read_channel_intel(RANDOM_STREAM_2);
            vec_float_ty U3 = read_channel_intel(RANDOM_STREAM_3);

            #pragma unroll VECTOR_DIV4
            for (int i=0; i<VECTOR_DIV4; i++) {
                U[i] = U0[i];
                U[i+1*VECTOR_DIV4] = U1[i];
                U[i+2*VECTOR_DIV4] = U2[i];
                U[i+3*VECTOR_DIV4] = U3[i];
            }

            #pragma unroll VECTOR_DIV2
            for (int i=0; i<VECTOR_DIV2; i++) {
                float2 z = box_muller(U[2*i], U[2*i+1]);
                Z[2*i] = z.x;
                Z[2*i+1] = z.y;
            }

            #pragma unroll VECTOR
            for (int i=0; i<VECTOR; i++) {
                // convert uniform distribution to normal
                float gauss_rnd = Z[i];

                // Simulate the path movement using geometric brownian motion
                S *= drift * exp(vol * gauss_rnd);
                arithmetic_average += S;
            }
        }

It took close to 24 hours to compile the example on a 16-core, 3.3 GHz, 128 GB machine! :o

Thanks,
QG
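
On the unrolling question: in the Intel FPGA SDK for OpenCL, `#pragma unroll N` replicates the loop body N times, so the DSP blocks used by that body grow roughly with N. A rough way to pick a smaller factor can be sketched in plain C as below. This is only an estimate under a loud assumption: DSP usage scales linearly with the unroll factor, which ignores any DSPs used outside the unrolled loops. The function names are hypothetical, and so is the factor of 8 used in the note that follows, since the post does not give the value of VECTOR.

```c
#include <assert.h>

/* Estimated DSP utilization (in percent) after changing the unroll factor.
 * ASSUMPTION: DSP usage scales linearly with the unroll factor; DSPs used
 * outside the unrolled loops are ignored. */
double scaled_dsp_pct(double measured_pct, int old_factor, int new_factor)
{
    assert(old_factor > 0 && new_factor > 0);
    return measured_pct * (double)new_factor / (double)old_factor;
}

/* Halve old_factor until the estimated utilization is at or below limit_pct.
 * Stops at 1 (no unrolling) even if the estimate is still above the limit. */
int largest_fitting_factor(double measured_pct, int old_factor, double limit_pct)
{
    int factor = old_factor;
    while (factor > 1 &&
           scaled_dsp_pct(measured_pct, old_factor, factor) > limit_pct) {
        factor /= 2;
    }
    return factor;
}
```

With the reported 162% and a hypothetical current factor of 8, `largest_fitting_factor(162.0, 8, 100.0)` returns 4, for an estimated 81%. To stop a loop from being unrolled at all, the SDK documents `#pragma unroll 1`; simply deleting the pragma may not be enough, because the compiler can still unroll loops with small constant bounds on its own.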