SBioo
Occasional Contributor
7 years agoKernel compilation failure for OpenCL FPGA.
I have this OpenCL code that I'm compiling using OpenCL compiler 16.0-pro. Here is the code:
//
// (c) December 19, 2018 Saman Biookaghazadeh @ Arizona State University
//
#ifdef INT_PRECISION
#define DTYPE int
#elif SINGLE_PRECISION
#define DTYPE float
#elif DOUBLE_PRECISION
#pragma OPENCL EXTENSION cl_khr_fp64: enable
#define DTYPE double
#endif
#include "funcs.h"
channel DTYPE c0;
__kernel void mm_k1 (__global const DTYPE* restrict A,
__global const DTYPE* restrict B,
__global const DTYPE* restrict C,
__global DTYPE* restrict D,
const DTYPE alpha,
const DTYPE beta
#ifdef FPGA_SINGLE
,const int lll)
#else
)
#endif
{
#ifdef GPU
#endif
#ifdef FPGA_SINGLE
for (int i = 0; i < lll; i++) {
for (int j = 0; j < lll; j++) {
DTYPE temp = 0.0;
for (int z = 0; z < lll/BLOCK_SIZE; z++) {
DTYPE A_local[BLOCK_SIZE];
DTYPE B_local[BLOCK_SIZE];
DTYPE local_temp = 0.0;
// Coalescing memory read from the memory section "A"
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
A_local[k] = A[i*lll+z*BLOCK_SIZE+k];
}
// Coalescing memory read from the memory section "B"
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
B_local[k] = B[i*lll+z*BLOCK_SIZE+k];
}
// Accumulating the result of multiplications
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
local_temp += A[k] * B[k] * alpha;
}
// final accumulation
temp += local_temp;
}
write_channel_altera (c0, temp);
}
}
#endif
}
__kernel void mm_k2 (__global const DTYPE* restrict A,
__global const DTYPE* restrict B,
__global const DTYPE* restrict C,
__global DTYPE* restrict D,
const DTYPE alpha,
const DTYPE beta
#ifdef FPGA_SINGLE
,const int lll)
#else
)
#endif
{
#ifdef GPU
#endif
#ifdef FPGA_SINGLE
for (int i = 0; i < lll; i++) {
for (int j = 0; j < lll; j++) {
DTYPE temp = read_channel_altera(c0);
#pragma ivdep
for (int z = 0; z < lll/BLOCK_SIZE; z++) {
DTYPE C_local[BLOCK_SIZE];
DTYPE D_local[BLOCK_SIZE];
// Coalescing memory read from the memory section "A"
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
C_local[k] = C[j*lll+z*BLOCK_SIZE+k];
}
// Initializing the memory section "D"
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
D_local[k] = 0.0;
}
// Accumulating the result of multiplications
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
D_local[k] += temp * C_local[k];
}
// final accumulation
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
D[j*lll+z*BLOCK_SIZE+k] += D_local[k];
}
}
}
}
#endif
}I compile the code using the script:
aoc -W -v -DSINGLE_PRECISION=true -DFPGA_SINGLE=true -DBLOCK_SIZE=32 --report --board p385a_sch_ax115 2mm.cl -o bin/mm_BS32.aocxFor some reason, the code fails for the compilation. There is no specific error for the compilation failure. I have added the quartus_sh_compile.log log:
Info: *******************************************************************
Info: Running Quartus Prime Shell
Info: Version 16.0.2 Build 222 07/20/2016 SJ Pro Edition
Info: Copyright (C) 1991-2016 Altera Corporation. All rights reserved.
Info: Your use of Altera Corporation's design tools, logic functions
Info: and other software and tools, and its AMPP partner logic
Info: functions, and any output files from any of the foregoing
Info: (including device programming or simulation files), and any
Info: associated documentation or information are expressly subject
Info: to the terms and conditions of the Altera Program License
Info: Subscription Agreement, the Altera Quartus Prime License Agreement,
Info: the Altera MegaCore Function License Agreement, or other
Info: applicable license agreement, including, without limitation,
Info: that your use is for the sole purpose of programming logic
Info: devices manufactured by Altera and sold by Altera or its
Info: authorized distributors. Please refer to the applicable
Info: agreement for further details.
Info: Processing started: Wed Dec 19 13:28:43 2018
Info: Command: quartus_sh -t scripts/pre_flow_pr.tcl compile top top
Info: Quartus(args): compile top top
Info: Using INI file /mnt/saman/shoc-fpga/src/opencl/PolyBench/bin/mm_BS32/quartus.ini
Info: Running pre-flow script
Info: Project name: top
Info: Revision name: top
Info: Checking for OpenCL SDK installation, environment should have ALTERAOCLSDKROOT defined
Info: ALTERAOCLSDKROOT=/home/users/saman/16.0-pro/hld
Info: Compiling import revision -> nothing to be done here!
Warning (125092): Tcl Script File board/board.qip not found
Info (125063): set_global_assignment -name QIP_FILE board/board.qip
Warning (125092): Tcl Script File kernel_system/kernel_system.qip not found
Info (125063): set_global_assignment -name QIP_FILE kernel_system/kernel_system.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_pll/xcvr_pll.qip not found
Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_pll/xcvr_pll.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_reset_rx/xcvr_reset_rx.qip not found
Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_reset_rx/xcvr_reset_rx.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_reset_tx/xcvr_reset_tx.qip not found
Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_reset_tx/xcvr_reset_tx.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_sl2_txrx/xcvr_sl2_txrx.qip not found
Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_sl2_txrx/xcvr_sl2_txrx.qip
Info: Device part name is 10AX115N3F40E2SG
Info: Compiling import revision -> nothing to be done here!
Info: Compiling import revision -> nothing to be done here!
Info: Generating kernel_system.qsys:
Info: qsys-generate -syn --output-directory=kernel_system/ --family="Arria 10" --part=10AX115N3F40E2SG kernel_system.qsys
Error (23031): Evaluation of Tcl script scripts/pre_flow_pr.tcl unsuccessful
Error: Quartus Prime Shell was unsuccessful. 1 error, 6 warnings
Error: Peak virtual memory: 1182 megabytes
Error: Processing ended: Wed Dec 19 13:29:00 2018
Error: Elapsed time: 00:00:17
Error: Total CPU time (on all processors): 00:01:35
Info: *******************************************************************
Info: Running Quartus Prime Compiler Database Interface
Info: Version 16.0.2 Build 222 07/20/2016 SJ Pro Edition
Info: Copyright (C) 1991-2016 Altera Corporation. All rights reserved.
Info: Your use of Altera Corporation's design tools, logic functions
Info: and other software and tools, and its AMPP partner logic
Info: functions, and any output files from any of the foregoing
Info: (including device programming or simulation files), and any
Info: associated documentation or information are expressly subject
Info: to the terms and conditions of the Altera Program License
Info: Subscription Agreement, the Altera Quartus Prime License Agreement,
Info: the Altera MegaCore Function License Agreement, or other
Info: applicable license agreement, including, without limitation,
Info: that your use is for the sole purpose of programming logic
Info: devices manufactured by Altera and sold by Altera or its
Info: authorized distributors. Please refer to the applicable
Info: agreement for further details.
Info: Processing started: Wed Dec 19 13:28:42 2018
Info: Command: quartus_cdb -t import_compile.tcl
Info: Using INI file /mnt/saman/shoc-fpga/src/opencl/PolyBench/bin/mm_BS32/quartus.ini
Error (23031): Evaluation of Tcl script import_compile.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 1 error, 0 warnings
Error: Peak virtual memory: 1348 megabytes
Error: Processing ended: Wed Dec 19 13:29:00 2018
Error: Elapsed time: 00:00:18
Error: Total CPU time (on all processors): 00:01:35Does anyone have any idea what is the issue here? I have compiled so many kernel so far, and haven't faced such issue before. Would appreciate if anyone could give me a hint or something.