Forum Discussion

SBioo's avatar
SBioo
Icon for Occasional Contributor rankOccasional Contributor
7 years ago

Kernel compilation failure for OpenCL FPGA.

I have this OpenCL code that I'm compiling using OpenCL compiler 16.0-pro. Here is the code:

//
// (c) December 19, 2018 Saman Biookaghazadeh @ Arizona State University
//
 
#ifdef INT_PRECISION
#define DTYPE int
#elif SINGLE_PRECISION
#define DTYPE float
#elif DOUBLE_PRECISION
#pragma OPENCL EXTENSION cl_khr_fp64: enable
#define DTYPE double
#endif
 
#include "funcs.h"
 
channel DTYPE c0;
 
__kernel void mm_k1 (__global const DTYPE* restrict A,
                   __global const DTYPE* restrict B,
                   __global const DTYPE* restrict C,
                   __global DTYPE* restrict D,
                   const DTYPE alpha,
                   const DTYPE beta
#ifdef FPGA_SINGLE
                   ,const int lll)
#else
                   )
#endif
{
 
#ifdef GPU
 
#endif
 
#ifdef FPGA_SINGLE
 
       for (int i = 0; i < lll; i++) {
           for (int j = 0; j < lll; j++) {
               DTYPE temp = 0.0;
               for (int z = 0; z < lll/BLOCK_SIZE; z++) {
                    DTYPE A_local[BLOCK_SIZE];
                    DTYPE B_local[BLOCK_SIZE];
                    DTYPE local_temp = 0.0;
 
                    // Coalescing memory read from the memory section "A"
                    #pragma unroll
                    for (int k = 0; k < BLOCK_SIZE; k++) {
                        A_local[k] = A[i*lll+z*BLOCK_SIZE+k];
                    }
 
                    // Coalescing memory read from the memory section "B"
                    #pragma unroll
                    for (int k = 0; k < BLOCK_SIZE; k++) {
                        B_local[k] = B[i*lll+z*BLOCK_SIZE+k];
                    }
 
                    // Accumulating the result of multiplications
                    #pragma unroll
                    for (int k = 0; k < BLOCK_SIZE; k++) {
                        local_temp += A[k] * B[k] * alpha;
                    }
 
                    // final accumulation
                    temp += local_temp;
               }
               write_channel_altera (c0, temp);
           }
       }
 
#endif
 
}
 
__kernel void mm_k2 (__global const DTYPE* restrict A,
                      __global const DTYPE* restrict B,
                      __global const DTYPE* restrict C,
                      __global DTYPE* restrict D,
                      const DTYPE alpha,
                      const DTYPE beta
#ifdef FPGA_SINGLE
                      ,const int lll)
#else
                      )
#endif
{
 
#ifdef GPU
 
#endif
 
#ifdef FPGA_SINGLE
       for (int i = 0; i < lll; i++) {
           for (int j = 0; j < lll; j++) {
               DTYPE temp = read_channel_altera(c0);
 
               #pragma ivdep
               for (int z = 0; z < lll/BLOCK_SIZE; z++) {
                    DTYPE C_local[BLOCK_SIZE];
                    DTYPE D_local[BLOCK_SIZE];
 
                    // Coalescing memory read from the memory section "A"
                    #pragma unroll
                    for (int k = 0; k < BLOCK_SIZE; k++) {
                        C_local[k] = C[j*lll+z*BLOCK_SIZE+k];
                    }
 
                    // Initializing the memory section "D"
                    #pragma unroll
                    for (int k = 0; k < BLOCK_SIZE; k++) {
                        D_local[k] = 0.0;
                    }
 
                    // Accumulating the result of multiplications
                    #pragma unroll
                    for (int k = 0; k < BLOCK_SIZE; k++) {
                        D_local[k] += temp * C_local[k];
                    }
 
                    // final accumulation
                    #pragma unroll
                    for (int k = 0; k < BLOCK_SIZE; k++) {
                        D[j*lll+z*BLOCK_SIZE+k] += D_local[k];
                    }
 
               }
 
           }
       }
#endif
 
}

I compile the code using the script:

aoc -W -v -DSINGLE_PRECISION=true -DFPGA_SINGLE=true -DBLOCK_SIZE=32 --report --board p385a_sch_ax115 2mm.cl -o bin/mm_BS32.aocx

For some reason, the code fails for the compilation. There is no specific error for the compilation failure. I have added the quartus_sh_compile.log log:

Info: *******************************************************************
Info: Running Quartus Prime Shell
    Info: Version 16.0.2 Build 222 07/20/2016 SJ Pro Edition
    Info: Copyright (C) 1991-2016 Altera Corporation. All rights reserved.
    Info: Your use of Altera Corporation's design tools, logic functions
    Info: and other software and tools, and its AMPP partner logic
    Info: functions, and any output files from any of the foregoing
    Info: (including device programming or simulation files), and any
    Info: associated documentation or information are expressly subject
    Info: to the terms and conditions of the Altera Program License
    Info: Subscription Agreement, the Altera Quartus Prime License Agreement,
    Info: the Altera MegaCore Function License Agreement, or other
    Info: applicable license agreement, including, without limitation,
    Info: that your use is for the sole purpose of programming logic
    Info: devices manufactured by Altera and sold by Altera or its
    Info: authorized distributors.  Please refer to the applicable
    Info: agreement for further details.
    Info: Processing started: Wed Dec 19 13:28:43 2018
Info: Command: quartus_sh -t scripts/pre_flow_pr.tcl compile top top
Info: Quartus(args): compile top top
Info: Using INI file /mnt/saman/shoc-fpga/src/opencl/PolyBench/bin/mm_BS32/quartus.ini
Info: Running pre-flow script
Info: Project name: top
Info: Revision name: top
Info: Checking for OpenCL SDK installation, environment should have ALTERAOCLSDKROOT defined
Info: ALTERAOCLSDKROOT=/home/users/saman/16.0-pro/hld
Info: Compiling import revision -> nothing to be done here!
Warning (125092): Tcl Script File board/board.qip not found
    Info (125063): set_global_assignment -name QIP_FILE board/board.qip
Warning (125092): Tcl Script File kernel_system/kernel_system.qip not found
    Info (125063): set_global_assignment -name QIP_FILE kernel_system/kernel_system.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_pll/xcvr_pll.qip not found
    Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_pll/xcvr_pll.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_reset_rx/xcvr_reset_rx.qip not found
    Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_reset_rx/xcvr_reset_rx.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_reset_tx/xcvr_reset_tx.qip not found
    Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_reset_tx/xcvr_reset_tx.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_sl2_txrx/xcvr_sl2_txrx.qip not found
    Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_sl2_txrx/xcvr_sl2_txrx.qip
Info: Device part name is 10AX115N3F40E2SG
Info: Compiling import revision -> nothing to be done here!
Info: Compiling import revision -> nothing to be done here!
Info: Generating kernel_system.qsys:
Info:     qsys-generate -syn --output-directory=kernel_system/ --family="Arria 10" --part=10AX115N3F40E2SG kernel_system.qsys
Error (23031): Evaluation of Tcl script scripts/pre_flow_pr.tcl unsuccessful
Error: Quartus Prime Shell was unsuccessful. 1 error, 6 warnings
    Error: Peak virtual memory: 1182 megabytes
    Error: Processing ended: Wed Dec 19 13:29:00 2018
    Error: Elapsed time: 00:00:17
    Error: Total CPU time (on all processors): 00:01:35
Info: *******************************************************************
Info: Running Quartus Prime Compiler Database Interface
    Info: Version 16.0.2 Build 222 07/20/2016 SJ Pro Edition
    Info: Copyright (C) 1991-2016 Altera Corporation. All rights reserved.
    Info: Your use of Altera Corporation's design tools, logic functions
    Info: and other software and tools, and its AMPP partner logic
    Info: functions, and any output files from any of the foregoing
    Info: (including device programming or simulation files), and any
    Info: associated documentation or information are expressly subject
    Info: to the terms and conditions of the Altera Program License
    Info: Subscription Agreement, the Altera Quartus Prime License Agreement,
    Info: the Altera MegaCore Function License Agreement, or other
    Info: applicable license agreement, including, without limitation,
    Info: that your use is for the sole purpose of programming logic
    Info: devices manufactured by Altera and sold by Altera or its
    Info: authorized distributors.  Please refer to the applicable
    Info: agreement for further details.
    Info: Processing started: Wed Dec 19 13:28:42 2018
Info: Command: quartus_cdb -t import_compile.tcl
Info: Using INI file /mnt/saman/shoc-fpga/src/opencl/PolyBench/bin/mm_BS32/quartus.ini
Error (23031): Evaluation of Tcl script import_compile.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 1 error, 0 warnings
    Error: Peak virtual memory: 1348 megabytes
    Error: Processing ended: Wed Dec 19 13:29:00 2018
    Error: Elapsed time: 00:00:18
    Error: Total CPU time (on all processors): 00:01:35

Does anyone have any idea what is the issue here? I have compiled so many kernel so far, and haven't faced such issue before. Would appreciate if anyone could give me a hint or something.

3 Replies

  • HRZ's avatar
    HRZ
    Icon for Frequent Contributor rankFrequent Contributor

    I don't have 16.0.2 installed but I did a quick test with 16.1.2 on the same Nallatech board and it seems to work fine. I didn't wait until place and route finishes but the process indeed progressed further than it does for you. Since the error is happening after OpenCL to HDL conversion, this is likely not a problem related to your kernel. Maybe there are some files remaining from a previous compile that is interfering with the new one; have you tried fully removing the OpenCL project folder and retrying? This could also be some bug in some Quartus component that has been fixed in 16.1.0/16.1.2. I would strongly recommend using at least 16.1.2 on Arria 10 since 16.0.x performs terribly when it comes to packing math operations into the DSPs on Arria 10.

  • Hi,

    What is the output on console any Error hint from there ?

    Also how much is the available RAM for the system ?

    Thanks,

    Arslan

  • SBioo's avatar
    SBioo
    Icon for Occasional Contributor rankOccasional Contributor

    Thanks much for the reply. I actually removed the whole OpenCL project folder and still behaves the same. Maybe it's a bug, since I have never had such an issue before. The only difference is, this time I have more nested loops than before.

    As you said, I will try and see if I can upgrade to 16.1.x version.

    Thanks