Forum Discussion

Altera_Forum
Honored Contributor
10 years ago

DE1-SoC SharedOnly Board Package Incompatible with Altera Sample Code

Hello All,

I'm a few months into learning OpenCL/non-Verilog FPGA development and am running into an issue while trying to compile code.

Some background specs are as follows:

Quartus 15.0 with an OpenCL license from Altera

NIOS 2, EDS, and the Altera OpenCL SDK, all version 15.0 from Altera

Computer to Compile on:

Windows 7 Ultimate, 16GB RAM, Intel Core i7-4900MQ @ 2.80GHz

FPGA Board:

Altera/Terasic DE1-SoC education board

Board package used for OpenCL compiling: de1soc_sharedonly

Simple Statement of Problem:

Compiling Altera sample code such as "Finite Difference 3D" and "Matrix Multiplication" fails with the error "Cannot fit kernel(s) on device", even with the "--high-effort" parameter.

Detailed Breakdown:

I can compile and run the precompiled "hello_world" and "vector_add" examples from Altera that come with the BSP for the DE1-SoC, and I can also compile and run the boardtest that comes with it. I have verified that the Quartus/NIOS2/EDS/OCL SDK installations all function correctly and that my Altera license is current and active. The only BSP I have installed is "de1soc_sharedonly". I vaguely recall, from my many nights of trying to put this all together, that there was a separate "de1soc" BSP that was not shared-only. I don't remember the difference and can't find the documentation that described it, but could that have something to do with this?

I have tried several Altera OpenCL sample designs from section 3 of this page, and none of them has worked for me yet. My compile process is as follows:

Download and extract the Linux version from the website and put it in the examples folder under C:/altera/15.0/hld/board/terasic/de1soc/examples.

Open the Altera Embedded Command Shell 15.0 by running the batch file in C:/altera/15.0/embedded.

Open the project's makefile and set the compiler to arm-linux-gnueabihf-g++.

Append --arm to the following lines:

AOCL_COMPILE_CONFIG := $(shell aocl compile-config --arm)
AOCL_LINK_CONFIG := $(shell aocl link-config --arm)

This lets me use the Windows environment to compile Linux code for the ARM processor with the provided cross-compiler.

make runs successfully with no errors and generates the expected files.

Then I run aoc device/*.cl -o bin/*.aocx (with the appropriate parameters) and let the computer compile for several hours.

If I get the error "Cannot fit kernel(s) on device", I retry the compile with the --high-effort parameter.

I still get the error that the kernels cannot fit on the device.
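The compile-and-retry loop described above can be scripted. This is a minimal Python sketch, not an Altera tool: the helper names (build_aoc_command, is_fit_failure, compile_with_retry) are hypothetical. It assumes aoc is on the PATH (for example, inside the Embedded Command Shell) and that the board name passed to --board is de1soc_sharedonly (aoc --list-boards shows the real names). It also assumes the fit error can be detected by matching the message text quoted in this post, which may not be aoc's exact wording.

```python
import subprocess
from typing import List, Tuple

# Assumption: error text as quoted in the post; aoc's exact log wording may differ.
FIT_ERROR = "cannot fit kernel(s) on device"


def build_aoc_command(cl_file: str, aocx_file: str, board: str,
                      high_effort: bool = False, report: bool = True) -> List[str]:
    """Assemble an aoc command line for one kernel file (hypothetical helper)."""
    cmd = ["aoc", cl_file, "-o", aocx_file, "--board", board]
    if report:
        cmd.append("--report")       # print estimated resource utilization
    if high_effort:
        cmd.append("--high-effort")  # ask the fitter to try harder
    return cmd


def is_fit_failure(log_text: str) -> bool:
    """True if the compiler output contains the fit error (case-insensitive)."""
    return FIT_ERROR in log_text.lower()


def compile_with_retry(cl_file: str, aocx_file: str, board: str) -> Tuple[int, bool]:
    """Compile once; on a fit failure, retry once with --high-effort.

    Returns (exit code of the last attempt, whether a retry happened).
    """
    retried = False
    for high_effort in (False, True):
        cmd = build_aoc_command(cl_file, aocx_file, board, high_effort=high_effort)
        proc = subprocess.run(cmd, capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        if proc.returncode == 0 or not is_fit_failure(output):
            return proc.returncode, retried
        retried = True
    return proc.returncode, retried
```

If both attempts report the fit error, the design is simply too large for the device, and retrying further will not help.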

I have considered modifying the code myself to use fewer kernels, but it strikes me as odd that I can't run even guaranteed-working sample code with my setup. I would rather fix my setup than accept less performance than my FPGA can deliver.

Any help would be greatly appreciated!

~Chris

3 Replies

  • Altera_Forum
    Honored Contributor

    It may very well be that the design is too big. I know that on a Stratix V, the matrix multiplication uses about 70% of the device without modifying the kernel. The Cyclone V SoC, which I think is the FPGA on that board, is much, much smaller. One thing you can do is look at area.rpt in your compile directory (or compile with --report) to see whether the design's resource utilization is over 100%.

    Otherwise, try shrinking the matrix multiplication design by decreasing the block size in the header file. This should result in a much smaller design: instead of 64, try 16 or smaller.
  • Altera_Forum
    Honored Contributor


    Thanks for the response! I'll try compiling with --report now; it just bothered me that I couldn't even run the sample code. However, knowing that Altera supports many different OpenCL boards and is probably pushing the limits of those boards and others, I shouldn't be too upset that the design is too big to fit on mine.

    I assumed I was doing something wrong rather than hitting a limitation of the hardware.

    Now to figure out how to compile SDL for ARM...
  • Altera_Forum
    Honored Contributor

    No problem. Yeah, their initial setup is pretty big: it handles 64x64 blocks of the matrix at a time. On top of that, it performs floating-point multiplications, which are expensive. The multiply-accumulate section is unrolled, so that's 64 floating-point multiply-accumulate (MAC) operations implemented in hardware. Simply editing host/inc/matrixMult.h to change BLOCK_SIZE to 4 or 8 should do the trick.
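Some rough arithmetic shows why BLOCK_SIZE matters so much. This Python sketch assumes, as the replies describe, that the inner multiply-accumulate loop is fully unrolled (one floating-point MAC unit per iteration). It also assumes the kernel keeps one BLOCK_SIZE x BLOCK_SIZE float tile of each input matrix in local memory. The function names are hypothetical, and real area also depends on the compiler, so the utilization report from --report remains the authoritative check.

```python
from dataclasses import dataclass

FLOAT_BYTES = 4  # single-precision float


@dataclass
class TileCost:
    block_size: int
    unrolled_macs: int    # MAC units from fully unrolling the inner loop
    local_mem_bytes: int  # assumed: one A tile + one B tile per work-group


def tile_cost(block_size: int) -> TileCost:
    """First-order cost of a BLOCK_SIZE x BLOCK_SIZE tiled kernel (rough model)."""
    macs = block_size  # one MAC per unrolled inner-loop iteration
    local = 2 * block_size * block_size * FLOAT_BYTES
    return TileCost(block_size, macs, local)


def utilization_pct(used: float, available: float) -> float:
    """Percent of a resource consumed; anything above 100 cannot fit."""
    return 100.0 * used / available


if __name__ == "__main__":
    for bs in (64, 16, 8, 4):
        c = tile_cost(bs)
        print(f"BLOCK_SIZE={bs:2d}: {c.unrolled_macs:2d} MACs, {c.local_mem_bytes} bytes of tile storage")
```

Under this model, going from 64 to 16 cuts the unrolled MACs by 4x and the tile storage by 16x, because local memory grows with the square of the block size.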
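For reference, the structure that BLOCK_SIZE controls can be sketched in plain Python as a tiled (blocked) matrix multiplication. This illustrates the general technique under stated assumptions (square matrices whose size is a multiple of the block size) and is not Altera's kernel. The copied tiles stand in for local memory, each (i, j) pair corresponds to a work-item, and the innermost k loop is the multiply-accumulate loop that the OpenCL kernel unrolls into BLOCK_SIZE hardware units.

```python
from typing import List

Matrix = List[List[float]]


def blocked_matmul(a: Matrix, b: Matrix, block_size: int) -> Matrix:
    """Compute C = A x B tile by tile, mirroring a tiled OpenCL kernel.

    Assumes square n x n matrices with n divisible by block_size.
    """
    n = len(a)
    if n % block_size:
        raise ValueError("matrix size must be a multiple of block_size")
    c = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, block_size):          # output tile row (one work-group)
        for bj in range(0, n, block_size):      # output tile column (one work-group)
            for bk in range(0, n, block_size):  # walk tiles along the shared dimension
                # Copies that stand in for the kernel's local-memory tiles.
                a_tile = [row[bk:bk + block_size] for row in a[bi:bi + block_size]]
                b_tile = [row[bj:bj + block_size] for row in b[bk:bk + block_size]]
                for i in range(block_size):      # one work-item per (i, j)
                    for j in range(block_size):
                        acc = 0.0
                        # The kernel unrolls this loop into block_size MAC units.
                        for k in range(block_size):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[bi + i][bj + j] += acc
    return c
```

Any block size that divides the matrix size gives the same result; only the hardware cost changes. That is why reducing BLOCK_SIZE to 16, 8, or 4 typically trades some throughput for a design that fits.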