Forum Discussion

Altera_Forum
10 years ago

OpenCL memory hang on custom platform

We are porting the OpenCL platform to a custom Cyclone V board. We have successfully compiled the OpenCL framework into an FPGA binary (RBF), which is currently loaded on the system. The CMA modules are built into the Linux kernel, and the OpenCL host driver module has been recompiled and loaded into the system. DTS changes from the FPGA design are in the process of being merged into the Linux build.

‘aocl diagnose’ reports success, and simple OpenCL programs that don’t involve memory transactions complete, but we are experiencing issues with programs that copy memory buffers. Simple OpenCL examples that copy a memory buffer, such as a vector addition, never return from clFinish; they hang. These same programs run correctly on the C5SOC platform, so we are looking into our design.

Right now we are comparing the C5SOC FPGA design with the custom implementation and looking through the Linux host driver code. Is there a particular place we should look regarding memory buffer transfer issues in OpenCL? Any insight you can give is greatly appreciated.

Thanks,

Chad Hewitt

root@avid-cyclone5:~# aocl diagnose

aocl diagnose: Running diagnostic from /home/root/opencl_arm32_rte/board/avid_alpha/arm32/bin

Verified that the kernel mode driver is installed on the host machine.

Using platform: Altera SDK for OpenCL

Board vendor name: Altera Corporation

Board name: avid_alpha : Cyclone V SoC Development Kit

Buffer read/write test passed.

diagnostic_passed

root@avid-cyclone5:~#

root@avid-cyclone5:~# ./hello_world

Compiled by Randy - 7/22/2015 10:00 AM

Querying platform for info:

==========================

CL_PLATFORM_NAME = Altera SDK for OpenCL

CL_PLATFORM_VENDOR = Altera Corporation

CL_PLATFORM_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 15.0

Querying device for info:

========================

CL_DEVICE_NAME = avid_alpha : Cyclone V SoC Development Kit

CL_DEVICE_VENDOR = Altera Corporation

CL_DEVICE_VENDOR_ID = 4466

CL_DEVICE_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 15.0

CL_DRIVER_VERSION = 15.0

CL_DEVICE_ADDRESS_BITS = 64

CL_DEVICE_AVAILABLE = true

CL_DEVICE_ENDIAN_LITTLE = true

CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 32768

CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0

CL_DEVICE_GLOBAL_MEM_SIZE = 536870912

CL_DEVICE_IMAGE_SUPPORT = false

CL_DEVICE_LOCAL_MEM_SIZE = 16384

CL_DEVICE_MAX_CLOCK_FREQUENCY = 1000

CL_DEVICE_MAX_COMPUTE_UNITS = 1

CL_DEVICE_MAX_CONSTANT_ARGS = 8

CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 134217728

CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3

CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 8192

CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE = 1024

CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 4

CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 2

CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1

CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1

CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1

CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 0

Command queue out of order? = false

Command queue profiling enabled? = true

Using AOCX: hello_world.aocx

Reprogramming device with handle 1

Kernel initialization is complete.

Launching the kernel...

Thread# 0: Hello from Altera's OpenCL Compiler!

Compiled by Randy - 7/22/2015 10:00 AM

kernel execution is complete.

root@avid-cyclone5:~# ./vector_add

Compiled by Randy - 7/30/2015 10:00 AM

Initializing OpenCL

Platform: Altera SDK for OpenCL

Using 1 device(s)

avid_alpha : Cyclone V SoC Development Kit

Using AOCX: vector_add.aocx

Reprogramming device with handle 1

Launching for device 0 (1000000 elements)

waiting for the queue. (never completes)

6 Replies

  • Altera_Forum

    Hi Chad, did you ever figure out your problem? I'm seeing similar behavior on one of our custom boards.

  • Altera_Forum

    For me, the fix was to take the top.rbf produced by the OpenCL compile and load that file in U-Boot. If we loaded a different top.rbf, the mismatch took down the fpga2sdram bridge. That was the root cause of the failure. Hope that helps.

  • Altera_Forum

    What kernel version are you using? Did you try downgrading to confirm that the problem has nothing to do with the kernel version?

  • Altera_Forum

    --- Quote Start ---

    What kernel version are you using? Did you try downgrading to confirm that the problem has nothing to do with the kernel version?

    --- Quote End ---

    Linux kernel version? 3.13.0. Quartus Version? 14.1

    I'm not sure what you're asking here.

    We are testing our board using the diagnostic tool provided with the reference board in Quartus 14.1. Unmodified, this tool writes to all available memory and then checks the result. To facilitate debugging, I've modified it so that it only writes to the first 2048 bytes, and then reads the result back.

    What I'm seeing in SignalTap is that all of the data is written into memory, but the DMA engine never raises the IRQ that signals completion of the Host->FPGA transfer. This hangs the application. If we scope the provided design on a DE5Net board, the IRQ is raised.

    So the real question for us is: Why isn't the DMA engine raising the IRQ? What sort of issues would cause this behavior?
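
The reduced test described in this reply (write only the first 2048 bytes, then read them back) could look roughly like the sketch below. It is not the actual modified diagnostic: `mem_write_fn`, `mem_read_fn`, and the in-memory `fake_write`/`fake_read` backend are hypothetical stand-ins for whatever transfer calls the board's diagnostic tool uses, chosen so the sketch compiles and runs without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEST_BYTES 2048u /* size taken from the post; the stock tool covers all memory */

/* Hypothetical transfer hooks; each returns 0 on success. */
typedef int (*mem_write_fn)(size_t dev_offset, const void *src, size_t len);
typedef int (*mem_read_fn)(size_t dev_offset, void *dst, size_t len);

/* Write a known pattern to the first TEST_BYTES of device memory and read it
 * back. Returns 0 if every byte matches, -1 if a transfer call fails, or
 * (index + 1) of the first mismatching byte. */
static long readback_test(mem_write_fn wr, mem_read_fn rd)
{
    uint8_t out[TEST_BYTES];
    uint8_t in[TEST_BYTES];

    assert(wr != NULL && rd != NULL);

    for (size_t i = 0; i < TEST_BYTES; i++)
        out[i] = (uint8_t)(i * 7u + 3u); /* pattern that is not all zeros */
    memset(in, 0, sizeof in);

    if (wr(0, out, sizeof out) != 0)
        return -1;
    if (rd(0, in, sizeof in) != 0)
        return -1;

    for (size_t i = 0; i < TEST_BYTES; i++)
        if (in[i] != out[i])
            return (long)i + 1;
    return 0;
}

/* In-memory stand-in for the device, for illustration only. */
static uint8_t fake_mem[4096];

static int fake_write(size_t off, const void *src, size_t len)
{
    if (off > sizeof fake_mem || len > sizeof fake_mem - off)
        return -1;
    memcpy(fake_mem + off, src, len);
    return 0;
}

static int fake_read(size_t off, void *dst, size_t len)
{
    if (off > sizeof fake_mem || len > sizeof fake_mem - off)
        return -1;
    memcpy(dst, fake_mem + off, len);
    return 0;
}
```

With the in-memory backend, `readback_test(fake_write, fake_read)` returns 0. On real hardware the hooks would wrap the actual host-to-device and device-to-host transfers, and a small transfer like this keeps the SignalTap capture short enough to follow.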
  • Altera_Forum

    Hi,

    I would be very interested to understand how you reduced the size of the buffer to 2048.

    Thanks,

    Rnivartx
  • WKnat

    Hi, I have the same issue on Linux kernel 4.1.22 and newer. On newer kernels, the request_irq(PIO_IRQ, aclsoc_irq, irq_type, DRIVER_NAME, (void*)aclsoc) call in the aoclsoc driver cannot obtain hardware IRQ No. 72. One solution is to port the aoclsoc driver to a platform driver and get the hardware IRQ number from the device tree. Here is the new Cyclone SoC OpenCL RTE for Linux 4.9.78.
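
A likely reason the hard-coded number stops working is that newer kernels translate hardware interrupt numbers to Linux IRQ numbers through IRQ domains, so a platform driver typically asks for its IRQ with `platform_get_irq()` instead of passing a raw number to `request_irq()`. The small user-space sketch below only shows the arithmetic for the device tree side. It assumes that "hardware irq No.72" is an ARM GIC interrupt ID and that the node uses the usual three-cell GIC specifier; the helper names and the flags value are illustrative, not taken from the driver.

```c
#include <assert.h>
#include <stdio.h>

/* GIC interrupt IDs 0-31 are SGIs and PPIs; shared peripheral interrupts
 * (SPIs) start at ID 32. Device tree SPI cells are numbered from 0. */
#define GIC_SPI_BASE 32u

/* Convert a GIC interrupt ID to the SPI index used in a device tree
 * `interrupts` property. */
static unsigned gic_id_to_spi(unsigned gic_id)
{
    assert(gic_id >= GIC_SPI_BASE);
    return gic_id - GIC_SPI_BASE;
}

/* Format a three-cell GIC specifier <type number flags>, where type 0
 * means SPI. Returns the snprintf() result. */
static int format_dt_interrupts(char *buf, size_t n, unsigned gic_id,
                                unsigned flags)
{
    return snprintf(buf, n, "interrupts = <0 %u %u>;",
                    gic_id_to_spi(gic_id), flags);
}
```

Under these assumptions, `gic_id_to_spi(72)` gives 40, and `format_dt_interrupts(buf, sizeof buf, 72, 4)` writes `interrupts = <0 40 4>;`, where the flags value 4 (active-high level) is an assumed trigger type that would need to match the FPGA design.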