
Altera_Forum
10 years ago

Erroneous OpenCL execution on DE1-SoC kit: hangs on memory transfers

I am trying to use the Altera SDK for OpenCL on a Terasic DE1-SoC board (Rev C).

However, problems occur whenever a memory transaction is required.

I could not get even the pre-compiled example projects to run correctly.

Has anyone managed to run the system successfully?

Thanks in advance!

----------

The Terasic BSP and the OpenCL SDK are both version 14.0.

I wrote the image to the SD card and booted the FPGA with the correct MSEL switch configuration.

I replaced top.rbf with opencl.rbf on the FAT partition of the SD card (I also tried the original version).

HelloWorld => Output is printed from thread 0.

Vector Add => It never reaches clFinish. The program hangs at the point where memory buffers are involved.
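One way to pin down where a run stalls is to launch it under a watchdog that records the output seen before a timeout. The following is a minimal sketch, not part of the Altera tooling: the helper name run_with_timeout and the 60-second limit in the usage comment are assumptions chosen for illustration.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, lines): whether the process exited
    within timeout_s seconds, and the output lines captured so far."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
        return True, done.stdout.splitlines()
    except subprocess.TimeoutExpired as exc:
        # The partial output may be bytes, str, or None depending on platform.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, out.splitlines()


# Hypothetical usage on the board (path taken from the logs in this post):
#   finished, lines = run_with_timeout(["./vectorAdd"], 60.0)
#   print("finished:", finished, "| last line:", lines[-1] if lines else None)
```

The last captured line gives a rough location for the stall. Two caveats: a program that buffers its stdout when writing to a pipe may show less output than it would on a terminal, and a process stuck inside a driver call may not respond to being killed, in which case the wrapper itself would also block.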

----------

Outputs:

root@socfpga:~# aocl diagnose

aocl diagnose: Running diagnostic from /home/root/opencl_arm32_rte/board/c5soc/arm32/bin

Verified that the kernel mode driver is installed on the host machine.

Using platform: Altera SDK for OpenCL

Board vendor name: Altera Corporation

Board name: de1soc_sharedonly : Cyclone V SoC Development Kit

Buffer read/write test passed.

DIAGNOSTIC_PASSED

***

root@socfpga:~/vector_Add# ./vectorAdd

Initializing OpenCL

Platform: Altera SDK for OpenCL

Using 1 device(s)

de1soc_sharedonly : Cyclone V SoC Development Kit

Using AOCX: vectorAdd.aocx

Launching for device 0 (1000000 elements)

(It hangs after this point and never reaches the end.)

***

root@socfpga:~/helloworld# ./helloworld

Querying platform for info:

==========================

CL_PLATFORM_NAME = Altera SDK for OpenCL

CL_PLATFORM_VENDOR = Altera Corporation

CL_PLATFORM_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 14.0

Querying device for info:

========================

CL_DEVICE_NAME = de1soc_sharedonly : Cyclone V SoC Development Kit

CL_DEVICE_VENDOR = Altera Corporation

CL_DEVICE_VENDOR_ID = 4466

CL_DEVICE_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 14.0

CL_DRIVER_VERSION = 14.0

CL_DEVICE_ADDRESS_BITS = 64

CL_DEVICE_AVAILABLE = true

CL_DEVICE_ENDIAN_LITTLE = true

CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 32768

CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0

CL_DEVICE_GLOBAL_MEM_SIZE = 536870912

CL_DEVICE_IMAGE_SUPPORT = false

CL_DEVICE_LOCAL_MEM_SIZE = 16384

CL_DEVICE_MAX_CLOCK_FREQUENCY = 1000

CL_DEVICE_MAX_COMPUTE_UNITS = 1

CL_DEVICE_MAX_CONSTANT_ARGS = 8

CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 134217728

CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3

CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 8192

CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE = 1024

CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 4

CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 2

CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1

CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1

CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1

CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 0

Command queue out of order? = false

Command queue profiling enabled? = true

Using AOCX: hello_world.aocx

Kernel initialization is complete.

Launching the kernel...

Thread# 0: Hello from Altera's OpenCL Compiler!

Kernel execution is complete.

2 Replies

  • Altera_Forum

    The boardtest output is also given below.

    It hangs at the "Kernel-to-Memory Bandwidth" test.

    root@socfpga:~/boardtest# ./boardtest

    *****************************************************************

    *********************** Host Speed Test *************************

    *****************************************************************

    clGetDeviceInfo CL_DEVICE_GLOBAL_MEM_SIZE = 536870912

    clGetDeviceInfo CL_DEVICE_MAX_MEM_ALLOC_SIZE = 133169152

    Memory consumed for internal use = 403701760

    Actual maximum buffer size = 133169152 bytes

    Writing 127 MB to global memory ... 113.521374 MB/s

    Reading 133169152 bytes from global memory ... 159.004121 MB/s

    Verifying data ...

    Successfully wrote and readback 127 MB buffer

    Transferring 8192 KBs in 256 32 KB blocks ...

    Transferring 8192 KBs in 128 64 KB blocks ...

    Transferring 8192 KBs in 64 128 KB blocks ...

    Transferring 8192 KBs in 32 256 KB blocks ...

    Transferring 8192 KBs in 16 512 KB blocks ...

    Transferring 8192 KBs in 8 1024 KB blocks ...

    Transferring 8192 KBs in 4 2048 KB blocks ...

    Transferring 8192 KBs in 2 4096 KB blocks ...

    Transferring 8192 KBs in 1 8192 KB blocks ...

    PCIe Gen2.0 peak speed: 500MB/s/lane

    Block_Size Avg Max Min End-End (MB/s)

    Writing 8192 KBs with block size (in bytes) below:

    32768 102.78 113.51 87.26 82.07

    65536 112.87 115.97 106.35 104.75

    131072 116.02 117.09 113.36 113.09

    262144 117.12 117.67 115.84 115.96

    524288 117.48 117.88 116.83 116.95

    1048576 117.75 117.93 117.68 117.50

    2097152 117.86 117.97 117.79 117.75

    4194304 117.98 118.03 117.94 117.95

    8388608 118.03 118.03 118.03 118.03

    Reading 8192 KBs with block size (in bytes) below:

    32768 133.34 152.66 105.40 104.53

    65536 150.46 155.94 137.10 139.02

    131072 155.71 157.76 147.26 151.76

    262144 157.51 158.36 154.60 156.01

    524288 158.28 158.90 156.69 157.62

    1048576 158.59 159.16 158.14 158.27

    2097152 158.02 158.85 156.97 157.84

    4194304 158.81 158.87 158.74 158.74

    8388608 158.94 158.94 158.94 158.94

    Host write top speed = 118.03 MB/s

    Host read top speed = 159.16 MB/s

    HOST-TO-MEMORY BANDWIDTH = 139 MB/s

    *****************************************************************

    ********************* Host Read Write Test **********************

    *****************************************************************

    --- test_rw with device ptr offset 3

    --- test_rw with device ptr offset 0

    HOST READ-WRITE TEST PASSED!

    *****************************************************************

    ***************** Kernel-to-Memory Bandwidth *****************

    *****************************************************************

    Performing kernel transfers of 32 MBs

    Note: This test assumes each memory bank is no smaller than

    32 MBs and the design was compiled with --sw-dimm-partition

    Launching kernel kclk ...

    Launching kernel mem_stream ...

    (It hangs after this point and never reaches the end.)
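As a sanity check, the figures in the boardtest log can be cross-checked with a few lines of arithmetic. This is a rough sketch using only values copied from the logs above. Two points are assumptions: that the reported host-to-memory bandwidth is the mean of the top write and read speeds, and that vectorAdd uses three buffers of 4-byte elements.

```python
MIB = 1024 * 1024

# Values copied from the boardtest log.
global_mem = 536870912   # CL_DEVICE_GLOBAL_MEM_SIZE
max_alloc = 133169152    # CL_DEVICE_MAX_MEM_ALLOC_SIZE
write_top = 118.03       # Host write top speed, MB/s
read_top = 159.16        # Host read top speed, MB/s

# Memory accounting: what the log calls "Memory consumed for internal use".
internal = global_mem - max_alloc
print("global memory:", global_mem // MIB, "MiB")
print("max allocation:", max_alloc // MIB, "MiB")
print("internal use:", internal, "bytes")

# Assumption: the reported bandwidth is the mean of the two top speeds.
host_bw = (write_top + read_top) / 2
print("host-to-memory bandwidth:", round(host_bw), "MB/s")

# Assumption: vectorAdd allocates two inputs and one output of 4-byte
# elements for the 1000000 elements shown in the original post.
vector_add_bytes = 3 * 1_000_000 * 4
print("vectorAdd buffers fit:", vector_add_bytes < max_alloc)
```

The numbers are internally consistent, and both the 32 MB kernel transfer and the assumed vectorAdd buffers fit well within the maximum allocation. Buffer size alone therefore does not appear to explain the hang.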