Forum Discussion

Altera_Forum (Honored Contributor)
9 years ago

Astonishing facts about the AOCL compiler: vector_add -> no DSP used, y[i]=x[i] compile error

There are some facts about the AOCL compiler that I find astonishing.

Setup: AOCL version 16.0.2.222 (Quartus the same version), 10AX115N3F40E2SG device on the Nallatech pci385a board and BSP.

1) Compiling the vector_add example uses no DSP blocks at all, and performance is indeed terrible:

+-----------------------------+-------+
; Resource                    ; Usage ;
+-----------------------------+-------+
; Logic utilization           ; 10%   ;
; ALUTs                       ; 4%    ;
; Dedicated logic registers   ; 6%    ;
; Memory blocks               ; 10%   ;
; DSP blocks                  ; 0%    ;
+-----------------------------+-------+

Why is that?
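A "0%" line in the report does not necessarily mean zero DSP blocks if the percentages are whole numbers, as the first reply suggests. The sketch below checks how many blocks can hide behind 0%. It assumes the 10AX115 (Arria 10 GX 1150) has 1518 DSP blocks and models two possible reporting rules (truncation and round-half-up); which rule the report actually uses is an assumption, and the function names are illustrative.

```c
/* Assumption: the 10AX115 (Arria 10 GX 1150) provides 1518 DSP blocks.
   Check the device overview for the exact part you use. */
#define TOTAL_DSP_BLOCKS 1518

typedef int (*percent_fn)(int used, int total);

/* Whole-number percentage, rounded down (truncation). */
static int truncated_percent(int used, int total)
{
    return (used * 100) / total;
}

/* Whole-number percentage, rounded half up. */
static int rounded_percent(int used, int total)
{
    return (used * 200 + total) / (2 * total);
}

/* Smallest block count that a given reporting rule shows as 1%. */
static int min_used_for_one_percent(percent_fn percent, int total)
{
    int used = 0;
    while (percent(used, total) < 1)
        ++used;
    return used;
}
```

Under these assumptions, min_used_for_one_percent(truncated_percent, TOTAL_DSP_BLOCKS) is 16 and min_used_for_one_percent(rounded_percent, TOTAL_DSP_BLOCKS) is 8, so a design could use up to 15 (or 7) DSP blocks and still be reported as 0%.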

2) Compiling this simple vector copy fails with a compiler error:

---------------------- riinout.cl ---------------------------------------
__kernel void riinout( __global const float *x,
                       __global float *restrict y)
{
    // get index of the work item
    int index = get_global_id(0);
    y[index] = x[index];
}
---------------------------------------------------------------------------

$ aoc device/riinout.cl -o bin/riinout.aocx
/media/sda1/home/nallatech/aocl/examples/riinout/riinout/device/riinout.cl:3:46: warning: declaring kernel argument with no 'restrict' may lead to low kernel performance
__kernel void riinout( __global const float *x,
                                             ^
1 warning generated.
Error: Compiler Error, not able to generate hardware
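The warning points at the `x` parameter: in the kernel as written only `y` is `restrict`-qualified, and in OpenCL C the suggested change would be to declare it as `__global const float *restrict x`. The C99 sketch below mirrors the kernel on the host, with each loop iteration standing in for one work-item; `riinout_reference` is a hypothetical name. It only illustrates what `restrict` promises (the two buffers never alias) and does not address the Quartus failure.

```c
#include <stddef.h>

/* Host-side C99 analogue of the riinout kernel (a sketch, not OpenCL code).
   Each iteration plays the role of one work-item with index = get_global_id(0).
   Both pointers are restrict-qualified: the caller promises that x and y
   do not overlap, so the compiler may reorder or pipeline the copies. */
static void riinout_reference(const float *restrict x,
                              float *restrict y,
                              size_t n)
{
    for (size_t index = 0; index < n; ++index)
        y[index] = x[index];
}
```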

---------------------------- quartus_sh_compile.log---------------------------
Info: Initializing Spectra-Q Synthesis...
Info: Project = "top"
Info: Revision = "top_synth"
Warning (125092): Tcl Script File board/board.qip not found
Info (125063): set_global_assignment -name QIP_FILE board/board.qip
Info: qis_default_flow_script.tcl version:# 1
Info: Initializing Spectra-Q Synthesis...
Info: Project = "top"
Info: Revision = "top_synth"
Info (16303): High Performance Effort optimization mode selected -- timing performance will be prioritized at the potential cost of increased compilation time
Info (16303): High Performance Effort optimization mode selected -- timing performance will be prioritized at the potential cost of increased compilation time
*** Fatal Error: Segment Violation at (nil)
Module: quartus_syn
Stack Trace:
0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8)
0x35e7d: __cxa_finalize + 0x9d (c.so.6)
0x35ae2: exit + 0xe2 (c.so.6)
0x1ed24: __libc_start_main + 0x104 (c.so.6)
End-trace
Error (114016): Out of memory in module quartus_syn (1671 megabytes used)
*** Fatal Error: Segment Violation at (nil)
Module: quartus_syn
Stack Trace:
0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8)
0x35e7d: __cxa_finalize + 0x9d (c.so.6)
0x35ae2: exit + 0xe2 (c.so.6)
0x1ed24: __libc_start_main + 0x104 (c.so.6)
End-trace
Error (114016): Out of memory in module quartus_syn (1677 megabytes used)
*** Fatal Error: Segment Violation at (nil)
Module: quartus_syn
Stack Trace:
0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8)
0x35e7d: __cxa_finalize + 0x9d (c.so.6)
0x35ae2: exit + 0xe2 (c.so.6)
0x1ed24: __libc_start_main + 0x104 (c.so.6)
End-trace
Error (114016): Out of memory in module quartus_syn (1688 megabytes used)
Error: Failed to synthesize partition
Info: Saving post-synthesis snapshots for 1 partition(s)
*** Fatal Error: Segment Violation at (nil)
Module: quartus_syn
Stack Trace:
0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8)
0x35e7d: __cxa_finalize + 0x9d (c.so.6)
0x35ae2: exit + 0xe2 (c.so.6)
0x1ed24: __libc_start_main + 0x104 (c.so.6)
End-trace
--------------------------------------------------------------------------------------

Any answers?

Thanks,

Roberto
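Because aoc drives quartus_syn, which fails above with "Out of memory", a quick pre-flight check of the host's physical RAM before a long compile can save a wasted run. A minimal sketch for Linux, assuming glibc's `sysconf(_SC_PHYS_PAGES)` and `sysconf(_SC_PAGE_SIZE)`; the 64 GB threshold is the rule of thumb from the first reply, not an official Quartus requirement, and the function names are illustrative.

```c
#include <unistd.h>

/* Rule-of-thumb threshold (64 GB) taken from the thread; an assumption,
   not an official Quartus requirement. */
#define RECOMMENDED_BYTES (64ULL * 1024ULL * 1024ULL * 1024ULL)

/* Total physical RAM in bytes via sysconf (Linux/glibc).
   Returns 0 if either value is unavailable. */
static unsigned long long physical_memory_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGE_SIZE);
    if (pages <= 0 || page_size <= 0)
        return 0;
    return (unsigned long long)pages * (unsigned long long)page_size;
}

/* Nonzero if the given amount of RAM meets the rule-of-thumb threshold. */
static int meets_recommendation(unsigned long long bytes)
{
    return bytes >= RECOMMENDED_BYTES;
}
```

A wrapper script or host program could call meets_recommendation(physical_memory_bytes()) and warn before invoking aoc.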

6 Replies

  • Altera_Forum (Honored Contributor)

    1) Only a tiny percentage of the chip is being used here, and the percentages shown are whole numbers, so you may need to use a dozen or so DSPs to change the 0% to 1%. If you want to scale up size and performance, try vectorizing (num_simd_work_items) or replicating the kernel (num_compute_units).

    2) The aoc compiler runs Quartus under the hood, and Quartus needs a lot of memory; 64 GB is a good size. The error message you're getting (below) definitely points to a system with too little memory.

    Error (114016): Out of memory in module quartus_syn (1688 megabytes used)
  • Altera_Forum (Honored Contributor)

    Hi,

    the Nallatech 385A BSP version r001.004.0001 is strictly limited to OpenCL SDK / Quartus Prime Pro version 16.0.0 (no updates).

    I would suggest installing 16.0.0 and trying to go through the compilation process again.

    Thanks,

    G
  • Altera_Forum (Honored Contributor)

    I went back to Quartus 16.0.0 as suggested.

    I still get the "can't compile" failure from time to time.

    My impression is that something gets corrupted, though I don't know what, because if I reboot the host, the compiler works again.

    Thanks
  • Altera_Forum (Honored Contributor)

    The Nallatech 385A BSP version r001.004.0002 for OpenCL SDK / Quartus Prime Pro version 16.0.2 is now released.

    There are 2 versions of the Nallatech 385A BSP:

    1. HPC BSP (2 x 40Gbps board-to-board IO channels) and

    2. MAC BSP (2 x 10GbE MAC cores IO channels)

    Thanks

    G
  • Altera_Forum (Honored Contributor)

    --- Quote Start ---

    The Nallatech 385A BSP version r001.004.0002 for OpenCL SDK / Quartus Prime Pro version 16.0.2 is now released.

    There are 2 versions of the Nallatech 385A BSP:

    1. HPC BSP (2 x 40Gbps board-to-board IO channels) and

    2. MAC BSP (2 x 10GbE MAC cores IO channels)

    Thanks

    G

    --- Quote End ---

    Thanks for the info.

    I think I will try it, even though I'm a bit scared: the last time, when moving from the beta to the official release, Nallatech decided to change the name of the board, which rendered unusable binaries that had taken hundreds of hours of CPU time to build.

    Thanks
  • Altera_Forum (Honored Contributor)

    --- Quote Start ---

    Thanks for the info.

    I think I will try it, even though I'm a bit scared: the last time, when moving from the beta to the official release, Nallatech decided to change the name of the board, which rendered unusable binaries that had taken hundreds of hours of CPU time to build.

    Thanks

    --- Quote End ---

    Now I'm definitely scared to update the BSP.

    A first result, from a precompiled fft1d binary, is that performance is about 30% lower:

    With the new BSP (r001.004.0002):

    Using AOCX: fft1d.aocx
    Reprogramming device with handle 1
    Launching FFT transform for 2000 iterations
    FFT kernel initialization is complete.
    Processing time = 5.7592ms
    Throughput = 1.4224 Gpoints / sec (85.3449 Gflops)
    Signal to noise ratio on output sample: 137.677661 --> PASSED
    Launching inverse FFT transform for 2000 iterations
    Inverse FFT kernel initialization is complete.
    Processing time = 5.7347ms
    Throughput = 1.4285 Gpoints / sec (85.7101 Gflops)
    Signal to noise ratio on output sample: 137.041007 --> PASSED

    With the old BSP (Quartus 16.0.0):

    Using AOCX: fft1d.aocx
    Launching FFT transform for 2000 iterations
    FFT kernel initialization is complete.
    Processing time = 4.1108ms
    Throughput = 1.9928 Gpoints / sec (119.5692 Gflops)
    Signal to noise ratio on output sample: 137.677661 --> PASSED
    Launching inverse FFT transform for 2000 iterations
    Inverse FFT kernel initialization is complete.
    Processing time = 4.0927ms
    Throughput = 2.0016 Gpoints / sec (120.0977 Gflops)
    Signal to noise ratio on output sample: 137.041007 --> PASSED
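The fft1d figures can be cross-checked with a little arithmetic. The sketch assumes a transform size of N = 4096, which is inferred rather than stated: under the common 5 N log2 N flop count for an FFT, 85.3449 / 1.4224 ≈ 60 = 5 · log2(4096), and 2000 iterations × 4096 points in 5.7592 ms gives the reported 1.4224 Gpoints/s. The function names are illustrative.

```c
/* Throughput in Gpoints/s, assuming every iteration transforms n_points
   points and time_ms is the total processing time for all iterations. */
static double gpoints_per_sec(double iterations, double n_points, double time_ms)
{
    return iterations * n_points / (time_ms * 1e-3) / 1e9;
}

/* log2 of an exact power of two, computed with integer shifts. */
static int log2_pow2(unsigned long n)
{
    int k = 0;
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Conventional FFT operation count: 5 * log2(N) flops per point. */
static double gflops_from_gpoints(double gpoints, unsigned long n_points)
{
    return gpoints * 5.0 * log2_pow2(n_points);
}

/* Relative change of a metric, new versus old (negative = lower). */
static double relative_change(double new_value, double old_value)
{
    return (new_value - old_value) / old_value;
}
```

With these helpers, gpoints_per_sec(2000, 4096, 5.7592) is about 1.4224 and gpoints_per_sec(2000, 4096, 4.1108) about 1.9928, matching the logs. relative_change(1.4224, 1.9928) is about -0.286, i.e. roughly 29% lower forward-FFT throughput (equivalently, about 40% longer processing time), in line with the "30% lower" estimate.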