Re: CRAM SEU detection and mitigation support with OpenCL/oneAPI

Thank you for replying, @BoonBengT_Altera. I have just returned from vacation.

This issue has not been resolved. The documentation you linked does not cover what I asked for. I'm aware of the CRC_ERROR pin, as mentioned in the post above. I've confirmed that the N520 board has hardware reading this pin; the only part left is Intel's host code.

I'm asking specifically about a function in Intel's host code library for HLS development, noted below:

    clSetDeviceExceptionCallbackIntelFPGA(
        cl_uint num_devices,
        const cl_device_id * devices,
        CL_EXCEPTION_TYPE_INTEL listen_mask,
        void (CL_CALLBACK * pfn_exception_notify)(
            CL_EXCEPTION_TYPE_INTEL exception_type,
            const void * private_info,
            size_t cb,
            void * user_data),
        void * user_data);

This function can be found in the header file at the following path: .../intelFPGA_pro/21.4.0/hld/host/include/CL/cl_ext_intelfpga.h

I am looking for documentation on this specific function. Searching on Google or with Intel's own search function yields no results. I have not been able to find it mentioned anywhere in the documentation either.

The question is: What arguments must I pass to this function to catch all CRC exceptions?

Kind regards,
Lennart

Re: CRAM SEU detection and mitigation support with OpenCL/oneAPI

Yeah, I do think M20K + CRAM together should cover most of them. The one thing I'm looking for is documentation on this function:

    clSetDeviceExceptionCallbackIntelFPGA(
        cl_uint num_devices,
        const cl_device_id * devices,
        CL_EXCEPTION_TYPE_INTEL listen_mask,
        void (CL_CALLBACK * pfn_exception_notify)(
            CL_EXCEPTION_TYPE_INTEL exception_type,
            const void * private_info,
            size_t cb,
            void * user_data),
        void * user_data);

Specifically, I need to know whether the listen mask is inclusive or exclusive. Which inputs allow me to receive ALL error signals? I've not found any documentation on it.
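To make the inclusive-versus-exclusive question concrete, here is a minimal, self-contained mock of what an inclusive listen mask would mean. Everything in it is a stand-in: the type `exc_type`, the bit names `EXC_ECC_CORRECTABLE` and `EXC_ECC_NON_CORRECTABLE`, and the functions `mock_set_exception_callback` and `mock_raise` are hypothetical and are not taken from cl_ext_intelfpga.h. It only illustrates the assumption that the mask is a bitfield in which a set bit means "deliver this exception type"; whether the real runtime behaves this way is exactly the open question in this thread, and the real bit values would have to be read from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT the definitions from cl_ext_intelfpga.h. */
typedef uint64_t exc_type;                               /* plays the role of CL_EXCEPTION_TYPE_INTEL */
#define EXC_ECC_CORRECTABLE     ((exc_type)1 << 0)       /* hypothetical bit */
#define EXC_ECC_NON_CORRECTABLE ((exc_type)1 << 1)       /* hypothetical bit */
#define EXC_ALL                 (~(exc_type)0)           /* every bit set */

/* Same shape as the pfn_exception_notify parameter quoted in the post. */
typedef void (*exception_cb)(exc_type exception_type, const void *private_info,
                             size_t cb, void *user_data);

/* State of the mock runtime. */
static exc_type g_listen_mask;
static exception_cb g_callback;
static void *g_user_data;

/* Mock registration: remembers the mask, the callback and the user pointer. */
static void mock_set_exception_callback(exc_type listen_mask, exception_cb fn,
                                        void *user_data)
{
    assert(fn != NULL);
    g_listen_mask = listen_mask;
    g_callback = fn;
    g_user_data = user_data;
}

/* Mock delivery under the ASSUMED inclusive semantics: the callback fires
 * only if the exception's bit is set in the listen mask.
 * Returns 1 if the callback was invoked, 0 if the exception was filtered. */
static int mock_raise(exc_type type)
{
    if (g_callback == NULL || (g_listen_mask & type) == 0)
        return 0;
    g_callback(type, NULL, 0, g_user_data);
    return 1;
}

/* Example callback: counts deliveries through user_data and logs the bits. */
static void count_exceptions(exc_type type, const void *private_info,
                             size_t cb, void *user_data)
{
    (void)private_info;
    (void)cb;
    unsigned *counter = user_data;
    ++*counter;
    fprintf(stderr, "exception bits: 0x%llx\n", (unsigned long long)type);
}
```

Under this assumption, passing a mask with every bit set (`EXC_ALL` here) would deliver all exception types, while a mask with a single bit would filter out the others. If the real mask turned out to be exclusive, the same call would suppress everything, which is why the semantics need to be confirmed before relying on it.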
Could you perhaps get me in touch with one of the developers of the Intel OpenCL library?

Re: CRAM SEU detection and mitigation support with OpenCL/oneAPI

Hi again BoonBengT,

Thank you for responding. My apologies, I should have specified my hardware. I'm working on the Intel Stratix 10 GX 2800 on a Bittware N520 board.

There are two types of errors I think are the most prevalent/important (if I'm leaving out an important class, please let me know):

- M20K errors. These are handled through the built-in ECC in the M20K blocks. I handle these myself.
- Configuration RAM errors, i.e. errors in the FPGA fabric configuration. These are the ones I wish to detect.

Intel documentation indicates that there should be dedicated hardware for automatic scanning. As I understand it, this hardware then drives the CRC_ERROR pin. I've looked in Bittware's documentation and confirmed that this BSP does handle the CRC_ERROR pin. The Quartus project generated by aoc also has the periodic integrity checking flag enabled. I couldn't find it mentioned anywhere in the documentation, though.

What the documentation does mention is the -ecc flag (https://www.intel.com/content/www/us/en/docs/programmable/683846/21-4/compiling-your-kernel-with-memory-error.html), which adds M20K error detection in OpenCL code, but it does not mention CRAM error detection.

I was able to find the above function in the Intel header files at: .../intelFPGA_pro/21.4.0/hld/host/include/CL/cl_ext_intelfpga.h

I think this is called when the CRC_ERROR pin is signaled, but I don't know exactly how this callback behaves. You can see that testing this function is not exactly trivial. (I don't think they'll let me into the data center with a gamma-ray cannon 😛)

If you have any documentation or info on this, then please let me know. I can't afford even a single undetected error in this project.
Kind regards,
Lennart

Re: CRAM SEU detection and mitigation support with OpenCL/oneAPI

So, searching through the Intel OpenCL headers, I came across this nice function, which seems to suggest that ECC CRAM SEU detection is built into the BSP. If that is the case, that would be amazing.

    clSetDeviceExceptionCallbackIntelFPGA(
        cl_uint num_devices,
        const cl_device_id * devices,
        CL_EXCEPTION_TYPE_INTEL listen_mask,
        void (CL_CALLBACK * pfn_exception_notify)(
            CL_EXCEPTION_TYPE_INTEL exception_type,
            const void * private_info,
            size_t cb,
            void * user_data),
        void * user_data);

I cannot find anything in the docs about this function, so perhaps you can help me.

A. Does this detect M20K ECC errors and/or CRAM ECC errors? I believe M20Ks in the OpenCL kernel are not covered, as they are configured in Quad-Port mode. Mostly I'm interested in CRAM ECC detection.
B. Is the listen_mask inclusive or exclusive? This is kind of important to know, and impossible to test. What should I pass to make sure I capture any and all detected CRAM ECC faults?

Kind regards,
Lennart

CRAM SEU detection and mitigation support with OpenCL/oneAPI

Hi there,

I'm doing a large, long-running FPGA supercomputing project, with an average of 20 FPGAs running in parallel over the course of 5 months. Our project is very sensitive to cosmic ray interference, so we want to take all possible steps to ensure our result is correct. We've already made sure to collect errors from our M20K blocks, and wish to ensure the same for the CRAM.

Is CRAM error detection/scrubbing built into OpenCL and/or oneAPI?

Kind regards,
Lennart

Re: Stratix 10 M20K Clock Clock Enable for ECC Pipeline register

In the end I was able to test it: yes, applying a clock enable to the output registers does add a clock enable to the pipeline registers as well. The other questions are still a mystery, but at least we can continue development with this.
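The tested behaviour (one clock enable gating the read address register, the ECC pipeline register and the output register together) can be pictured with a small behavioural model. This is only a sketch for illustration, not Intel's implementation: the struct `m20k_model`, the function `m20k_clock` and the depth of 16 words are hypothetical, and ECC decoding itself is not modelled. It shows that, with all three stages sharing one enable, read data appears on the output after three enabled clock edges and every stage holds its value while the enable is low.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define M20K_MODEL_DEPTH 16u   /* hypothetical depth, chosen for the example */

/* Behavioural model of a read port with three clock-enabled register stages. */
typedef struct {
    uint32_t mem[M20K_MODEL_DEPTH];
    uint32_t addr_reg;   /* registered read address */
    uint32_t ecc_reg;    /* optional ECC pipeline register */
    uint32_t out_reg;    /* output register */
} m20k_model;

/* One rising clock edge. When ce is 0, all three registers hold their value.
 * Stages are updated from last to first so that each one captures the value
 * its predecessor held before this edge, as real registers would. */
static void m20k_clock(m20k_model *m, uint32_t rd_addr, int ce)
{
    assert(m != NULL);
    if (!ce)
        return;
    m->out_reg  = m->ecc_reg;
    m->ecc_reg  = m->mem[m->addr_reg];
    m->addr_reg = rd_addr % M20K_MODEL_DEPTH;
}
```

Presenting an address on one enabled edge, then clocking two more enabled edges, brings the corresponding word to `out_reg`; any number of disabled edges in between changes nothing.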
Re: Stratix 10 M20K Clock Clock Enable for ECC Pipeline register

Hi SyafieqS,

Thank you for your time. This is still an open question for us. As our build system is broken right now, I can't test it myself, so having confirmation on how the M20K output registers work would really help.

Kind regards,
Lennart

Stratix 10 M20K Clock Clock Enable for ECC Pipeline register

Hi there,

In the M20K documentation on ECC, there is mention of an optional pipeline register within the ECC pipeline for achieving higher FMax. It is not entirely clear whether adding a clock enable to the output register also adds a clock enable to this internal register.

https://www.intel.com/content/www/us/en/docs/programmable/683461/current/error-correction-code-truth-table.html
https://www.intel.com/content/www/us/en/docs/programmable/683240/17-0/ram-and-rom-parameter-settings.html

What I want is an M20K block where the read address register, the ECC pipeline register and the output register all have a clock enable, so that a read takes 3 enabled clock cycles to complete. Does adding a clock enable to "all output registers" add a clock enable to the internal ECC pipeline register?

Another minor question: why is it seemingly impossible to add a clock enable to only the M20K output register in single-clock dual-port mode? I can only add a clock enable to all ports or to none of them, whereas in dual-clock mode this is not a problem.

One final thing: what does the description of clock_enable_core_b, "The clock enable for the core of port B.", mean? What is the 'core' of port A/B?

Kind regards,
Lennart

Solved: Re: SDC constraints with native module for OpenCL

Thank you, @BoonBengT_Altera.

We've been able to solve the issue in a different way: we transmit the constant over two wires to shift register receivers all across the FPGA. That saves us a lot of routing resources as well.

In any case, thank you for the linked document. If the need comes up again, we'll definitely consult it.
Kind regards,
Lennart

Re: SDC constraints with native module for OpenCL

Thanks for replying.

The issue is that, working with OpenCL, I cannot edit the generated Quartus project itself. It is generated by aoc in the process of an OpenCL compile, from an XML file that describes the module. In that XML I must specify the files that are needed for my module: Verilog files, VHDL files, memory files, etc. When it compiles, it just copies the necessary files to a fresh Quartus project and executes the necessary Tcl scripts for OpenCL compilation.

The thing is that it's really picky about the accepted file types: it accepts only these files, and not, for example, .qip or .sdc files.

I was just looking to see if there's a workaround I can use, perhaps even going as far as using filesystem events to detect and edit the generated top.sdc before Quartus reads it.

Kind regards,
Lennart
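The "edit the generated top.sdc" idea could be sketched roughly as follows. This covers only the patching step, under stated assumptions: the function `patch_sdc` and the marker comment `PATCH_MARKER` are hypothetical names introduced for illustration, and the detection step (filesystem events or polling) is left out. The marker makes the edit idempotent, so a watcher that fires more than once does not append the constraints twice. Nothing here removes the race between the patch and Quartus reading the file, which would still need to be handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker written in front of the appended constraints so that
 * a second run can tell the file has already been patched. */
#define PATCH_MARKER "# patched-by-sdc-watcher"

/* Appends `extra` (one or more SDC commands) to the file at `path`.
 * Returns 1 if the text was appended, 0 if the marker was already present,
 * and -1 on an I/O error (for example, the file does not exist yet). */
static int patch_sdc(const char *path, const char *extra)
{
    assert(path != NULL && extra != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    /* Scan for the marker line by line. */
    char line[1024];
    int already_patched = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, PATCH_MARKER) != NULL) {
            already_patched = 1;
            break;
        }
    }
    fclose(f);
    if (already_patched)
        return 0;

    /* Append the marker and the extra constraints. The leading newline
     * guards against a file that does not end with one. */
    f = fopen(path, "a");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "\n%s\n%s\n", PATCH_MARKER, extra) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 1 : -1;
}
```

A watcher would call `patch_sdc` with the path of the generated top.sdc as soon as the file appears; a return value of -1 simply means "not there yet, try again".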