Forum Discussion
Hi Arun,
You can refer to the steps below to enable and test ECC support in Linux for the Agilex 5 memory controller.
Agilex 5 uses the IO96B memory controller. Please ensure the following kernel configuration options are enabled:
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_ALTERA=y
CONFIG_EDAC_ALTERA_IO96B=y
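To confirm that a kernel build actually has these options set, you can grep its configuration file. The helper below is a minimal sketch. The function name check_edac_config is hypothetical. Reading the running kernel's configuration from /proc/config.gz assumes the kernel was built with CONFIG_IKCONFIG_PROC=y. If it was not, point the helper at the .config file in your kernel build tree instead.

```shell
# Hypothetical helper: report whether each EDAC option listed above
# is set to "y" in the given kernel config file.
# Returns 0 if all options are enabled, 1 otherwise.
check_edac_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_EDAC_DEBUG CONFIG_EDAC_ALTERA CONFIG_EDAC_ALTERA_IO96B; do
        # Match the whole "OPTION=y" line so that CONFIG_EDAC_ALTERA
        # does not accidentally match CONFIG_EDAC_ALTERA_IO96B.
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: NOT enabled"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run `check_edac_config .config` in the kernel build tree. On the target (only if CONFIG_IKCONFIG_PROC=y), run `zcat /proc/config.gz > /tmp/running.config && check_edac_config /tmp/running.config`.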
ECC Error Injection:
From the Linux prompt, use the following command to inject an ECC error:
echo C > /sys/kernel/debug/edac/io96b0-ecc/altr_trigger
Note:
C → Inject Correctable Error
U → Inject Uncorrectable Error
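If you script the injection step, it helps to validate the argument and confirm that the trigger file exists before writing to it. This is a minimal sketch. The function name inject_ecc_error and its optional second argument (the debugfs mount point, assumed to default to /sys/kernel/debug) are hypothetical. The io96b0-ecc/altr_trigger path and the C/U values come from the steps above. Writing to debugfs normally requires root.

```shell
# Hypothetical wrapper around the altr_trigger write shown above.
#   $1: C (correctable) or U (uncorrectable)
#   $2: debugfs mount point (assumed default: /sys/kernel/debug)
inject_ecc_error() {
    err_type="$1"
    debugfs="${2:-/sys/kernel/debug}"
    trigger="${debugfs}/edac/io96b0-ecc/altr_trigger"

    # Only the two values documented above are accepted.
    case "$err_type" in
        C|U) ;;
        *) echo "usage: inject_ecc_error C|U [debugfs_mount]" >&2; return 2 ;;
    esac

    # A missing file usually means debugfs is not mounted or the
    # IO96B EDAC driver is not enabled in this kernel.
    if [ ! -e "$trigger" ]; then
        echo "trigger not found: $trigger" >&2
        return 1
    fi

    echo "$err_type" > "$trigger"
}
```

For example, run `inject_ecc_error C` on the target as root. If /sys/kernel/debug is empty, you can mount debugfs first with `mount -t debugfs none /sys/kernel/debug`.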
Writing C to altr_trigger injects a single-bit error syndrome into the memory controller, which triggers an interrupt to the CPU. The Linux driver then reports the error, and you should see log messages like the following:
[ 531.047821] EDAC Altera: io96b0-ecc: SBE: word0:0x00409C00, word1:0x00014F00
[ 531.054873] EDAC DEVICE2: CE: Altera ECC Manager instance: io96b0-ecc0 block: io96b0-ecc0 count: 1 'io96b0-ecc'
Field descriptions:
word1 – Lower 32 bits of the ECC error address
word0 – ECC error information
Please refer to Table 253 in the documentation for details on the ECC Error Buffer Structure:
https://www.intel.com/content/www/us/en/docs/programmable/817467/25-1-1/ecc-error-handling.html
This user guide provides detailed information about the Agilex 5 EMIF IP mailbox interface.
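If you collect results from many injections, you can extract word0 and word1 from the SBE log line automatically. This is a minimal sketch that assumes the exact log format shown above. The function name parse_sbe_words is hypothetical. It only extracts the raw values. Decoding the individual bits of word0 requires the ECC Error Buffer Structure table referenced above, so the sketch does not attempt it. Log lines for uncorrectable errors may use a different format and are not matched.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "word0=<hex> word1=<hex>" for every io96b0-ecc SBE line.
# word1 = lower 32 bits of the ECC error address,
# word0 = ECC error information (see the ECC Error Buffer Structure table).
parse_sbe_words() {
    sed -n 's/.*io96b0-ecc: SBE: word0:\(0x[0-9A-Fa-f]*\), word1:\(0x[0-9A-Fa-f]*\).*/word0=\1 word1=\2/p'
}
```

For example, `dmesg | parse_sbe_words` prints one line per reported single-bit error. For the log shown above, the output is `word0=0x00409C00 word1=0x00014F00`.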
"We are now looking to validate ECC (Error Correction Code) functionality on our custom Agilex 5 System-on-Module (SOM) running Linux. Our objective is to ensure that ECC is correctly enabled and functioning across all relevant memory regions, and that error detection and correction mechanisms are properly integrated at the kernel level."
I am not sure about your specific test plan, but I’m afraid that using the Linux kernel to validate ECC functionality across all relevant memory regions is not an appropriate approach.
The Linux kernel EDAC (Error Detection and Correction) framework provides a mechanism to validate the error injection and error reporting flow for the IO96B memory controller through the mailbox interface. However, it is not a comprehensive debugging tool for validating different memory regions.
The reason is that while the Linux kernel is running and actively using DDR memory, the user cannot see which regions are in use and which are free. Attempting to modify memory content that the kernel is using could crash the kernel.
The correct way to use the Linux kernel EDAC driver is to confirm that the ECC error reporting path works: use the EDAC driver to inject an error, then verify that the kernel reports the corresponding error.
This confirms that if any ECC error occurs on the memory controller, the CPU will receive an interrupt and the Linux driver will report the error appropriately.
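The verification described above (inject an error, then confirm that the kernel reports it) can be scripted by comparing the number of matching CE reports in the kernel log before and after the injection. This is a minimal sketch, not an official test procedure. The function names are hypothetical. The sketch assumes the log format shown earlier and reads the log with dmesg. The one-second wait for the interrupt to be handled is an arbitrary choice.

```shell
# Hypothetical helper: count io96b0-ecc CE reports in log text on stdin.
count_io96b_ce() {
    # grep -c prints 0 but exits 1 when nothing matches; ignore that status.
    grep -c 'EDAC DEVICE[0-9]*: CE: .*io96b0-ecc' || true
}

# Hypothetical end-to-end check: inject one correctable error and confirm
# that at least one new CE report appears. Must run as root.
verify_ce_reporting() {
    trigger=/sys/kernel/debug/edac/io96b0-ecc/altr_trigger
    before=$(dmesg | count_io96b_ce)
    echo C > "$trigger" || return 1
    sleep 1   # assumed settling time for the interrupt and log message
    after=$(dmesg | count_io96b_ce)
    if [ "$after" -gt "$before" ]; then
        echo "PASS: CE reported ($before -> $after)"
    else
        echo "FAIL: no new CE report in dmesg" >&2
        return 1
    fi
}
```

Counting log lines keeps the check independent of any EDAC sysfs layout. However, if the kernel log ring buffer wraps between the two reads, older lines can be dropped, so the counts are only meaningful over a short window.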
Hope this helps.
- Arun_Prabakatr, 2 months ago
New Contributor
Hi Nirav,
Thanks for your support. We’re using kernel version 6.12.11 with QPD 25.1, and we were able to find the driver option CONFIG_EDAC_ALTERA_IO96B. Could you please let us know which related driver we should use to test and validate ECC?
- Nirav_Altera, 2 months ago
New Contributor
Hi Arun,
As I mentioned in my previous reply, you can use the Linux EDAC driver to confirm that the ECC error reporting path is functioning. Whenever a single-bit or double-bit ECC error occurs on DDR, the Linux EDAC driver reports it along with the error information.
- KianHinT_altera, 1 month ago
Frequent Contributor
Hi Arun,
As there are no further enquiries related to this issue, we will step back and allow the community to assist with any future follow-up questions.
Thank you for engaging with us!
Best regards,
Altera Technical Support