Re: Agilex 5 HPS TEE

May I know what kind of use cases you intend to use the OP-TEE OS BL32 for? This will give us some input/feedback for our roadmap. Thanks.

Re: Routing PL DDR to PS

The link you posted is the ideal way to enable external memory for the HPS using the hard memory controller. However, you asked whether the HPS could use other external memory connected to the FPGA fabric. That is technically possible, but not the recommended way. Hope this makes sense.

Re: To evaluate and monitor CPU frequency behavior in the Kernel OS

JamesG_Altera is 100% correct. Please follow James's suggestion to modify the register (ping-pong counter/divider) from the U-Boot SSBL or from Quartus. This is the recommended solution.

For the Linux part, I will just add a bit more information. We do not yet support setting the clock rate through the clock manager driver in Linux, but customers are free to implement it themselves, since the driver is open source, and to contribute the change back to the open source community. If you are keen to try, I have written some basic steps on how to do this in Linux below.

Step 1: Define the minimum and maximum frequencies, or multiple operating frequencies, for the CPU cores using an opp-table. I am just giving an example with some sample frequencies. We do not need to specify the operating voltage because the Agilex 5 does not support voltage scaling; only frequency scaling is supported. As shown below, you have to add the clock manager DT bindings to the CPU nodes and define an opp-table in the device tree. These are not the correct values; they are for demonstration only.

cpus {
	#address-cells = <1>;
	#size-cells = <0>;

	cpu0: cpu@0 {
		compatible = "arm,cortex-a55";
		reg = <0x0>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&L2>;
		clocks = <&clkmgr AGILEX5_CORE0_FREE_CLK>;
		clock-names = "cpu";
		operating-points-v2 = <&cpu_opp>;	// cpufreq uses these
	};

	cpu1: cpu@1 {
		compatible = "arm,cortex-a55";
		reg = <0x100>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&L2>;
		clocks = <&clkmgr AGILEX5_CORE1_FREE_CLK>;
		clock-names = "cpu";
		operating-points-v2 = <&cpu_opp>;	// cpufreq uses these
	};

	cpu2: cpu@2 {
		compatible = "arm,cortex-a76";
		reg = <0x200>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&L2>;
		clocks = <&clkmgr AGILEX5_CORE2_FREE_CLK>;
		clock-names = "cpu";
		operating-points-v2 = <&cpu_opp>;	// cpufreq uses these
	};

	cpu3: cpu@3 {
		compatible = "arm,cortex-a76";
		reg = <0x300>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&L2>;
		clocks = <&clkmgr AGILEX5_CORE3_FREE_CLK>;
		clock-names = "cpu";
		operating-points-v2 = <&cpu_opp>;	// cpufreq uses these
	};

	cpu_opp: opp-table {
		compatible = "operating-points-v2";
		opp-shared;

		opp-min {
			opp-hz = /bits/ 64 <10000000>;	// 10 MHz
		};

		opp-max {
			opp-hz = /bits/ 64 <800000000>;	// example, 800 MHz
		};
	};
};

Step 2: Modify drivers/clk/socfpga/clk-periph-s10.c to add the two clk_ops callbacks, .set_rate and .round_rate. The following code is just a POC and not production quality. The objective is to modify the ping-pong counter. The spec says: "Division setting for ping pong counter in clock slice. Divides the core01_clk frequency by this value + 1." The reset value is 0, so the divisor is 1 and the core runs at the maximum frequency of the CPU core. (Register reference: Intel Sundance Mesa HPS Register Map, core01ctr.)
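Before the driver change itself, here is a small standalone sketch of the divider arithmetic, assuming the register description quoted above (output = parent clock / (value + 1)). The helper name and the 800 MHz parent value are only illustrative, taken from the example opp-table above.

#include <stdio.h>

/* Sketch only: register value to program into the core01 ping-pong divider
 * for a requested rate, given output = parent / (value + 1). */
static unsigned long core01_div_value(unsigned long parent_rate, unsigned long rate)
{
	return (parent_rate / rate) - 1;
}

int main(void)
{
	unsigned long parent = 800000000UL;	/* example maximum core clock */

	printf("400 MHz -> register value %lu\n", core01_div_value(parent, 400000000UL));	/* 1 */
	printf("100 MHz -> register value %lu\n", core01_div_value(parent, 100000000UL));	/* 7 */
	return 0;
}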
static int clk_cpu_set_rate(struct clk_hw *hw, unsigned long rate,
			    unsigned long parent_rate)
{
	struct socfpga_periph_clk *socfpgaclk = to_periph_clk(hw);
	unsigned long div = 1;

	if (socfpgaclk->hw.reg) {
		/*
		 * The register divides the parent clock by (value + 1), so
		 * program (parent_rate / rate) - 1. The requested rate is
		 * assumed to always be smaller than or equal to parent_rate
		 * (the maximum core frequency).
		 */
		div = (parent_rate / rate) - 1;
		writel(div, socfpgaclk->hw.reg);
		pr_err("DEBUG: clk_cpu_set_rate %lu parent = %lu, divisor = %lu\n",
		       rate, parent_rate, div);
	}

	return 0;
}

/* This is not implemented in full yet; just for demo. */
static long clk_cpu_round_rate(struct clk_hw *hw, unsigned long rate,
			       unsigned long *parent_rate)
{
	return rate;
}

static const struct clk_ops peri_cnt_clk_ops = {
	.recalc_rate	= clk_peri_cnt_clk_recalc_rate,
	.get_parent	= clk_periclk_get_parent,
	.set_rate	= clk_cpu_set_rate,	/* <- new callback defined here */
	.round_rate	= clk_cpu_round_rate,	/* <- new callback defined here */
};

Testing the solution: you should be able to see cpufreq once you have both the device tree and the clock manager device driver hacked as above. You must set the governor to userspace in order to modify the CPU frequency from sysfs.

root@dhcp0:~# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Modify the speed (in kHz):

echo 50000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

root@dhcp0:~# ls /sys/devices/system/cpu/cpu0/cpufreq
affected_cpus               related_cpus                   scaling_governor
cpuinfo_cur_freq            scaling_available_frequencies  scaling_max_freq
cpuinfo_max_freq            scaling_available_governors    scaling_min_freq
cpuinfo_min_freq            scaling_cur_freq               scaling_setspeed
cpuinfo_transition_latency  scaling_driver                 stats

Re: Routing PL DDR to PS

Hi, are you specifically asking to do the following?

HPS CPU (ARM) <--AXI Bridge--> FPGA Interconnect <---> DDR Controller (in FPGA fabric) <---> DDR chips (physically connected to FPGA pins)

If so, it could be possible to get the HPS CPU to access the DDR connected to the PL. We would need some RTL magic/implementation to expose the memory controller interface in the FPGA as a slave memory device, and then configure the HPS to use it as the system memory.

There lie many challenges that I can't confirm. We may have coherency issues (I/O cache coherency; the SCU in any case does not support I/O cache coherency from other initiators or DMA agents), and also performance issues such as higher memory latencies. Since the memory interface is routed through the HPS-to-FPGA bridge, an HPS running an OS like Linux, which requires the MMU and caches to be enabled, may run into coherency issues. U-Boot SPL or bare metal running on the HPS could run without enabling the D-cache/I-cache. Having said that, if it is just the HPS CPU cores, which rely on the SCU, this could be possible, as memory coherency between the cores is maintained by the SCU and the cache management may be fine. You will also have to ensure that no other agents in the FPGA or DMA devices write to the DDR/memory (this could be solved easily by software intervention, i.e. software cache coherency management).

From a software standpoint, you have to modify the U-Boot/Linux device tree to use the memory-mapped addresses at the HPS-to-FPGA bridge. The FPGA has to be configured before starting the HPS.

*Add-on on the software side: the U-Boot SPL SDRAM device driver or module may need to be altered/hacked to bypass the default drivers which run the hard memory controller in the Arria 10.
Otherwise it will throw an error during boot-up, if any; if there are no errors, you can ignore this. You need to ensure the PL/FPGA memory controller is up and in user mode in the FSBL (U-Boot SPL) before it starts to load the secondary boot images to DDR.

Memory map changes are required in arch/arm/boot/dts/intel/socfpga/socfpga_arria10_socdk.dtsi:

#include "socfpga_arria10.dtsi"

/ {
	model = "Altera SOCFPGA Arria 10";
	compatible = "altr,socfpga-arria10-socdk", "altr,socfpga-arria10", "altr,socfpga";

	aliases {
		ethernet0 = &gmac0;
		serial0 = &uart1;
	};

	chosen {
		bootargs = "earlyprintk";
		stdout-path = "serial0:115200n8";
	};

	memory@0 {
		name = "memory";
		device_type = "memory";
		/*
		 * 1 GB. Base changed from 0x0 to 0xC0000000. The base address
		 * of the memory should be based on the FPGA fabric design,
		 * i.e. where the slave memory device to the DDR sits, and you
		 * should use the SOC2FPGA bridge interface.
		 */
		reg = <0xC0000000 0x40000000>;
	};
};

This is a tentative reply to your question. A readily available workaround does not exist today, as it requires some POC or exploration by your team. We will consult more experts, and if we have anything extra or any corrections, we will reply later in this discussion thread. Thanks.

Re: Operating system kernel-level FPGA bridge communication

Hi, by right, devmem2 should be used to read/write through the HPS2FPGA bridges using the Agilex 5 address map. I can see that your mapping is incorrect: if you are using the LW HPS2FPGA bridge, it starts at 0x1FC0_0000. You do not need to create any nodes in the device tree; device tree nodes are only required if you are writing/using soft IP device drivers in the Linux kernel to access the bridge. Without drivers, via the user-space interface, you should be able to use devmem2.

The bridges will be enabled by default if you do an FPGA configuration in Linux. If you perform the configuration in U-Boot, you will have to enable the bridges in U-Boot before booting Linux. The configs in your example, CONFIG_FPGA_BRIDGE and CONFIG_FPGA_REGION, are only required if you are doing FPGA configuration in Linux, as they are needed by the FPGA manager device driver (stratix10_soc.ko) to configure the second-stage FPGA core.rbf. In short, the bridges should be enabled if you have configured the FPGA in U-Boot prior to loading Linux.
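To illustrate the user-space access path, here is a minimal sketch of what devmem2 effectively does: map /dev/mem at the bridge base and access a register there. The 0x1FC0_0000 base is the LW HPS2FPGA window mentioned above; the 0x0 register offset and the assumption that a readable soft IP register sits at that offset are placeholders for your own Platform Designer address map.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define LWH2F_BASE	0x1FC00000UL	/* LW HPS2FPGA bridge base on Agilex 5 (see above) */
#define REG_OFFSET	0x0		/* placeholder offset of a soft IP register */

int main(void)
{
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0) {
		perror("open /dev/mem");
		return 1;
	}

	/* Map one page of the bridge window into user space. */
	volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				       MAP_SHARED, fd, LWH2F_BASE);
	if (regs == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	printf("reg[0x%x] = 0x%08x\n", REG_OFFSET, regs[REG_OFFSET / 4]);

	munmap((void *)regs, 4096);
	close(fd);
	return 0;
}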
Re: Agilex 5 EMAC to EMAC : Driver error

Thanks, we will have to analyze the SOF you provided to troubleshoot this problem further. We will get back to you when we have more insight into the issue. Thanks again!

Re: Agilex 5 EMAC to EMAC : Driver error

I would question why the emac0 and emac1 clocks are disabled, for some unknown reason. From the Linux clock dump, I believe the Quartus design may have an issue. These two clock gates (emac0_en, emac1_en) gate GMAC0 and GMAC1; that is why GMAC2 is the only one working. We don't know why yet. Thanks for all the dumps. Can you send us the file below? We need to inspect your SOF file.

XXX.sof

Re: Agilex 5 EMAC to EMAC : Driver error

On top of what my other colleagues asked, can you also dump, from the Linux boot log, all the lines that mention "socfpga-dwmac"? I want to understand whether the hard IP is even accessible or enabled correctly for Ethernet by the Quartus design. The example I have here is for GMAC2, but you are using GMAC0 and GMAC1. It seems you are unable to even perform the initial setup of the Ethernet hardware, which has a DMA engine; this requires writes to the Ethernet CSR register addresses. I want to see whether GMAC0 and GMAC1 are even enabled correctly at the HPS by the Quartus design.

In the Quartus HPS configuration, did you enable GMAC0 and GMAC1 from the drop-downs to select the function?

GMAC0: [ RGMII ] / [ SGMII ] / [ Disabled ]
GMAC1: [ RGMII ] / [ SGMII ] / [ Disabled ]

This should be propagated as handoff data to the U-Boot HPS FSBL, which configures the pinmux and also the clock settings. The handoff is important for the HPS to configure the Ethernet clock settings and pinmux. The GMAC0 and GMAC1 resets should already have been de-asserted by U-Boot during boot-up, so I don't suspect the reset de-assertion. It is either the clocks or the pinmux, or a combination of both. Without a clock, the IP will be permanently disabled, so the clock is the major suspect. In short, I suspect that the GMAC0 and GMAC1 clocks are disabled by a bad Quartus configuration.

Linux log: example for a working GMAC (GMAC2). I need to see the equivalent log for GMAC0 and GMAC1 on your end; I want to know how far it got before it threw the error.

[    0.744790] socfpga-dwmac 10830000.ethernet: Adding to iommu group 0
[    0.746937] socfpga-dwmac 10830000.ethernet: IRQ eth_wake_irq not found
[    0.747646] socfpga-dwmac 10830000.ethernet: IRQ eth_lpi not found
[    0.748310] socfpga-dwmac 10830000.ethernet: IRQ sfty not found
[    0.749186] socfpga-dwmac 10830000.ethernet: SMTG Hub Cross Timestamp supported
[    0.750124] socfpga-dwmac 10830000.ethernet: User ID: 0x76, Synopsys ID: 0x31
[    0.750897] socfpga-dwmac 10830000.ethernet: XGMAC2
[    0.751426] socfpga-dwmac 10830000.ethernet: DMA HW capability register supported
[    0.752217] socfpga-dwmac 10830000.ethernet: RX Checksum Offload Engine supported
[    0.753010] socfpga-dwmac 10830000.ethernet: COE Type 1
[    0.753568] socfpga-dwmac 10830000.ethernet: TX Checksum insertion supported
[    0.754310] socfpga-dwmac 10830000.ethernet: TSO supported
[    0.754904] socfpga-dwmac 10830000.ethernet: Enabled L3L4 Flow TC (entries=16)
[    0.755669] socfpga-dwmac 10830000.ethernet: Enabled RFS Flow TC (entries=10)
[    0.756422] socfpga-dwmac 10830000.ethernet: TSO feature enabled
[    0.757061] socfpga-dwmac 10830000.ethernet: SPH feature enabled
[    0.757698] socfpga-dwmac 10830000.ethernet: Using 40/40 bits DMA host/device width

Re: Linux UIO IRQ related periodic CPU usage

We have tried several experiments, and it seems this is unrelated to an interrupt storm. If you have run out of ideas, my only remaining suggestion is to not use UIO and instead write a simple kernel driver that maps to the IRQ: just create a simple handler (ISR) that counts interrupts and dumps out the number of interrupts per minute. Every time you receive an interrupt, you increase a counter; you then print this counter every 1-2 minutes so you don't clutter your log terminal. By bypassing UIO, we can isolate where the problem is. If there are no CPU spikes when doing so, then UIO is a suspect. To me, this would be the simplest way to find out which component causes the problem. A rough sketch of such a driver is shown below.
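This is only a sketch, under the assumption that the FPGA interrupt is described by a device tree node the driver can bind to (for example the same node your UIO setup uses today, with uio_pdrv_genirq unbound); the "vendor,fpga-irq-test" compatible string and driver name are made up. It counts interrupts in the handler and prints the count once a minute from a kernel timer.

#include <linux/atomic.h>
#include <linux/interrupt.h>
#include <linux/jiffies.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/timer.h>

static atomic_t irq_count = ATOMIC_INIT(0);
static struct timer_list report_timer;

/* ISR: do nothing except count the interrupt. */
static irqreturn_t test_irq_handler(int irq, void *dev_id)
{
	atomic_inc(&irq_count);
	return IRQ_HANDLED;
}

/* Report and reset the counter once a minute to avoid cluttering the log. */
static void report_timer_fn(struct timer_list *t)
{
	pr_info("fpga-irq-test: %d interrupts in the last minute\n",
		atomic_xchg(&irq_count, 0));
	mod_timer(&report_timer, jiffies + 60 * HZ);
}

static int test_probe(struct platform_device *pdev)
{
	int irq = platform_get_irq(pdev, 0);
	int ret;

	if (irq < 0)
		return irq;

	ret = devm_request_irq(&pdev->dev, irq, test_irq_handler, 0,
			       "fpga-irq-test", pdev);
	if (ret)
		return ret;

	timer_setup(&report_timer, report_timer_fn, 0);
	mod_timer(&report_timer, jiffies + 60 * HZ);
	return 0;
}

static int test_remove(struct platform_device *pdev)
{
	del_timer_sync(&report_timer);
	return 0;
}

static const struct of_device_id test_of_match[] = {
	{ .compatible = "vendor,fpga-irq-test" },	/* hypothetical compatible */
	{ }
};
MODULE_DEVICE_TABLE(of, test_of_match);

static struct platform_driver test_driver = {
	.probe	= test_probe,
	.remove	= test_remove,
	.driver	= {
		.name = "fpga-irq-test",
		.of_match_table = test_of_match,
	},
};
module_platform_driver(test_driver);

MODULE_LICENSE("GPL");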
Re: Linux UIO IRQ related periodic CPU usage

Item 4: still some questions on the user application design. Just to double confirm: write() must always come after read(), and only after you are done with any data processing/event handling. In the interrupt handler, any interrupt trigger will automatically disable the IRQ. The read() will get a wakeup, and at this point the IRQ is disabled; the call to write() will re-enable the IRQ.

If you don't do a write() at all, do you see the CPU spikes? In this scenario, the IRQ should stay disabled and not service edge interrupt triggers; check "cat /proc/interrupts" to see if the counts have stopped. And if you do a write() from your test, check "cat /proc/interrupts" to see whether the counts increment as expected or storm up to a huge number.

bool wait_for_irq(TX_IRQ_HANDLE_S *pHandle)
{
    if (!pHandle || pHandle->txTrigIrqFd < 0) {
        std::cout << "ERROR: handle";
        return false;
    }

    uint32_t info = 1;
    ssize_t nb = write(pHandle->txTrigIrqFd, &info, sizeof(info)); // <- this re-enables the IRQ; it should happen only when you are ready for the next event
    if (nb != (ssize_t)sizeof(info)) {
        std::cout << "ERROR: writing";
        return false;
    }

    nb = read(pHandle->txTrigIrqFd, &info, sizeof(info));
    if (nb == (ssize_t)sizeof(info)) {
        return true;
    }
    return false;
}
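For comparison, here is a minimal sketch of the ordering described above: read() blocks for the event (with the IRQ already masked by the kernel handler), the event is processed, and only then does write() re-enable the IRQ. This is plain C against the standard UIO /dev/uioX interface; "/dev/uio0" and handle_event() are placeholders for your own setup.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static void handle_event(uint32_t count)
{
	printf("interrupt #%u handled\n", count);	/* your event handling goes here */
}

int main(void)
{
	int fd = open("/dev/uio0", O_RDWR);
	if (fd < 0) {
		perror("open /dev/uio0");
		return 1;
	}

	for (;;) {
		uint32_t count;
		uint32_t enable = 1;

		/* Block until the next interrupt; read() returns the interrupt count. */
		if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
			perror("read");
			break;
		}

		/* Do all processing while the IRQ is still masked. */
		handle_event(count);

		/* Re-enable the IRQ only now that we are ready for the next event. */
		if (write(fd, &enable, sizeof(enable)) != (ssize_t)sizeof(enable)) {
			perror("write");
			break;
		}
	}

	close(fd);
	return 0;
}

Note that, depending on the IRQ state left by a previous run, an initial write() of 1 may be needed before the first read() so the interrupt starts out unmasked.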