Re: [PATCH net v2 1/1] net: stmmac: dwmac-tegra: Read iommu stream id from device tree

From: Thierry Reding
Date: Wed Jan 08 2025 - 11:43:51 EST


On Tue, Jan 07, 2025 at 04:24:59PM -0500, Parker Newman wrote:
> From: Parker Newman <pnewman@xxxxxxxxxxxxxxx>
>
> Nvidia's Tegra MGBE controllers require the IOMMU "Stream ID" (SID) to be
> written to the MGBE_WRAP_AXI_ASID0_CTRL register.
>
> The current driver is hard coded to use MGBE0's SID for all controllers.
> This causes softirq timeouts and kernel panics when using controllers
> other than MGBE0.
>
> Example dmesg errors when an ethernet cable is connected to MGBE1:
>
> [ 116.133290] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
> [ 121.851283] tegra-mgbe 6910000.ethernet eth1: NETDEV WATCHDOG: CPU: 5: transmit queue 0 timed out 5690 ms
> [ 121.851782] tegra-mgbe 6910000.ethernet eth1: Reset adapter.
> [ 121.892464] tegra-mgbe 6910000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0
> [ 121.905920] tegra-mgbe 6910000.ethernet eth1: PHY [stmmac-1:00] driver [Aquantia AQR113] (irq=171)
> [ 121.907356] tegra-mgbe 6910000.ethernet eth1: Enabling Safety Features
> [ 121.907578] tegra-mgbe 6910000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
> [ 121.908399] tegra-mgbe 6910000.ethernet eth1: registered PTP clock
> [ 121.908582] tegra-mgbe 6910000.ethernet eth1: configuring for phy/10gbase-r link mode
> [ 125.961292] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
> [ 181.921198] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 181.921404] rcu: 7-....: (1 GPs behind) idle=540c/1/0x4000000000000002 softirq=1748/1749 fqs=2337
> [ 181.921684] rcu: (detected by 4, t=6002 jiffies, g=1357, q=1254 ncpus=8)
> [ 181.921878] Sending NMI from CPU 4 to CPUs 7:
> [ 181.921886] NMI backtrace for cpu 7
> [ 181.922131] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6
> [ 181.922390] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024
> [ 181.922658] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 181.922847] pc : handle_softirqs+0x98/0x368
> [ 181.922978] lr : __do_softirq+0x18/0x20
> [ 181.923095] sp : ffff80008003bf50
> [ 181.923189] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000
> [ 181.923379] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0
> [ 181.924486] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70
> [ 181.925568] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000
> [ 181.926655] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000
> [ 181.931455] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d
> [ 181.938628] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160
> [ 181.945804] x8 : ffff8000827b3160 x7 : f9157b241586f343 x6 : eeb6502a01c81c74
> [ 181.953068] x5 : a4acfcdd2e8096bb x4 : ffffce78ea277340 x3 : 00000000ffffd1e1
> [ 181.960329] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000
> [ 181.967591] Call trace:
> [ 181.970043] handle_softirqs+0x98/0x368 (P)
> [ 181.974240] __do_softirq+0x18/0x20
> [ 181.977743] ____do_softirq+0x14/0x28
> [ 181.981415] call_on_irq_stack+0x24/0x30
> [ 181.985180] do_softirq_own_stack+0x20/0x30
> [ 181.989379] __irq_exit_rcu+0x114/0x140
> [ 181.993142] irq_exit_rcu+0x14/0x28
> [ 181.996816] el1_interrupt+0x44/0xb8
> [ 182.000316] el1h_64_irq_handler+0x14/0x20
> [ 182.004343] el1h_64_irq+0x80/0x88
> [ 182.007755] cpuidle_enter_state+0xc4/0x4a8 (P)
> [ 182.012305] cpuidle_enter+0x3c/0x58
> [ 182.015980] cpuidle_idle_call+0x128/0x1c0
> [ 182.020005] do_idle+0xe0/0xf0
> [ 182.023155] cpu_startup_entry+0x3c/0x48
> [ 182.026917] secondary_start_kernel+0xdc/0x120
> [ 182.031379] __secondary_switched+0x74/0x78
> [ 212.971162] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6103 jiffies s: 417 root: 0x80/.
> [ 212.985935] rcu: blocking rcu_node structures (internal RCU debug):
> [ 212.992758] Sending NMI from CPU 0 to CPUs 7:
> [ 212.998539] NMI backtrace for cpu 7
> [ 213.004304] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6
> [ 213.016116] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024
> [ 213.030817] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 213.040528] pc : handle_softirqs+0x98/0x368
> [ 213.046563] lr : __do_softirq+0x18/0x20
> [ 213.051293] sp : ffff80008003bf50
> [ 213.055839] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000
> [ 213.067304] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0
> [ 213.077014] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70
> [ 213.087339] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000
> [ 213.097313] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000
> [ 213.107201] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d
> [ 213.116651] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160
> [ 213.127500] x8 : ffff8000827b3160 x7 : 0a37b344852820af x6 : 3f049caedd1ff608
> [ 213.138002] x5 : cff7cfdbfaf31291 x4 : ffffce78ea277340 x3 : 00000000ffffde04
> [ 213.150428] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000
> [ 213.162063] Call trace:
> [ 213.165494] handle_softirqs+0x98/0x368 (P)
> [ 213.171256] __do_softirq+0x18/0x20
> [ 213.177291] ____do_softirq+0x14/0x28
> [ 213.182017] call_on_irq_stack+0x24/0x30
> [ 213.186565] do_softirq_own_stack+0x20/0x30
> [ 213.191815] __irq_exit_rcu+0x114/0x140
> [ 213.196891] irq_exit_rcu+0x14/0x28
> [ 213.202401] el1_interrupt+0x44/0xb8
> [ 213.207741] el1h_64_irq_handler+0x14/0x20
> [ 213.213519] el1h_64_irq+0x80/0x88
> [ 213.217541] cpuidle_enter_state+0xc4/0x4a8 (P)
> [ 213.224364] cpuidle_enter+0x3c/0x58
> [ 213.228653] cpuidle_idle_call+0x128/0x1c0
> [ 213.233993] do_idle+0xe0/0xf0
> [ 213.237928] cpu_startup_entry+0x3c/0x48
> [ 213.243791] secondary_start_kernel+0xdc/0x120
> [ 213.249830] __secondary_switched+0x74/0x78
>
> This bug has existed since the dwmac-tegra driver was added in Dec 2022
> (See Fixes tag below for commit hash).
>
> The Tegra234 SOC has 4 MGBE controllers, but Nvidia's Developer Kit only
> uses MGBE0, which is why the bug was not found previously. Connect Tech
> has many products that use 2 (or more) MGBE controllers.
>
> The solution is to read the controller's SID from the existing "iommus"
> device tree property. The 2nd field of the "iommus" device tree property
> is the controller's SID.
>
> Device tree snippet from tegra234.dtsi showing MGBE1's "iommus" property:
>
> smmu_niso0: iommu@12000000 {
> 	compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
> 	...
> }
>
> /* MGBE1 */
> ethernet@6900000 {
> 	compatible = "nvidia,tegra234-mgbe";
> 	...
> 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
> 	...
> }
>
> Nvidia's arm-smmu driver reads the "iommus" property and stores the SID in
> the MGBE device's "fwspec" struct. The dwmac-tegra driver can access the
> SID using the tegra_dev_iommu_get_stream_id() helper function found in
> linux/iommu.h.
>
> Calling tegra_dev_iommu_get_stream_id() should not fail unless the "iommus"
> property is removed from the device tree or the IOMMU is disabled.
>
> While the Tegra234 SOC technically supports bypassing the IOMMU, doing so
> is not supported by the current firmware, has not been tested, and is not
> recommended. A more detailed discussion with Thierry Reding from Nvidia is
> linked below.
>
> Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support")
> Link: https://lore.kernel.org/netdev/cover.1731685185.git.pnewman@xxxxxxxxxxxxxxx
> Signed-off-by: Parker Newman <pnewman@xxxxxxxxxxxxxxx>
> ---
>
> Changes v2:
> - dropped cover letter
> - added more detail to commit message
> - rebased to latest netdev tree
>
> drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)

Acked-by: Thierry Reding <treding@xxxxxxxxxx>

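
For readers following the thread, the flow described in the quoted commit message (look up the controller's stream ID from the data parsed out of "iommus", fail if it is missing, otherwise write it to MGBE_WRAP_AXI_ASID0_CTRL) can be sketched in plain userspace C. This is only an illustrative model, not the patch itself: the mock_* types and functions are hypothetical stand-ins for the kernel's fwspec data and for tegra_dev_iommu_get_stream_id(), the register is modelled as a struct field rather than an MMIO write, and the "exactly one ID, masked to 16 bits" rule is an assumption about how the real helper behaves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IDs an IOMMU driver parses from "iommus". */
struct mock_fwspec {
	unsigned int num_ids;
	uint32_t ids[4];
};

/* Hypothetical stand-in for one MGBE controller instance. */
struct mock_mgbe {
	const struct mock_fwspec *fwspec; /* NULL models a missing "iommus" property */
	uint32_t asid0_ctrl;              /* models MGBE_WRAP_AXI_ASID0_CTRL */
};

/*
 * Models the stream ID lookup. Assumption: the lookup succeeds only when
 * exactly one ID is present, and the ID is masked to its low 16 bits.
 */
static bool mock_get_stream_id(const struct mock_mgbe *mgbe, uint32_t *sid)
{
	const struct mock_fwspec *fwspec = mgbe->fwspec;

	if (fwspec && fwspec->num_ids == 1) {
		*sid = fwspec->ids[0] & 0xffff;
		return true;
	}

	return false;
}

/*
 * Models the probe-time step: program the per-controller SID instead of a
 * hard-coded one. Returns 0 on success, -1 if no stream ID is available.
 */
static int mock_mgbe_program_sid(struct mock_mgbe *mgbe)
{
	uint32_t sid;

	if (!mock_get_stream_id(mgbe, &sid))
		return -1;

	mgbe->asid0_ctrl = sid; /* the real driver would do an MMIO write here */
	return 0;
}
```

With a mock controller whose fwspec carries a single ID, mock_mgbe_program_sid() stores that ID in asid0_ctrl. With fwspec set to NULL it returns an error and leaves the register untouched, which mirrors the "should not fail unless the iommus property is removed or the IOMMU is disabled" case from the commit message.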