Re: [PATCH net] net/mlx5: Fix error path in multi-packet WQE transmit

From: Maxim Mikityanskiy
Date: Wed Sep 11 2024 - 04:59:40 EST


On Tue, 10 Sep 2024 at 10:53:51 +0200, Gerd Bayer wrote:
> Remove the erroneous unmap in case no DMA mapping was established
>
> The multi-packet WQE transmit code attempts to obtain a DMA mapping for
> the skb. This could fail, e.g. under memory pressure, when the IOMMU
> driver just can't allocate more memory for page tables. While the code
> tries to handle this in the path below the err_unmap label it erroneously
> unmaps one entry from the sq's FIFO list of active mappings.

The fix looks valid to me, thanks!

Acked-by: Maxim Mikityanskiy <maxtram95@xxxxxxxxx>

> Since the
> current map attempt failed this unmap is removing some random DMA mapping
> that might still be required. If the PCI function now presents that IOVA,
> the IOMMU may assumes a rogue DMA access and e.g. on s390 puts the PCI
> function in error state.
>
> The erroneous behavior was seen in a stress-test environment that created
> memory pressure.
>
> Fixes: 5af75c747e2a ("net/mlx5e: Enhanced TX MPWQE for SKBs")
> Signed-off-by: Gerd Bayer <gbayer@xxxxxxxxxxxxx>
> ---
> While running some stress tests that put our system under memory pressure
> we observed the following splat, eventually:
>
> [ 1350.038775] ------------[ cut here ]------------
> [ 1350.038776] WARNING: CPU: 36 PID: 37194 at arch/s390/include/asm/pci_dma.h:136 dma_update_cpu_trans+0x66/0x70
> [ 1350.038799] Modules linked in: macvtap macvlan vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink lcs ctcm fsm dasd_fba_mod mlx5_ib ib_uverbs ib_core mlx5_core
> "
> "mlxfw psample rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs tls dm_service_time 8021q garp mrp rfkill sd_mod t10_pi sg sunrpc zfcp scsi_transport_fc dm_multipath dm_mod vfio_ccw mdev vfio_iommu_type1 vfio eadm_sch iommufd kvm drm i2c_core drm_panel_orientation_quirks xfs libcrc32c qeth_l2
> "
> " bridge stp llc ghash_s390 prng aes_s390 dasd_eckd_mod des_s390 libdes sha3_512_s390 qeth sha3_256_s390 dasd_mod ccwgroup qdio pkey zcrypt fuse
> [ 1350.038880] CPU: 36 PID: 37194 Comm: vhost-37179 Kdump: loaded Tainted: G X ------- --- 5.14.0-427.20.1.el9_4.s390x #1
> [ 1350.038884] Hardware name: IBM 3931 A01 400 (LPAR)
> [ 1350.038886] Krnl PSW : 0704f00180000000 00000056803d1eba (dma_update_cpu_trans+0x6a/0x70)
> [ 1350.038890] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
> [ 1350.038893] Krnl GPRS: 0000000000000000 0000000589eff400 0000003be2b477b0 0000000000000000
> [ 1350.038895] 0000000000000400 0000000000001000 0000000000000400 ffffffbe8000a000
> [ 1350.038897] 0000000000000001 0000000086d6bc00 0000000000000001 000000417fff7000
> [ 1350.038900] 000000012d5baa00 0000000000000000 00000056803d1f3e 0000038016df75d8
> [ 1350.038957] Krnl Code: 00000056803d1eae: af000000 mc 0,0
> [ 1350.038963] 00000056803d1eb2: a7f4fff9 brc 15,00000056803d1ea4
> [ 1350.038963] #00000056803d1eb6: af000000 mc 0,0
> [ 1350.038970] >00000056803d1eba: a7f4ffd9 brc 15,00000056803d1e6c
> [ 1350.038979] 00000056803d1ebe: 0707 bcr 0,%r7
> [ 1350.038983] 00000056803d1ec0: c004004b3334 brcl 0,0000005680d38528
> [ 1350.038983] 00000056803d1ec6: eb7ff0500024 stmg %r7,%r15,80(%r15)
> [ 1350.038983] 00000056803d1ecc: b90400ef lgr %r14,%r15
> [ 1350.038994] Call Trace:
> [ 1350.038995] [<00000056803d1eba>] dma_update_cpu_trans+0x6a/0x70
> [ 1350.038998] ([<00000056803d1f22>] __dma_update_trans+0x62/0x150)
> [ 1350.039001] [<00000056803d2432>] s390_dma_unmap_pages+0x72/0x1c0
> [ 1350.039003] [<000000568047e70c>] dma_unmap_page_attrs+0x3c/0x190
> [ 1350.039008] [<000003ff807c5230>] mlx5e_sq_xmit_mpwqe+0x2b0/0x430 [mlx5_core]
> [ 1350.039170] [<000003ff807c589e>] mlx5e_xmit+0x20e/0x5a0 [mlx5_core]
> [ 1350.039246] [<0000005680aae326>] dev_hard_start_xmit+0xb6/0x210
> [ 1350.039252] [<0000005680b144d8>] sch_direct_xmit+0x88/0x420
> [ 1350.039256] [<0000005680aa9496>] __dev_xmit_skb+0x2c6/0x5c0
> [ 1350.039259] [<0000005680aae93e>] __dev_queue_xmit+0x36e/0x840
> [ 1350.039262] [<000003ff809e3b6a>] macvlan_start_xmit+0x6a/0x140 [macvlan]
> [ 1350.039266] [<0000005680aae326>] dev_hard_start_xmit+0xb6/0x210
> [ 1350.039269] [<0000005680aaeae8>] __dev_queue_xmit+0x518/0x840
> [ 1350.039271] [<000003ff809b40f4>] tap_get_user_xdp.isra.0+0x134/0x300 [tap]
> [ 1350.039274] [<000003ff809b4354>] tap_sendmsg+0x94/0xc0 [tap]
> [ 1350.039277] [<000003ff809d4f06>] vhost_tx_batch.constprop.0+0x66/0x1a0 [vhost_net]
> [ 1350.039281] [<000003ff809d6a5e>] handle_tx_copy+0x24e/0x340 [vhost_net]
> [ 1350.039283] [<000003ff809d6c0c>] handle_tx+0xbc/0x100 [vhost_net]
> [ 1350.039286] [<000003ff809bb6f2>] vhost_worker+0xa2/0x100 [vhost]
> [ 1350.039294] [<000000568040be98>] kthread+0x108/0x110
> [ 1350.039299] [<000000568038afdc>] __ret_from_fork+0x3c/0x60
> [ 1350.039302] [<0000005680d2e89a>] ret_from_fork+0xa/0x40
> [ 1350.039307] Last Breaking-Event-Address:
> [ 1350.039308] [<00000056803d1e68>] dma_update_cpu_trans+0x18/0x70
> [ 1350.039310] ---[ end trace a581115ebebd62f3 ]---
>
> And here the IOMMU complains about the "rogue DMA attempt":
> [ 1350.043079] zpci: 0037:00:00.0: Event 0x7 reports an error for PCI function 0x3932
>
> With some instrumentation in mlx5e_sq_xmit_mpwqe() to mimic a failure
> to DMA map every 1000th buffer, I was able to reproduce this with recent
> upstream code, too. I think the error handling of that routine has a bug
> as it DMA unmaps a buffer/IOVA that might be used, still.
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index b09e9abd39f3..f8c7912abe0e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -642,7 +642,6 @@ mlx5e_sq_xmit_mpwqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> return;
>
> err_unmap:
> - mlx5e_dma_unmap_wqe_err(sq, 1);
> sq->stats->dropped++;
> dev_kfree_skb_any(skb);
> mlx5e_tx_flush(sq);
>
> ---
> base-commit: 8d53a5170c8677af9b3fbd9d0b75ae120fdefba2
> change-id: 20240909-fix-mlx5_dma_unmap-e2a12e26e929
>
> Best regards,
> --
> Gerd Bayer <gbayer@xxxxxxxxxxxxx>
>