Re: [PATCH net-next 00/24] locking: Introduce nested-BH locking.

From: Jakub Kicinski
Date: Mon Dec 18 2023 - 19:41:57 EST


On Mon, 18 Dec 2023 18:23:31 +0100 Sebastian Andrzej Siewior wrote:
> On 2023-12-15 14:50:59 [-0800], Jakub Kicinski wrote:
> > On Fri, 15 Dec 2023 18:07:19 +0100 Sebastian Andrzej Siewior wrote:
> > > The proposed way out is to introduce explicit per-CPU locks for
> > > resources which are protected by local_bh_disable() and use those only
> > > on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds.
> >
> > As I said at LPC, complicating drivers with odd locking constructs
> > is a no-go for me.
>
> I misunderstood it then, as I assumed you wanted to ease the work once I
> was done with every driver, after (hopefully) understanding what is
> possible/needed and what is not. Are we talking about 15+ drivers here?

My main concern is that it takes the complexity of writing network
device drivers to the next level. It's already hard enough to implement
XDP correctly. "local lock" and "guard"? Too complicated :(
Or "unmaintainable" as in "too much maintainer's time will be spent
reviewing code that gets this wrong".
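To make the objection concrete, here is a userspace model of what an RT-only per-CPU lock plus a scope-based guard might look like in a driver RX path. Everything is hypothetical: bh_lock(), bh_unlock(), BH_GUARD() and the MODEL_PREEMPT_RT switch are illustration-only names, not the kernel's local_lock_t or guard() helpers, and the "lock" is a flag modelling a single CPU. Without MODEL_PREEMPT_RT defined, the lock calls compile to nothing, mirroring the "no additional overhead for !PREEMPT_RT builds" goal.

```c
#include <assert.h>

/* Hypothetical model of a per-CPU lock that only exists on PREEMPT_RT. */
struct bh_lock_model {
	int held;
	int acquisitions;
};

#ifdef MODEL_PREEMPT_RT
static inline void bh_lock(struct bh_lock_model *l)
{
	assert(!l->held);	/* no nesting on one CPU */
	l->held = 1;
	l->acquisitions++;
}

static inline void bh_unlock(struct bh_lock_model *l)
{
	assert(l->held);	/* catch unbalanced unlocks */
	l->held = 0;
}
#else
/* !PREEMPT_RT: BH-disabled sections already serialize per CPU, so no-ops. */
static inline void bh_lock(struct bh_lock_model *l) { (void)l; }
static inline void bh_unlock(struct bh_lock_model *l) { (void)l; }
#endif

/* Scope-based guard via the GCC/Clang cleanup attribute: bh_unlock()
 * runs automatically when the guard variable goes out of scope. */
static inline void bh_guard_release(struct bh_lock_model **lp)
{
	bh_unlock(*lp);
}

#define BH_GUARD(l)						\
	struct bh_lock_model *bh_guard_var			\
		__attribute__((cleanup(bh_guard_release))) =	\
		(bh_lock(l), (l))

/* Hypothetical lock a driver would have to pick for its XDP path. */
static struct bh_lock_model xdp_redirect_lock;

/* A driver RX step with the guard added; the body is a placeholder. */
static int driver_rx_one(int len)
{
	BH_GUARD(&xdp_redirect_lock);
	return len + 1;	/* computed before the guard releases the lock */
}
```

Even in this toy form, each driver has to choose the right lock and the right scope, and a reviewer has to check both, which is the maintenance cost being raised here.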

> Now. The pattern is usually
> | act = bpf_prog_run_xdp(xdp_prog, &xdp);
> | switch (act) {
> | case XDP_REDIRECT:
> | 	ret = xdp_do_redirect(netdev, &xdp, xdp_prog);
> | 	if (ret)
> | 		goto xdp_aborted;	/* handled like XDP_ABORTED */
> | 	xdp_redir++;	/* or similar stats accounting */
> | 	break;
>
> so we might be able to turn this into a helper that covers both steps
> and, for a redirect, returns either XDP_REDIRECT or XDP_ABORTED. So this
> could be merged into
>
> | u32 bpf_prog_run_xdp_and_redirect(struct net_device *netdev,
> | 				  struct bpf_prog *prog,
> | 				  struct xdp_buff *xdp)
> | {
> | 	u32 act;
> | 	int ret;
> |
> | 	act = bpf_prog_run_xdp(prog, xdp);
> | 	if (act == XDP_REDIRECT) {
> | 		/* a failed redirect is reported as XDP_ABORTED */
> | 		ret = xdp_do_redirect(netdev, xdp, prog);
> | 		if (ret < 0)
> | 			act = XDP_ABORTED;
> | 	}
> | 	return act;
> | }

If we could fold the DROP case into this -- even better!

> so the lock can be put inside the function and all drivers use this
> function.
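The helper proposed above can be exercised in userspace to show the point of putting the lock inside it: drivers call one function and never touch the lock. This is a sketch under stated assumptions: the structs, the prog->run callback, stub_do_redirect() and the redirect_lock()/redirect_unlock() flag are hypothetical stand-ins, not kernel definitions; only the xdp_action ordering mirrors the uapi enum.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Verdict order mirrors the uapi enum xdp_action. */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Hypothetical userspace stand-ins for the kernel types. */
struct net_device { int ifindex; };
struct xdp_buff { int len; };
struct bpf_prog { enum xdp_action (*run)(struct xdp_buff *xdp); };

/* Models the per-CPU lock on a single CPU; asserts catch misuse. */
static int redirect_lock_held;

static void redirect_lock(void)
{
	assert(!redirect_lock_held);
	redirect_lock_held = 1;
}

static void redirect_unlock(void)
{
	assert(redirect_lock_held);
	redirect_lock_held = 0;
}

/* Stub for xdp_do_redirect(): must run locked, fails without a device. */
static int stub_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
			    struct bpf_prog *prog)
{
	(void)xdp;
	(void)prog;
	assert(redirect_lock_held);
	return dev ? 0 : -EINVAL;
}

/* The combined helper: the lock is taken and dropped internally. */
static enum xdp_action bpf_prog_run_xdp_and_redirect(struct net_device *dev,
						     struct bpf_prog *prog,
						     struct xdp_buff *xdp)
{
	enum xdp_action act;

	redirect_lock();
	act = prog->run(xdp);
	if (act == XDP_REDIRECT && stub_do_redirect(dev, xdp, prog) < 0)
		act = XDP_ABORTED;
	redirect_unlock();
	return act;
}

/* Two trivial "programs" for exercising the helper. */
static enum xdp_action prog_redirect(struct xdp_buff *xdp)
{
	(void)xdp;
	return XDP_REDIRECT;
}

static enum xdp_action prog_pass(struct xdp_buff *xdp)
{
	(void)xdp;
	return XDP_PASS;
}
```

Non-redirect verdicts pass through unchanged, so the driver's switch still handles PASS, TX and DROP itself; only the redirect path moves into the helper.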
>
> From looking through drivers/net/ethernet/, this should work for most
> drivers:
> - amazon/ena
> - aquantia/atlantic
> - engleder/tsnep
> - freescale/enetc
> - freescale/fec
> - intel/igb
> - intel/igc
> - marvell/mvneta
> - marvell/mvpp2
> - marvell/octeontx2
> - mediatek/mtk
> - mellanox/mlx5
> - microchip/lan966x
> - microsoft/mana
> - netronome/nfp (two call paths, without XDP_REDIRECT support)
> - sfc/rx
> - sfc/siena (that offset pointer can be moved)
> - socionext/netsec
> - stmicro/stmmac
>
> A few do something custom or additional between bpf_prog_run_xdp() and
> xdp_do_redirect():
>
> - broadcom/bnxt
> calculates length, offset and data pointer; DMA unmaps and allocates
> memory before redirect.

Just looked at this one. The recalculation is probably for the PASS /
TX cases; REDIRECT / DROP shouldn't care. The DMA unmap looks like
a bug (hi, Michael!)

> - freescale/dpaa2
> - freescale/dpaa
> sets xdp.data_hard_start + frame_sz, unmaps DMA.
>
> - fungible/funeth
> conditional redirect.
>
> - google/gve
> Allocates a new packet for redirect.
>
> - intel/ixgbe
> - intel/i40e
> - intel/ice
> Failure handling in the ZC case differs from XDP_ABORTED and depends on
> the error returned by xdp_do_redirect().
>
> - mellanox/mlx4
> calculates page_offset.
>
> - qlogic/qede
> DMA unmap and buffer alloc.
>
> - ti/cpsw_priv
> recalculates length (pointer).
>
> and a few more don't support XDP_REDIRECT:
>
> - cavium/thunder
> does not support XDP_REDIRECT; calculates length and offset.
>
> - intel/ixgbevf
> does not support XDP_REDIRECT
>
> I don't understand why some drivers need to recalculate data_hard_start,
> length and so on and others don't. This might only be needed for the
> XDP_TX case, or not needed at all…
> Also, I'm not sure about the DMA unmaps and skb allocations. The new skb
> allocation can probably be handled before running the BPF program, but
> then it is wasted in the XDP_PASS case…
> And the DMA unmaps: only a few drivers seem to need them. Maybe they can
> be done before running the BPF program. After all, the BPF program may
> look into the packet.
>
>
> If that is a no-go, then the only thing that comes to mind is (as you
> mentioned at LPC) to acquire the lock in bpf_prog_run_xdp() and drop it
> in xdp_do_redirect(). This would require every driver to invoke
> xdp_do_redirect() even if it does not support it (by setting netdev
> to NULL or so).
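A minimal model of that split, assuming hypothetical run_prog_locked() and redirect_and_unlock() helpers (not kernel API) and a flag standing in for the per-CPU lock on one CPU: the lock is taken when the program runs and released only by the redirect call, so a driver without redirect support still has to make that call, here with a NULL device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Verdict order mirrors the uapi enum xdp_action. */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

struct net_device { int ifindex; };	/* hypothetical stand-in */

static int xdp_lock_held;	/* models the per-CPU lock on one CPU */

/* Hypothetical "run" step: takes the lock, returns the program's verdict. */
static enum xdp_action run_prog_locked(enum xdp_action verdict)
{
	assert(!xdp_lock_held);	/* fires if a previous run was never released */
	xdp_lock_held = 1;
	return verdict;	/* stands in for executing the BPF program */
}

/* Hypothetical "redirect" step: always drops the lock, whatever the verdict. */
static int redirect_and_unlock(struct net_device *dev, enum xdp_action act)
{
	int ret = 0;

	assert(xdp_lock_held);
	if (act == XDP_REDIRECT)
		ret = dev ? 0 : -EOPNOTSUPP;	/* NULL dev: no redirect support */
	xdp_lock_held = 0;
	return ret;
}

/* A driver without XDP_REDIRECT support still has to make the second call. */
static enum xdp_action driver_rx_no_redirect(enum xdp_action verdict)
{
	enum xdp_action act = run_prog_locked(verdict);

	if (redirect_and_unlock(NULL, act) < 0)
		act = XDP_ABORTED;
	return act;
}
```

If a driver skips the second call on some path, for example by returning early on XDP_DROP, the next run_prog_locked() trips the assertion; that pairing across two functions is what makes this variant easy to get wrong in review.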

To make progress on other parts of the stack, we could also take
the local lock around all of napi->poll() for now.
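A sketch of that coarser option under the same single-CPU flag model; napi_model and napi_poll_locked() are hypothetical stand-ins for a NAPI instance and for wherever the core invokes napi->poll(). The driver's poll routine is untouched and simply runs with the lock held.

```c
#include <assert.h>

static int poll_lock_held;	/* models the per-CPU lock on one CPU */

/* Hypothetical model of a NAPI instance; only the poll callback matters. */
struct napi_model {
	int (*poll)(struct napi_model *napi, int budget);
	int pending;	/* packets waiting on the RX ring */
};

/* An unchanged driver poll routine: it knows nothing about the lock. */
static int driver_poll(struct napi_model *napi, int budget)
{
	int done = napi->pending < budget ? napi->pending : budget;

	assert(poll_lock_held);	/* everything in here runs under the lock */
	napi->pending -= done;
	return done;
}

/* Core-side wrapper: one lock/unlock pair around the whole callback. */
static int napi_poll_locked(struct napi_model *napi, int budget)
{
	int work;

	assert(!poll_lock_held);
	poll_lock_held = 1;
	work = napi->poll(napi, budget);
	poll_lock_held = 0;
	return work;
}
```

The trade-off is scope: the lock is then presumably also held across work that does not need it, such as XDP_PASS processing, which fits the "for now" framing.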