Re: [PATCH net-next 00/24] locking: Introduce nested-BH locking.

From: Sebastian Andrzej Siewior
Date: Mon Dec 18 2023 - 12:24:19 EST


On 2023-12-15 14:50:59 [-0800], Jakub Kicinski wrote:
> On Fri, 15 Dec 2023 18:07:19 +0100 Sebastian Andrzej Siewior wrote:
> > The proposed way out is to introduce explicit per-CPU locks for
> > resources which are protected by local_bh_disable() and use those only
> > on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds.
>
> As I said at LPC, complicating drivers with odd locking constructs
> is a no go for me.

I misunderstood it then as I assumed you wanted to ease the work while I
was done which every driver after (hopefully) understanding what is
possible/ needed and what not. We do speak here about 15++?

Now. The pattern is usually
| act = bpf_prog_run_xdp(xdp_prog, &xdp);
| switch (act) {
| case XDP_REDIRECT:
| ret = xdp_do_redirect(netdev, &xdp, xdp_prog)))
| if (ret)
| goto XDP_ABORTED;
| xdp_redir++ or so;

so we might be able to turn this into something that covers both and
returns either XDP_REDIRECT or XDP_ABORTED. So this could be merged
into

| u32 bpf_prog_run_xdp_and_redirect(struct net_device *dev, const struct
| bpf_prog *prog, struct xdp_buff *xdp)
| {
| u32 act;
| int ret;
|
| act = bpf_prog_run_xdp(prog, xdp);
| if (act == XDP_REDIRECT) {
| ret = xdp_do_redirect(netdev, xdp, prog);
| if (ret < 0)
| act = XDP_ABORTED;
| }
| return act;
| }

so the lock can be put inside the function and all drivers use this
function.

From looking through drivers/net/ethernet/, this should work for most
drivers:
- amazon/ena
- aquantia/atlantic
- engleder/tsnep
- freescale/enetc
- freescale/fec
- intel/igb
- intel/igc
- marvell/mvneta
- marvell/mvpp2
- marvell/octeontx2
- mediatek/mtk
- mellanox/mlx5
- microchip/lan966x
- microsoft/mana
- netronome/nfp (two call paths with no support XDP_REDIRECT)
- sfc/rx
- sfc/siena (that offset pointer can be moved)
- socionext/netsec
- stmicro/stmmac

A few do something custom/ additionally between bpf_prog_run_xdp() and
xdp_do_redirect():

- broadcom/bnxt
calculates length, offset, data pointer. DMA unmaps + memory
allocations before redirect.

- freescale/dpaa2
- freescale/dpaa
sets xdp.data_hard_start + frame_sz, unmaps DMA.

- fungible/funeth
conditional redirect.

- google/gve
Allocates a new packet for redirect.

- intel/ixgbe
- intel/i40e
- intel/ice
Failure in the ZC case is different from XDP_ABORTED, depends on the
error from xdp_do_redirect())

- mellanox/mlx4/
calculates page_offset.

- qlogic/qede
DMA unmap and buffer alloc.

- ti/cpsw_priv
recalculates length (pointer).

and a few more don't support XDP_REDIRECT:

- cavium/thunder
does not support XDP_REDIRECT, calculates length, offset.

- intel/ixgbevf
does not support XDP_REDIRECT

I don't understand why some driver need to recalculate data_hard_start,
length and so on and others don't. This might be only needed for the
XDP_TX case or not needed…
Also I'm not sure about the dma unmaps and skb allocations. The new skb
allocation can be probably handled before running the bpf prog but then
in the XDP_PASS case it is a waste…
And the DMA unmaps. Only a few seem to need it. Maybe it can be done
before running the BPF program. After all the bpf may look into the skb.


If that is no go, then the only thing that comes to mind is (as you
mentioned on LPC) to acquire the lock in bpf_prog_run_xdp() and drop it
in xdp_do_redirect(). This would require that every driver invokes
xdp_do_redirect() even not if it is not supporting it (by setting netdev
to NULL or so).

Sebastian