Re: [PATCH net-next v5 3/5] veth: implement Byte Queue Limits (BQL) for latency reduction
From: Jesper Dangaard Brouer
Date: Mon May 11 2026 - 14:23:02 EST
On 11/05/2026 11.55, Simon Schippers wrote:
On 5/11/26 10:11, Jesper Dangaard Brouer wrote:
On 10/05/2026 17.56, Jakub Kicinski wrote:
On Sat, 9 May 2026 11:09:51 +0200 Jesper Dangaard Brouer wrote:
On 09/05/2026 04.06, Jakub Kicinski wrote:
On Thu, 7 May 2026 21:09:09 +0200 Jesper Dangaard Brouer wrote:
Not against being able to modify VETH_RING_SIZE, but I don't think it is
the solution here.
Was it evaluated, tho?
It's obviously super easy these days to have AI spew no end of complex
code. So it'd be great to have some solid, ideally production-like
data to back this all up.
VETH_RING_SIZE seems trivial, ethtool set ringparam
No, unfortunately we cannot just decrease the VETH_RING_SIZE.
To be clear - I said make it configurable with ethtool -G,
not change the default.
Sure, I understand the desire to make VETH_RING_SIZE configurable.
Doing so would make the Linux network stack harder to tune and set up
correctly. E.g. adding a qdisc to veth would also require changing the
ring size, but if the system also uses XDP, then tuning below 64 (likely
128) will lead to hard-to-find packet drops.
I mean 64 still could be a 4x improvement at least.
No, not really. Setting it to 64 will give the same (bad) latency as
"BQL off", which this patchset is trying to address.
I prefer adding something (like BQL) that auto-tunes how much of the ring
queue we are using. Good queues function as shock absorbers when
concurrent processes in the OS have scheduling noise.
I acknowledge that Simon Schippers found that the BQL implementation was
actually not auto-tuning. We need to work on this, my prototype
implementation [1] [2] works surprisingly well.
- [1] https://lore.kernel.org/all/3e43117f-356d-4086-a176-abd7fe2e6f0a@xxxxxxxxxx/2-09-veth-time-based-bql-coalescing.patch
- [2] https://lore.kernel.org/all/3e43117f-356d-4086-a176-abd7fe2e6f0a@xxxxxxxxxx/
The reason is that XDP-redirect into veth doesn't have any
back-pressure and would simply drop packets if the queue size becomes
less than the NAPI budget (64). (Yes, we use both the normal path and
XDP-redirect in production).
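
A toy model of that drop behaviour, as a minimal sketch. It assumes the
producer enqueues one full burst of NAPI-budget size before the veth
consumer gets to run, and that a full ring drops instead of blocking.
The function name and the burst model are illustrative assumptions, not
code from the patchset.

```python
from collections import deque

NAPI_BUDGET = 64  # packets per poll, the figure quoted in this thread


def simulate_burst(ring_size: int, burst: int = NAPI_BUDGET) -> int:
    """Return how many packets of one burst a ring without back-pressure drops.

    Toy assumption: the whole burst arrives before the consumer runs,
    and a full ring drops the packet instead of blocking the producer.
    """
    ring = deque()
    dropped = 0
    for _ in range(burst):
        if len(ring) < ring_size:
            ring.append(object())  # placeholder standing in for a packet
        else:
            dropped += 1
    return dropped


for size in (32, 64, 256):
    print(size, simulate_burst(size))
```

Under these assumptions a 32-entry ring drops 32 of the 64 packets,
while 64 and 256 entries drop none. A real system has the consumer
running concurrently, so this only illustrates why a ring smaller than
one burst has no headroom.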
Doesn't this mean you have a queue which is not under BQL control?
It is a matter of perspective. BQL needs between 17 and 55 elements of
the 256-entry queue. At the same time, we handle the case where the ring
runs full, e.g. due to a sudden burst of XDP-redirected packets, which
pushes packets into the qdisc layer.
You are checking inflight/limit in /sys directory to get the 17-55
number, right?
Nope, I'm using a bpftrace program to keep track of the inflight/limit
in a BPF hashmap. Reading from /sys will not be accurate.
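
For comparison, a minimal sketch of the /sys polling approach asked
about above. It assumes the device exposes the standard per-queue
byte_queue_limits directory under /sys/class/net; the helper name
read_bql and its sysfs_root parameter are hypothetical. Each call is
only a point-in-time sample, so short-lived inflight peaks between two
reads go unseen.

```python
from pathlib import Path


def read_bql(dev: str, txq: int = 0, sysfs_root: str = "/sys/class/net") -> dict:
    """Read one point-in-time sample of BQL inflight/limit for a TX queue."""
    base = Path(sysfs_root) / dev / "queues" / f"tx-{txq}" / "byte_queue_limits"
    # Each attribute file holds a single decimal number followed by a newline.
    return {name: int((base / name).read_text()) for name in ("inflight", "limit")}
```

Calling read_bql("veth0") on a host with such a device would return a
dict with the two values. The device name is only an example.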
I moved the selftests into a GitHub repo [1] to allow us to collaborate
and evaluate the changes more easily. I explicitly kept the new
BPF-based BQL tracking as a commit [2] for your benefit.
[1] https://github.com/netoptimizer/veth-backpressure-performance-testing/tree/main/selftests
[2] https://github.com/netoptimizer/veth-backpressure-performance-testing/commit/f25c5dc92977
Sorry for cutting the remainder of the message, but I ran out of time,
as things are a bit challenging/hectic here at Cloudflare at the moment.
--Jesper