Re: [PATCH net-next v5 3/5] veth: implement Byte Queue Limits (BQL) for latency reduction

From: Jesper Dangaard Brouer

Date: Mon May 11 2026 - 04:33:46 EST




On 10/05/2026 17.56, Jakub Kicinski wrote:
On Sat, 9 May 2026 11:09:51 +0200 Jesper Dangaard Brouer wrote:
On 09/05/2026 04.06, Jakub Kicinski wrote:
On Thu, 7 May 2026 21:09:09 +0200 Jesper Dangaard Brouer wrote:
Not against being able to modify VETH_RING_SIZE, but I don't think it is
the solution here.

Was it evaluated, tho?

It's obviously super easy these days to have AI spew no end of complex
code. So it'd be great to have some solid, ideally production-like
data to back this all up.

VETH_RING_SIZE seems trivial, ethtool set ringparam

No, unfortunately we cannot just decrease the VETH_RING_SIZE.

To be clear - I said make it configurable with ethtool -G,
not change the default.


Sure, I understand the desire to make VETH_RING_SIZE configurable.
But doing so makes the Linux network stack harder to tune and set up
correctly. E.g. adding a qdisc to veth would then also require changing
the ring size, but if the system also uses XDP then tuning below 64
(likely 128) will lead to hard-to-find packet drops.
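
To illustrate the drop scenario, here is a toy Python model (not veth
code). It assumes a producer that enqueues a burst of 64 frames per round
with no back-pressure, and a consumer that drains up to one NAPI budget
per round. The function name and the round-based timing are made up for
illustration only.

```python
def simulate_redirect(ring_size, burst=64, budget=64, rounds=100):
    """Count frames dropped by a producer that has no back-pressure.

    Toy model: each round the producer enqueues `burst` frames (frames
    that do not fit in the ring are dropped), then the consumer drains
    up to `budget` frames.
    """
    occupancy = 0
    dropped = 0
    for _ in range(rounds):
        free = ring_size - occupancy
        accepted = min(burst, free)
        dropped += burst - accepted      # no back-pressure: excess is lost
        occupancy += accepted
        occupancy -= min(occupancy, budget)
    return dropped


for size in (32, 64, 256):
    print(size, simulate_redirect(size))
```

In this model drops start as soon as the ring is smaller than the burst.
A real consumer does not drain in lock-step with the producer, so more
headroom than the model suggests may be needed.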

I prefer adding something (like BQL) that auto-tunes how much of the ring
queue we are using. Good queues function as shock absorbers when
concurrent processes in the OS have scheduling noise.
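
As a rough back-of-the-envelope sketch of why limiting ring usage helps
latency: if the consumer drains packets in order, the last packet waits
roughly queue depth times per-packet cost. The per-packet cost below is a
hypothetical placeholder, not a measurement, and the helper name is made
up.

```python
def worst_case_queue_delay_us(queued_packets, per_packet_cost_us):
    """Time the last queued packet waits while the packets ahead are processed."""
    return queued_packets * per_packet_cost_us


# Hypothetical consumer cost per packet, NOT a measured value.
PER_PACKET_COST_US = 2.0

# Compare the BQL limits mentioned in this thread with the full ring.
for depth in (17, 55, 256):
    print(depth, worst_case_queue_delay_us(depth, PER_PACKET_COST_US))
```

The point is only the ratio: a standing queue of 256 entries costs several
times the delay of one held at 55 or fewer, whatever the real per-packet
cost turns out to be.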

I acknowledge that Simon Schippers found that the BQL implementation was
actually not auto-tuning. We need to work on this; my prototype
implementation [1] [2] works surprisingly well.


- [1] https://lore.kernel.org/all/3e43117f-356d-4086-a176-abd7fe2e6f0a@xxxxxxxxxx/2-09-veth-time-based-bql-coalescing.patch
- [2] https://lore.kernel.org/all/3e43117f-356d-4086-a176-abd7fe2e6f0a@xxxxxxxxxx/


The reason is that XDP-redirect into veth doesn't have any
back-pressure and would simply drop packets if the queue size becomes
less than the NAPI budget (64). (Yes, we use both the normal path and
XDP-redirect in production).

Doesn't this mean you have a queue which is not under BQL control?


It is a matter of perspective. BQL needs between 17 and 55 elements of
the 256-entry ring. At the same time we handle the case where the ring
runs full, e.g. due to a sudden burst of XDP-redirected packets, which
pushes packets into the qdisc layer.
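
A minimal Python sketch of that interplay, as a toy model rather than the
patch itself. Assumptions: the limit is counted in packets (real BQL
counts bytes), it is compared against total ring occupancy, and the class
and method names are hypothetical.

```python
from collections import deque


class ToyVethRing:
    """Toy model: stack path honours a soft limit, XDP only the ring size."""

    def __init__(self, ring_size=256, bql_limit=55):
        self.ring_size = ring_size
        self.bql_limit = bql_limit
        self.ring = deque()
        self.qdisc_backlog = deque()
        self.xdp_drops = 0

    def xmit_from_stack(self, pkt):
        # Normal path: above the soft limit the packet waits in the qdisc.
        if len(self.ring) >= min(self.bql_limit, self.ring_size):
            self.qdisc_backlog.append(pkt)
            return False
        self.ring.append(pkt)
        return True

    def xdp_redirect(self, pkt):
        # XDP path: no back-pressure, drop only when the ring is full.
        if len(self.ring) >= self.ring_size:
            self.xdp_drops += 1
            return False
        self.ring.append(pkt)
        return True

    def napi_poll(self, budget=64):
        # Consumer drains up to `budget`, then the qdisc refills the ring
        # up to the soft limit.
        done = 0
        while self.ring and done < budget:
            self.ring.popleft()
            done += 1
        while self.qdisc_backlog and len(self.ring) < self.bql_limit:
            self.ring.append(self.qdisc_backlog.popleft())
        return done
```

In this model the stack path never uses more than the soft limit, so an
XDP burst can still use the remaining ring entries. While the ring is
above the limit, stack packets accumulate in the qdisc backlog instead of
being dropped.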


My benchmarking shows that an optimal BQL limit is dynamically
adjusted between 17 and 55 depending on veth consumer namespace
overhead/speed, when balancing throughput and latency.

Testing with prod-approximating traffic pattern and load would be great.

That is what I'm doing. I'm testing with a prod-approximating traffic
pattern and changing the number of iptables rules to simulate the
overhead I measured from production. I think I explained this in the
cover letter. We are going to use this in a production environment (to
be clear).

Simon found an issue when testing the overload scenario.

--Jesper