Re: [PATCH net-next v5 3/5] veth: implement Byte Queue Limits (BQL) for latency reduction
From: Simon Schippers
Date: Wed May 27 2026 - 04:16:29 EST
On 5/27/26 09:38, Jesper Dangaard Brouer wrote:
>
>
> On 26/05/2026 17.07, Simon Schippers wrote:
>> On 5/26/26 16:55, Jonas Köppeler wrote:
>>> On 5/26/26 4:35 PM, Simon Schippers wrote:
>>>> On 5/26/26 11:54, Jonas Köppeler wrote:
>>>>> On 5/23/26 6:09 PM, Simon Schippers wrote:
>>>>>> On 5/22/26 18:26, Jonas Köppeler wrote:
>>>>>>> On 5/22/26 10:41, Simon Schippers wrote:
>>>>>>>> On 5/22/26 09:14, Jonas Köppeler wrote:
>>>>>>>>> On 5/19/26 10:51 PM, Simon Schippers wrote:
>>>>>>>>>> On 5/12/26 23:55, Simon Schippers wrote:
>>>>>>>>>>> On 5/12/26 15:54, Jesper Dangaard Brouer wrote:
>>>>>>>>>>>>>> Nope, I'm using a bpftrace program to keep track of the inflight/limit
>>>>>>>>>>>>>> in a BPF hashmap. Reading from /sys will not be accurate.
>>>>>>>>>>>>> Ah nice.
>>>>>>>>>>>> Add the option --hist to have both NAPI and BQL histograms printed when
>>>>>>>>>>>> script ends. This will give you an accurate pattern of how inflight and
>>>>>>>>>>>> limit evolves.
>>>>>>>>>>>>
>>>>>>>>>>>>>> I moved the selftests into a github repo [1] to allow us to collaborate
>>>>>>>>>>>>>> and evaluate the changes more easily. I explicitly kept the new BPF
>>>>>>>>>>>>>> based BQL tracking as a commit[2] for your benefit.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://github.com/netoptimizer/veth-backpressure-performance-testing/tree/main/selftests
>
> [... cut ...]
>
>>>>
>>>> I will wait for your new measurements, but there is no argument
>>>> against a default tx-usecs of ~100us for now, right?
>>> Yes, I think 100us is perfectly fine. I guess most of it was
>>> just my curiosity why the latency values are as they are 🙂
>> Which is great, because I was wondering the same 🙂
>>
>
> Thank you Jonas and Simon for testing this via [1] on your systems.
>
> One performance concern from my side is if/when the BQL limit goes below 8
> packets. This will cause cache-line bouncing and many qdisc requeues
> between the two CPUs. Note that 8 packets in the ptr_ring fill one
> cache line. This is why I suggested defaulting the BQL min_limit to 8.
> This would act as a lower bound in combination with the tx-usecs
> coalesce tuning.
>
>
>>> But it feels like this will need some documentation, because
>>> as we have seen, some values are a little different
>>> from what you expect from bql. Inflight > veth_ring_size,
>>> tx-usecs not necessarily achieving the configured value,
>>> inflight can get stuck > 0. Wdyt?
>>> But I think it works nicely overall.
>>>
>> Exactly, we should get ready for a v6 soon.
>>
>
> I want to send out a V6 today (and I'm busy the rest of the week).
> I'm simply going to include Simon's patch[2]. Then we can iterate on
> that email thread, making it easier for people to reply inline to the
> code changes.
>
> [2] https://lore.kernel.org/all/9a57bb41-114e-492b-9eaa-52237675bb7c@xxxxxxxxxxxxxxxxx/
>
Sure, but please address these two nits:
- replace unlikely(!ring->size) || __ptr_ring_full(ring)
  with unlikely(__ptr_ring_check_produce(ring))
- replace ptr_ring_empty() with __ptr_ring_empty()

Apart from that I think the code is fine.
Only the commit messages need changes, but we can adapt them later.

>> And I think we should move the BQL logic into a separate .h file
>> as a library. Then it is also usable for TUN/TAP in the future.
>>
>
> I like the idea, but refactoring into a library seems out of scope for
> this patchset. As a first step, let us see if this works for the veth
> driver use case. I do encourage Simon to refactor when doing the work
> for TUN/TAP in the future.
Okay, fair point. I do not have a problem with that.
I did this split in [1] but we can ignore that for now.
[1] Link: https://github.com/jkoeppeler/veth-backpressure-performance-testing/pull/1
>
> Alternatively netdev maintainers can choose to apply V5 patchset and we
> can iterate on finding the right ethtool coalesce parameters for veth
> and create a .h "library" file for this.
>
>
>> Let's amend the commits. Should we do this on Github?
>
> Kernel development for netdev [3] happens on the mailing list with patches via email.
Okay :)
Thanks!
>
> --Jesper
>
> [3] https://kernel.org/doc/html/latest/process/maintainer-netdev.html
>