Re: [PATCH net-next v6 5/5] veth: time-based BQL completion coalescing via ethtool tx-usecs

From: Jonas Köppeler

Date: Wed Jun 03 2026 - 04:35:45 EST

On 6/2/26 17:37, Simon Schippers wrote:

On 6/2/26 09:24, Jonas Köppeler wrote:

On 6/1/26 6:16 PM, Simon Schippers wrote:

On 6/1/26 16:03, Jonas Köppeler wrote:

On 6/1/26 2:00 PM, Simon Schippers wrote:

On 5/28/26 09:46, Jonas Köppeler wrote:

On 5/27/26 3:54 PM, hawk@xxxxxxxxxx wrote:

From: Simon Schippers <simon.schippers@xxxxxxxxxxxxxx>

Per-packet BQL completion forces DQL to converge on limit=2, causing
excessive NAPI scheduling overhead and qdisc requeues.

Accumulate BQL completions and flush them when a configurable time
threshold is exceeded, letting DQL discover a limit that bounds actual
queuing delay to the configured interval. Coalescing state persists
across NAPI polls in struct veth_rq so completions can accumulate
beyond a single budget=64 cycle.

Add ethtool tx-usecs support for runtime tuning. Default is 100 us;
setting tx-usecs to 0 disables coalescing and falls back to per-packet
completion.

ethtool -C <veth-dev> tx-usecs 500 # 500us coalescing
ethtool -C <veth-dev> tx-usecs 0 # per-packet (no coalescing)
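
To illustrate the accumulate-and-flush behaviour described above, here is a minimal user-space sketch. It is a toy model, not the veth driver code: the class CompletionCoalescer, its fields, and the simulate() helper are hypothetical names, and it assumes the time window opens at the first pending completion. The real logic keeps its state in struct veth_rq and is not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CompletionCoalescer:
    """Toy model: batch completions until tx_usecs have elapsed."""
    tx_usecs: int                 # 0 means per-packet completion
    pending_pkts: int = 0
    pending_bytes: int = 0
    window_start_us: int = 0      # assumption: window opens at first pending packet
    flushes: List[Tuple[int, int, int]] = field(default_factory=list)

    def complete(self, now_us: int, nbytes: int) -> None:
        """Record one completed packet and flush if the window has expired."""
        if self.pending_pkts == 0:
            self.window_start_us = now_us
        self.pending_pkts += 1
        self.pending_bytes += nbytes
        if self.tx_usecs == 0 or now_us - self.window_start_us >= self.tx_usecs:
            self.flush(now_us)

    def flush(self, now_us: int) -> None:
        """Report the accumulated (time, packets, bytes) as one batch."""
        if self.pending_pkts:
            self.flushes.append((now_us, self.pending_pkts, self.pending_bytes))
            self.pending_pkts = 0
            self.pending_bytes = 0


def simulate(tx_usecs: int, n_pkts: int = 10, gap_us: int = 30,
             nbytes: int = 1500) -> List[Tuple[int, int, int]]:
    """Feed n_pkts completions spaced gap_us apart and return the flushes."""
    c = CompletionCoalescer(tx_usecs)
    for i in range(n_pkts):
        c.complete(i * gap_us, nbytes)
    c.flush(n_pkts * gap_us)      # drain whatever is still pending
    return c.flushes
```

In this model, simulate(0) reports ten single-packet completions, while simulate(100) reports two batches of five packets each, which is the kind of batching that lets the limit grow beyond per-packet values.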

Co-developed-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
Signed-off-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
Signed-off-by: Simon Schippers <simon.schippers@xxxxxxxxxxxxxx>

Tested-by: Jonas Köppeler <j.koeppeler@xxxxxxxxxxxx>

Thanks for your testing!

However, I have issues reproducing.
I run bare metal (without virtme) with v6 + your pktgen patch
and I am on the branch pktgen-and-benchmark, commit
"results: add veth-bql measurements":

1. ping fails with 100% packet loss ~20% of the time with --pktgen.
When this happens, the avg ping of that run is mistakenly set
to 0.0 ms, which distorts the results.
I fixed it locally by rerunning when this happens.

2. pktgen runs with > 3 Mpps even with --nrules 10000, see log below.
I see that this is because of qdisc drops.
I also tried pfifo and sfq but with the same result.
I spent quite some time on it but I do not know a fix.

Do you have an idea?
Thanks!

Hi,
yes there are some changes missing in the test script.
I have pushed it now, sorry. This should fix 1.

I pulled it and ran...

sudo ./veth_bql_sweep.sh --runs 1 --pktgen --duration 20 --qdisc fq_codel --no-bpftrace

... but still 8 of 32 pings (1/4) are zero, and I do not see
a pattern.

I grabbed the logs from /tmp and this is what a failing
ping looks like:

PING 10.99.0.2 (10.99.0.2) 56(84) bytes of data.

--- 10.99.0.2 ping statistics ---
97 packets transmitted, 0 received, 100% packet loss, time 19967ms

Feels like a race or something.
Can you reproduce with the exact command?
I think you need --runs 1, else it just averages over multiple
runs.
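
One way to keep a fully lost run from being recorded as 0.0 ms is to parse the ping summary and treat "0 received" as a missing value. The following is a minimal sketch, not the sweep script's actual code: parse_ping and PingSummary are hypothetical names, and it assumes the iputils ping summary format shown in the log above.

```python
import re
from typing import NamedTuple, Optional


class PingSummary(NamedTuple):
    transmitted: int
    received: int
    avg_ms: Optional[float]   # None when no reply was received


_COUNTS = re.compile(r"(\d+) packets transmitted, (\d+) received")
_RTT = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping(output: str) -> PingSummary:
    """Extract packet counts and average RTT from iputils ping output."""
    counts = _COUNTS.search(output)
    if counts is None:
        raise ValueError("no ping statistics line found")
    tx, rx = int(counts.group(1)), int(counts.group(2))
    rtt = _RTT.search(output)
    # With 100% loss there is no rtt line, so report None instead of 0.0.
    avg = float(rtt.group(2)) if (rx > 0 and rtt is not None) else None
    return PingSummary(tx, rx, avg)
```

A caller can then skip or rerun any measurement whose avg_ms is None rather than averaging it into the results.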

Sorry, no, I could not reproduce this. I used the exact same
command as you did, and I am using net-next/main + v6 patches.
I have 0% ping loss across all tests. Does the ping loss
happen regardless of the qdisc?

Yes, it happens for each qdisc I tested.
As a fix I changed the script to rerun if this happens.

I was able to reproduce it. I think the cause is that your setup lacks
support for these qdiscs, so it falls back to pfifo, which
explains why it occasionally drops all packets.

With that I ran the benchmark and also created a script to print
the results as an ASCII table.
I think it would make sense to include something like this in
the commit message.

Throughput (pps)
==================================================
nrules |   0us | 100us | 1000us | 10000us || stock
-------+-------+-------+--------+---------++------
     0 | 1.65M | 1.75M |  1.74M |   1.74M || 1.73M
   100 |  684K |  755K |   730K |    728K ||  744K
  1000 |  119K |  126K |   126K |    125K ||  126K
 10000 |   13K |   12K |    13K |     13K ||   13K

Ping RTT (ms, avg)
==================================================
nrules |   0us | 100us | 1000us | 10000us || stock
-------+-------+-------+--------+---------++------
     0 | 0.016 | 0.138 |  0.137 |   0.135 || 0.133
   100 | 0.029 | 0.185 |  0.310 |   0.315 || 0.310
  1000 | 0.137 | 0.321 |   1.66 |    1.81 ||  1.78
 10000 |  1.22 |  1.87 |   3.02 |    16.0 ||  17.2

Yes, good idea. I think we should report p99 + max_ping and not the avg.
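
As a rough illustration of reporting tail latency instead of the mean, the sketch below computes p99 and max from a list of per-packet RTT samples. The function names are hypothetical, and the nearest-rank percentile definition is an assumption; the benchmark script may well use a different method.

```python
import math
from typing import Dict, Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank definition."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_rtt(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize RTT samples by p99 and max rather than by the average."""
    return {
        "p99": percentile_nearest_rank(samples_ms, 99),
        "max": max(samples_ms),
    }
```

With 99 samples of 0.1 ms and a single 16.0 ms outlier, this reports p99 = 0.1 and max = 16.0, whereas the average (0.259 ms) would describe neither the typical case nor the worst one.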

Regarding 2.: do not look at the pktgen output. In the
new version you will see something like "goodput",
which is the number you should look at.
Pktgen only reports the rate at which it enqueued packets into
the qdisc, not the rate at which they were delivered.
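
To make the distinction concrete, here is a minimal sketch of measuring a delivered-packet rate from receive counters rather than from the sender's enqueue rate. How the sweep script actually derives its "goodput" figure is not shown in this thread; reading the receiving interface's rx_packets counter from sysfs is an assumption, and both function names are hypothetical.

```python
def read_rx_packets(dev: str) -> int:
    """Read the cumulative RX packet counter of a network device from sysfs."""
    with open(f"/sys/class/net/{dev}/statistics/rx_packets") as f:
        return int(f.read())


def goodput_pps(rx_start: int, rx_end: int, seconds: float) -> float:
    """Packets actually received per second over the measurement interval."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (rx_end - rx_start) / seconds
```

Sampling read_rx_packets() on the receiving device before and after a run and passing both values to goodput_pps() gives a rate that excludes packets dropped in the qdisc, which the enqueue rate does not.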

Exactly. Now it works. Had a single outlier but apart from that
everything is fine.

Thanks,
Simon

Let me know if it worked.
Best,
Jonas