Re: [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops

From: Michael S. Tsirkin

Date: Thu Mar 12 2026 - 10:02:38 EST


On Thu, Mar 12, 2026 at 02:06:35PM +0100, Simon Schippers wrote:
> This patch series deals with tun/tap & vhost-net which drop incoming
> SKBs whenever their internal ptr_ring buffer is full. Instead, with this
> patch series, the associated netdev queue is stopped - but only when a
> qdisc is attached. If no qdisc is present the existing behavior is
> preserved. This patch series touches tun/tap and vhost-net, as they
> share common logic and must be updated together. Modifying only one of
> them would break the other.
>
> By applying proper backpressure, this change allows the connected qdisc to
> operate correctly, as reported in [1], and significantly improves
> performance in real-world scenarios, as demonstrated in our paper [2]. For
> example, we observed a 36% TCP throughput improvement for an OpenVPN
> connection between Germany and the USA.
>
> Synthetic pktgen benchmarks indicate a slight regression.
> Pktgen benchmarks are provided per commit, with the final commit showing
> the overall performance.
>
> Thanks!

I posted a minor nit on patch 2.

Otherwise LGTM:

Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx>

thanks for the work!


> [1] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
> [2] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
> [3] Link: https://lore.kernel.org/r/174549940981.608169.4363875844729313831.stgit@firesoul
> [4] Link: https://lore.kernel.org/r/176295323282.307447.14790015927673763094.stgit@firesoul
>
> ---
> Changelog:
> V8:
> - Drop code changes in drivers/net/tap.c; the code there deals with
> ipvtap/macvtap, which are unrelated to the goal of this patch series
> (I had not realized that before)
> -> Greatly simplified logic, 4 instead of 9 commits
> -> No more duplicated logic, and no distinction required in vhost
> - Only wake the queue once it has been stopped and half of the ring has
> been consumed, as suggested by MST
> -> Performance improvements for TAP, but still slightly slower
> - Better benchmarking: pinned threads, an XDP drop program for
> tap+vhost-net, and disabled CPU mitigations (and a newer Ryzen 5 5600X
> processor), as suggested by Jason Wang
>
> V7: https://lore.kernel.org/netdev/20260107210448.37851-1-simon.schippers@xxxxxxxxxxxxxx/
> - Switch to an approach similar to veth [3] (excluding the recently fixed
> variant [4]), as suggested by MST, with minor adjustments discussed in V6
> - Rename the cover-letter title
> - Add multithreaded pktgen and iperf3 benchmarks, as suggested by Jason
> Wang
> - Rework __ptr_ring_consume_created_space() so it can also be used after
> batched consume
>
> V6: https://lore.kernel.org/netdev/20251120152914.1127975-1-simon.schippers@xxxxxxxxxxxxxx/
> General:
> - Major adjustments to the descriptions. Special thanks to Jon Kohler!
> - Fix git bisect by moving most logic into dedicated functions and only
> starting to use them in patch 7.
> - Moved the main logic of the coupled producer and consumer into a single
> patch to avoid a chicken-and-egg dependency between commits :-)
> - Rebased onto 6.18-rc5 and reran the benchmarks, which now also include
> lost packets (previously I missed a 0, so all benchmark results were too
> high by a factor of 10...).
> - Also include the benchmark in patch 7.
>
> Producer:
> - Move logic into the new helper tun_ring_produce()
> - Added a smp_rmb() paired with the consumer, ensuring freed space of the
> consumer is visible
> - Assume that ptr_ring is not full when __ptr_ring_full_next() is called
>
> Consumer:
> - Use an unpaired smp_rmb() instead of barrier() to ensure that the
> netdev_tx_queue_stopped() call completes before discarding
> - Also wake the netdev queue if it was stopped before discarding and then
> becomes empty
> -> Fixes race with producer as identified by MST in V5
> -> Waking the netdev queues upon resize is not required anymore
> - Use __ptr_ring_consume_created_space() instead of messing with ptr_ring
> internals
> -> Batched consume now just calls
> __tun_ring_consume()/__tap_ring_consume() in a loop
> - Added an smp_wmb() before waking the netdev queue which is paired with
> the smp_rmb() discussed above
>
> V5: https://lore.kernel.org/netdev/20250922221553.47802-1-simon.schippers@xxxxxxxxxxxxxx/T/#u
> - Stop the netdev queue prior to producing the final fitting ptr_ring entry
> -> Ensures the consumer has the latest netdev queue state, making it safe
> to wake the queue
> -> Resolves an issue in vhost-net where the netdev queue could remain
> stopped despite being empty
> -> For TUN/TAP, the netdev queue no longer needs to be woken in the
> blocking loop
> -> Introduces new helpers __ptr_ring_full_next and
> __ptr_ring_will_invalidate for this purpose
> - vhost-net now uses wrappers of TUN/TAP for ptr_ring consumption rather
> than maintaining its own rx_ring pointer
>
> V4: https://lore.kernel.org/netdev/20250902080957.47265-1-simon.schippers@xxxxxxxxxxxxxx/T/#u
> - Target net-next instead of net
> - Changed to patch series instead of single patch
> - Changed to new title from old title
> "TUN/TAP: Improving throughput and latency by avoiding SKB drops"
> - Wake netdev queue with new helpers wake_netdev_queue when there is any
> spare capacity in the ptr_ring instead of waiting for it to be empty
> - Use tun_file instead of tun_struct in tun_ring_recv for more consistent
> logic
> - Use an smp_wmb() and smp_rmb() barrier pair, which avoids the rare
> packet drops seen before
> - Use safer logic for vhost-net using RCU read locks to access TUN/TAP data
>
> V3: https://lore.kernel.org/netdev/20250825211832.84901-1-simon.schippers@xxxxxxxxxxxxxx/T/#u
> - Added support for TAP and TAP+vhost-net.
>
> V2: https://lore.kernel.org/netdev/20250811220430.14063-1-simon.schippers@xxxxxxxxxxxxxx/T/#u
> - Removed NETDEV_TX_BUSY return case in tun_net_xmit and removed
> unnecessary netif_tx_wake_queue in tun_ring_recv.
>
> V1: https://lore.kernel.org/netdev/20250808153721.261334-1-simon.schippers@xxxxxxxxxxxxxx/T/#u
> ---
>
> Simon Schippers (4):
> tun/tap: add ptr_ring consume helper with netdev queue wakeup
> vhost-net: wake queue of tun/tap after ptr_ring consume
> ptr_ring: move free-space check into separate helper
> tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present
>
> drivers/net/tun.c | 91 +++++++++++++++++++++++++++++++++++++---
> drivers/vhost/net.c | 15 +++++--
> include/linux/if_tun.h | 3 ++
> include/linux/ptr_ring.h | 14 ++++++-
> 4 files changed, 111 insertions(+), 12 deletions(-)
>
> --
> 2.43.0
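
For readers who want a concrete picture of the stop/wake policy described in
the cover letter, here is a minimal user-space sketch in C. It is not code
from the series: struct toy_ring and the toy_ring_*() functions are
hypothetical names, the model is single-threaded (so it omits the
smp_wmb()/smp_rmb() pairing from the changelog), and stopping the queue at the
moment the ring fills is a simplifying assumption. It only contrasts tail-drop
without a qdisc with stop-on-full and wake-after-half-consumed with one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE 8 /* hypothetical ring capacity */

/* Toy stand-in for a ptr_ring plus the state of its netdev queue. */
struct toy_ring {
	void *slots[TOY_RING_SIZE];
	size_t head;                /* next slot to produce into */
	size_t tail;                /* next slot to consume from */
	size_t count;               /* entries currently queued */
	size_t consumed_since_stop; /* entries consumed since the queue stopped */
	bool queue_stopped;         /* models a stopped netdev TX queue */
	bool has_qdisc;             /* backpressure applies only with a qdisc */
	unsigned long drops;        /* packets dropped on a full ring */
};

static void toy_ring_init(struct toy_ring *r, bool has_qdisc)
{
	assert(r != NULL);
	*r = (struct toy_ring){ .has_qdisc = has_qdisc };
}

/* Producer side: returns true if pkt was queued, false if it was dropped. */
static bool toy_ring_produce(struct toy_ring *r, void *pkt)
{
	if (r->count == TOY_RING_SIZE) {
		/* Ring is full: this is the tail-drop case. */
		r->drops++;
		return false;
	}

	r->slots[r->head] = pkt;
	r->head = (r->head + 1) % TOY_RING_SIZE;
	r->count++;

	/* With a qdisc, stop the queue once the ring is full so the
	 * qdisc holds further packets instead of the ring dropping them.
	 */
	if (r->has_qdisc && r->count == TOY_RING_SIZE) {
		r->queue_stopped = true;
		r->consumed_since_stop = 0;
	}
	return true;
}

/* Consumer side: returns the oldest entry, or NULL if the ring is empty. */
static void *toy_ring_consume(struct toy_ring *r)
{
	void *pkt;

	if (r->count == 0)
		return NULL;

	pkt = r->slots[r->tail];
	r->tail = (r->tail + 1) % TOY_RING_SIZE;
	r->count--;

	/* Wake only after the queue was stopped and half of the ring
	 * has been consumed since then.
	 */
	if (r->queue_stopped &&
	    ++r->consumed_since_stop >= TOY_RING_SIZE / 2)
		r->queue_stopped = false;

	return pkt;
}
```

A caller modelling the qdisc would check queue_stopped before calling
toy_ring_produce() and hold packets while it is set. Waking at the half-way
mark rather than on the first free slot avoids stopping and waking the queue
once per packet when the consumer is only slightly slower than the producer.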