Re: [RFC PATCH 2/2] virtio-balloon: add stats push mode
From: David Hildenbrand (Arm)
Date: Tue Jun 16 2026 - 08:33:57 EST

On 5/13/26 18:50, Gregory Price wrote:
> When doing aggressive overcommit of VMs on a single host, a pull
> model of stat retrieval is problematic if a guest becomes some form
> of unresponsive. In particular, it's difficult to discern the
> difference between a hung guest and a slow guest - and why the
> guest is experiencing that.
>
> Add VIRTIO_BALLOON_F_STATS_PUSH feature that allows the host to
> configure the guest to push stats on a timer instead of the default
> pull model.
>
> The host sets stats_push_interval_ms in the balloon config space:
>   0     = disabled (pull-only, default)
>   N > 0 = guest pushes stats every N milliseconds
>
> The push mode reuses the existing stats VQ, same buffer format,
> same tags. The host can change the interval at runtime by updating
> the config field.
>
> Push mode provides two advantages over pull:
>
> 1. Guest liveness detection: in pull mode, the host cannot
>    distinguish a slow guest from a hung guest without implementing
>    its own timeout tracking. In push mode, the absence of expected
>    stats buffers is an implicit liveness signal; if the guest
>    fails to push within the expected interval, the host can
>    conclude it is unresponsive.
>
> 2. Latency-sensitive consumers (e.g., memory pressure response
>    loops) receive fresh stats at a guaranteed cadence without
>    the host needing to poll.
>
> STATS_PUSH requires STATS_VQ; the driver clears STATS_PUSH during
> feature validation if STATS_VQ is absent. When push mode is active,
> the pull callback is suppressed to avoid racing on buffer submission.
>
> The pull model remains available and is the default.

I don't quite see the big benefit here, really: either it's a timer in the
hypervisor or a timer in the VM. A slow VM will, in either model, delay the
update of stats.
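
To make that concrete, here is a rough sketch of the host-side bookkeeping.
All names are hypothetical and are not taken from the patch or from any
hypervisor. The assumption is that the host wants to notice missing stats:
it then needs a deadline check of this shape whether it requested the buffer
itself (pull) or merely expects the guest to send one (push).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side state; illustrative only, not from the patch. */
struct stats_watch {
	uint64_t last_update_ms; /* when the guest last returned a stats buffer */
	uint64_t interval_ms;    /* poll period (pull) or push period (push) */
	uint64_t grace_ms;       /* slack allowed before stats count as stale */
};

/* Record that a stats buffer arrived from the guest. */
static void stats_watch_update(struct stats_watch *w, uint64_t now_ms)
{
	w->last_update_ms = now_ms;
}

/*
 * True once the guest has missed its deadline. The check is the same in
 * both models and assumes now_ms comes from a monotonic clock. It cannot
 * tell a hung guest from a slow one; it only reports that stats are late.
 */
static bool stats_watch_stale(const struct stats_watch *w, uint64_t now_ms)
{
	return now_ms - w->last_update_ms > w->interval_ms + w->grace_ms;
}
```

With interval_ms = 1000 and grace_ms = 200, a guest that last reported at
t = 5000 is flagged once t exceeds 6200, in either model.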

If you need some "liveness detection", are virtio-balloon stats updates really
the right mechanism?

I don't quite understand the "Latency-sensitive consumers" problem. If the VM is
slow, it is slow and will mess with latency-sensitive consumers either way?

--
Cheers,
David