Re: [RFC 0/3] Improve memory statistics for virtio balloon
From: David Hildenbrand
Date: Mon Apr 15 2024 - 11:02:11 EST
On 15.04.24 10:41, zhenwei pi wrote:
> Hi,
>
> When the guest runs under critical memory pressure, the guest becomes
> too slow; even sshd enters D state (uninterruptible) on memory
> allocation. We can't log in to this VM to do any troubleshooting.
>
> The guest kernel log via the virtual TTY (on the host side) only provides
> a few of the necessary logs after OOM. More detailed memory statistics are
> required, so that we can see explicit memory events and estimate the
> pressure.
>
> I'm going to introduce several VM counters for virtio balloon:
> - oom-kill
> - alloc-stall
> - scan-async
> - scan-direct
> - reclaim-async
> - reclaim-direct
IIUC, we're only exposing events that are already getting provided via
all_vm_events(), correct?
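
For reference, the same events are visible from guest userspace in /proc/vmstat. A minimal sketch of pulling them out of vmstat-style text follows. The mapping from the proposed counters to vmstat names (oom_kill, allocstall_*, pgscan_kswapd, pgscan_direct, pgsteal_kswapd, pgsteal_direct) is my assumption, not something stated in the series, and sum_vmstat() is a hypothetical helper:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sum the values of all "name value" lines in vmstat-style text.
 * With prefix_match == 0 the name must equal key exactly, so that
 * "pgscan_direct" does not also pick up "pgscan_direct_throttle".
 * With prefix_match != 0 every name starting with key is summed,
 * which suits the per-zone "allocstall_*" counters.
 */
static unsigned long long sum_vmstat(const char *text, const char *key,
				     int prefix_match)
{
	unsigned long long total = 0;
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		const char *nl = strchr(line, '\n');
		const char *sp = strchr(line, ' ');

		/* Only consider a space that belongs to the current line. */
		if (sp && (!nl || sp < nl)) {
			size_t nlen = (size_t)(sp - line);
			int hit = prefix_match ? nlen >= klen : nlen == klen;

			if (hit && strncmp(line, key, klen) == 0)
				total += strtoull(sp + 1, NULL, 10);
		}
		line = nl ? nl + 1 : NULL;
	}
	return total;
}
```

Called as sum_vmstat(buf, "allocstall_", 1) or sum_vmstat(buf, "oom_kill", 0), with buf holding the contents of /proc/vmstat.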
In that case, I don't really see a major issue. Some considerations:
(1) These new events are fairly Linux-specific.
PSWPIN and friends are fairly generic, but HTLB is already fairly
Linux-specific. OOM-kills don't really exist on Windows, for
example. We'll have to be careful to properly describe what the
semantics are.
(2) How should we handle it if Linux ever stops supporting a certain event
(e.g., after a major reclaim rework)? I assume we'd simply return nothing,
like we currently would for VIRTIO_BALLOON_S_HTLB_PGALLOC without
CONFIG_HUGETLB_PAGE.
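
To illustrate "return nothing": the stats buffer is a list of (tag, value) pairs, so an unsupported event can simply be left out and the device sees fewer entries. Below is a simplified userspace sketch of that pattern, not the driver code. The example_* names, the EXAMPLE_HAVE_* switches and the EXAMPLE_S_OOM_KILL tag are hypothetical; only the four VIRTIO_BALLOON_S_* values are taken from the existing UAPI header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Existing tags, values as in the virtio-balloon UAPI header. */
#define VIRTIO_BALLOON_S_SWAP_IN	0
#define VIRTIO_BALLOON_S_SWAP_OUT	1
#define VIRTIO_BALLOON_S_HTLB_PGALLOC	8
#define VIRTIO_BALLOON_S_HTLB_PGFAIL	9
/* Hypothetical tag and array size, for illustration only. */
#define EXAMPLE_S_OOM_KILL		10
#define EXAMPLE_S_NR			11

/* Simplified (tag, value) pair; the real struct is packed and endian-aware. */
struct example_stat {
	uint16_t tag;
	uint64_t val;
};

/* Hypothetical snapshot of the event counters we want to report. */
struct example_events {
	uint64_t pswpin, pswpout;
	uint64_t htlb_pgalloc, htlb_pgfail;
	uint64_t oom_kill;
};

/* Store one pair at idx and return the next free index. */
static size_t add_stat(struct example_stat *stats, size_t idx,
		       uint16_t tag, uint64_t val)
{
	assert(idx < EXAMPLE_S_NR);
	stats[idx].tag = tag;
	stats[idx].val = val;
	return idx + 1;
}

/* Fill the buffer; events that are compiled out produce no entry at all. */
static size_t fill_stats(struct example_stat *stats,
			 const struct example_events *ev)
{
	size_t idx = 0;

	idx = add_stat(stats, idx, VIRTIO_BALLOON_S_SWAP_IN, ev->pswpin);
	idx = add_stat(stats, idx, VIRTIO_BALLOON_S_SWAP_OUT, ev->pswpout);
#ifdef EXAMPLE_HAVE_HUGETLB	/* stands in for CONFIG_HUGETLB_PAGE */
	idx = add_stat(stats, idx, VIRTIO_BALLOON_S_HTLB_PGALLOC, ev->htlb_pgalloc);
	idx = add_stat(stats, idx, VIRTIO_BALLOON_S_HTLB_PGFAIL, ev->htlb_pgfail);
#endif
#ifdef EXAMPLE_HAVE_OOM_KILL	/* a future kernel could drop this event */
	idx = add_stat(stats, idx, EXAMPLE_S_OOM_KILL, ev->oom_kill);
#endif
	return idx;	/* number of entries actually reported */
}
```

With neither switch defined, fill_stats() reports only the two swap entries, and a consumer keyed on tags never sees the missing ones.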
--
Cheers,
David / dhildenb