Re: [PATCH v2 2/5] mm: zswap: calculate limits only when updated

From: David Hildenbrand
Date: Fri Apr 12 2024 - 15:49:02 EST


On 10.04.24 02:52, Yosry Ahmed wrote:
[..]
Do we need a separate notifier chain for totalram_pages() updates?

Good question. I actually might have the requirement to notify some arch
code (s390x) from virtio-mem when fake adding/removing memory, and
already wondered how to best wire that up.

Maybe we can squeeze that into the existing notifier chain, but needs a
bit of thought.


Sorry for the late reply, I had to think about this a bit.

Do you mean by adding new actions (e.g. MEM_FAKE_ONLINE,
MEM_FAKE_OFFLINE), or by reusing the existing actions (MEM_ONLINE,
MEM_OFFLINE, etc.)?

At least for virtio-mem, I think we could have a MEM_ONLINE/MEM_OFFLINE that prepares the whole range belonging to the Linux memory block (/sys/devices/system/memory/memory...) to go online, and then have something like MEM_SOFT_ONLINE/MEM_SOFT_OFFLINE or ENABLE_PAGES/DISABLE_PAGES ... notifications when parts become usable (!PageOffline, handed to the buddy) or unusable (PageOffline, removed from the buddy).

There are some details to be figured out, but it could work.
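
To make the two-level idea above a bit more concrete, here is a rough user-space mock, not kernel code. The enum, the struct names, and mem_notify() are all made up for illustration; only the MEM_ONLINE/MEM_OFFLINE and MEM_SOFT_ONLINE/MEM_SOFT_OFFLINE names are taken from the discussion, and the latter two do not exist today. It only shows how a listener could track "block is prepared" separately from "pages are actually usable".

```c
#include <assert.h>
#include <stddef.h>

/* User-space mock for illustration only; none of this is kernel API. */
enum mem_action {
	MEM_ONLINE,       /* whole Linux memory block goes online */
	MEM_OFFLINE,      /* whole Linux memory block goes offline */
	MEM_SOFT_ONLINE,  /* hypothetical: sub-range handed to the buddy */
	MEM_SOFT_OFFLINE, /* hypothetical: sub-range removed from the buddy */
};

/* Range a notification refers to (hypothetical layout). */
struct mem_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* State a listener might keep: prepared blocks vs. usable pages. */
struct mem_stats {
	unsigned long blocks_online;
	unsigned long usable_pages;
};

/* Hypothetical listener callback: block-level actions only change the
 * block count, "soft" actions only change the usable page count. */
void mem_notify(struct mem_stats *st, enum mem_action action,
		const struct mem_range *r)
{
	assert(st != NULL && r != NULL);

	switch (action) {
	case MEM_ONLINE:
		st->blocks_online++;
		break;
	case MEM_OFFLINE:
		assert(st->blocks_online > 0);
		st->blocks_online--;
		break;
	case MEM_SOFT_ONLINE:
		st->usable_pages += r->nr_pages;
		break;
	case MEM_SOFT_OFFLINE:
		assert(st->usable_pages >= r->nr_pages);
		st->usable_pages -= r->nr_pages;
		break;
	}
}
```

With, say, a 128 MiB memory block and 4 KiB pages (both just assumptions for the example), MEM_ONLINE would cover 32768 pages once, while each 2 MiB pageblock that becomes usable would show up as a MEM_SOFT_ONLINE of 512 pages.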

And as virtio-mem currently operates at pageblock granularity (e.g., 2 MiB), but frequently handles multiple contiguous pageblocks within a Linux memory block, it's not that bad.


But the issue I see with ballooning is that we often operate at page granularity there. While we could optimize some cases, we might get quite some overhead from all the notifications. Alternatively, we could send a list of pages, but that won't win a beauty contest.
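
As one example of the kind of optimization that might help in some cases, a balloon driver could merge adjacent pages into ranges before notifying, so that N contiguous pages cost one notification instead of N. The sketch below is plain user-space C with made-up names (struct pfn_range, coalesce_pfns()); it assumes the PFNs are already sorted in ascending order without duplicates, which a real driver would not get for free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range descriptor, for illustration only. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/*
 * Merge sorted, duplicate-free PFNs into contiguous ranges.
 * Writes at most max_out ranges to out and returns how many were
 * written; PFNs that would need more than max_out ranges are dropped.
 */
size_t coalesce_pfns(const unsigned long *pfns, size_t n,
		     struct pfn_range *out, size_t max_out)
{
	size_t nr = 0;

	assert(pfns != NULL && out != NULL);

	for (size_t i = 0; i < n; i++) {
		/* Extend the last range if this PFN directly follows it. */
		if (nr > 0 &&
		    out[nr - 1].start_pfn + out[nr - 1].nr_pages == pfns[i]) {
			out[nr - 1].nr_pages++;
			continue;
		}
		if (nr == max_out)
			break;
		/* Otherwise start a new single-page range. */
		out[nr].start_pfn = pfns[i];
		out[nr].nr_pages = 1;
		nr++;
	}
	return nr;
}
```

This only helps when the balloon happens to grab neighbouring pages; with scattered single pages every range has length one and the notification overhead stays the same.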

I think the main issue is that, for my purpose (virtio-mem on s390x), I need to notify about the exact memory ranges (so I can reinitialize stuff in s390x code when memory gets effectively re-enabled). For other cases (total pages changing), we don't need the memory ranges, but only the "summary" -- or a notification afterwards that the total number of pages just changed quite a bit.


New actions mean minimal impact to existing notifiers, but it may make
more sense to reuse MEM_ONLINE and MEM_OFFLINE to have generic actions
that mean "memory increased" and "memory decreased".

Likely, we should keep their semantics unchanged. Things like KASAN want to allocate metadata memory for the whole range, not for some smallish pieces. It really means "This Linux memory block goes online/offline, please prepare for that." And again, memory ballooning with small pages is a bit problematic.


I suppose we can add new actions and then separately (and probably
incrementally) audit existing notifiers to check if they want to
handle the new actions as well.

Another consideration is that apparently some ballooning drivers also
register notifiers, so we need to make sure there is no possibility of
deadlock/recursion.

Right.

--
Cheers,

David / dhildenb