Re: [PATCH 0/7] riscv: Memory Hot(Un)Plug support

From: David Hildenbrand
Date: Mon May 22 2023 - 04:22:29 EST


On 21.05.23 11:15, Björn Töpel wrote:
Hi David and Anshuman!

Björn Töpel <bjorn@xxxxxxxxxx> writes:

David Hildenbrand <david@xxxxxxxxxx> writes:

On 12.05.23 16:57, Björn Töpel wrote:
From: Björn Töpel <bjorn@xxxxxxxxxxxx>

Memory Hot(Un)Plug support for the RISC-V port
==============================================

[...]


Cool stuff! I'm fairly busy right now, so some high-level questions upfront:

No worries, and no rush! I'd say the v1 series was mainly for the RISC-V
folks, and I've got tons of (offline) comments from Alex -- and with
your comments below some more details to figure out.

One of the major issues with my v1 patch is around init_mm page table
synchronization, and that'll be part of the v2.

I've noticed there's quite a difference between x86-64 and arm64 in
terms of locking, when updating (add/remove) the init_mm table. x86-64
uses the usual page table locking mechanisms (used by the generic
kernel functions), whereas arm64 does not.

How does arm64 manage to mix the "lock-less" updates (READ/WRITE_ONCE,
and fences in set_p?d+friends), with the generic kernel ones that use
the regular page locking mechanism?

I'm obviously missing something about the locking rules for memory hot
add/remove... I've been reading the arm64 memory hot add/remove
series, but none the wiser! ;-)

In general, memory hot(un)plug is serialized at a high level by the mem_hotplug_lock. For example, in pagemap_range() or in add_memory_resource(), we grab that lock in write mode. So we'll never see memory getting added to or removed from the direct map concurrently.
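
To illustrate the exclusion I mean, here is a toy, single-threaded userspace model; it is not kernel code. All toy_* names are made up for this sketch. The writer side loosely corresponds to mem_hotplug_begin()/mem_hotplug_done(), the reader side to get_online_mems()/put_online_mems(). The real lock is a sleeping percpu rwsem, so "try" failing here just stands for "would have to wait".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's mem_hotplug_lock. */
struct toy_hotplug_lock {
	int readers;	/* number of active readers */
	bool writer;	/* true while a hot(un)plug operation runs */
};

/* Writer side: succeeds only if nobody else holds the lock. */
static bool toy_hotplug_try_begin(struct toy_hotplug_lock *l)
{
	if (l->writer || l->readers > 0)
		return false;
	l->writer = true;
	return true;
}

static void toy_hotplug_done(struct toy_hotplug_lock *l)
{
	assert(l->writer);
	l->writer = false;
}

/* Reader side: excluded only while a writer is active. */
static bool toy_try_get_online_mems(struct toy_hotplug_lock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

static void toy_put_online_mems(struct toy_hotplug_lock *l)
{
	assert(l->readers > 0);
	l->readers--;
}
```

The point of the model: two hot(un)plug operations can never overlap, so they cannot race with each other on the direct map. Anything that modifies page tables without taking this lock is not covered.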

From what I recall, the locking on the arch level is required for concurrent (direct mapping) page table modifications that target virtual address ranges adjacent to the ranges we hot(un)plug:
CONFIG_ARCH_HAS_SET_DIRECT_MAP and vmalloc come to mind.

For example, if a range were mapped using a large PUD, but we have to unplug it partially (unplugging memory that is part of bootmem), we'd have to replace the large PUD by a PMD table first. That change (which could affect other concurrent page table walkers/operations) has to be synchronized.
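
A rough userspace sketch of that split, under assumptions: 2 MiB PMD leaves, 512 entries per table, so a leaf PUD covers 1 GiB. The toy_* types and helpers are hypothetical and do not match any architecture's real page table format or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PMD_SIZE		(2ULL << 20)	/* 2 MiB, assumed */
#define TOY_PTRS_PER_PMD	512
#define TOY_PUD_SIZE		(TOY_PMD_SIZE * TOY_PTRS_PER_PMD) /* 1 GiB */

struct toy_pmd {
	bool present;
	uint64_t phys;
};

struct toy_pud {
	bool present;
	bool leaf;		/* true: one large 1 GiB mapping */
	uint64_t phys;		/* physical base of the leaf mapping */
	struct toy_pmd *table;	/* valid only when !leaf */
};

/* Replace a leaf PUD by an equivalent table of PMD leaves. */
static void toy_split_pud(struct toy_pud *pud, struct toy_pmd *table)
{
	assert(pud->present && pud->leaf);
	for (size_t i = 0; i < TOY_PTRS_PER_PMD; i++) {
		table[i].present = true;
		table[i].phys = pud->phys + i * TOY_PMD_SIZE;
	}
	/*
	 * Publishing the table is the step concurrent walkers could
	 * observe; in a kernel this is what needs synchronization.
	 */
	pud->table = table;
	pud->leaf = false;
}

/* Unmap the PMD-aligned range [off, off + len) inside one PUD. */
static void toy_unmap_range(struct toy_pud *pud, struct toy_pmd *spare,
			    uint64_t off, uint64_t len)
{
	assert(off % TOY_PMD_SIZE == 0 && len % TOY_PMD_SIZE == 0);
	assert(off + len <= TOY_PUD_SIZE);
	if (pud->leaf)
		toy_split_pud(pud, spare);
	for (uint64_t i = off / TOY_PMD_SIZE; i < (off + len) / TOY_PMD_SIZE; i++)
		pud->table[i].present = false;
}
```

Note that the neighbouring 2 MiB entries stay mapped and keep the same physical addresses, but anyone walking that PUD during the split sees the entry change from a leaf to a table.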

I guess to which degree this applies to riscv depends on the virtual memory layout, direct mapping granularity and features (e.g., CONFIG_ARCH_HAS_SET_DIRECT_MAP).


One trick that arm64 implements is that it only allows hotunplugging memory that was hotplugged (see prevent_bootmem_remove_notifier()). That might just rule out such problematic cases that require locking completely, and make the high-level mem_hotplug_lock sufficient.
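
The idea behind that check, as a minimal userspace sketch. This is not the arm64 code: as far as I recall, the real notifier looks at early memory sections when a range is about to go offline, while this sketch uses a plain list of boot memory ranges, and the toy_* names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

static bool toy_ranges_overlap(struct toy_range a, struct toy_range b)
{
	return a.start < b.end && b.start < a.end;
}

/*
 * Returns true if offlining 'req' may proceed, i.e. it does not
 * touch any memory that was present at boot.
 */
static bool toy_may_offline(struct toy_range req,
			    const struct toy_range *boot, size_t nboot)
{
	for (size_t i = 0; i < nboot; i++)
		if (toy_ranges_overlap(req, boot[i]))
			return false;
	return true;
}
```

With such a rule, a hot-removed range was always mapped by the hotplug path itself, so the "split a large boot-time mapping" case above should not come up.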

--
Thanks,

David / dhildenb