Re: [PATCH V3 2/3] arm64/mm/hotplug: Enable MEM_OFFLINE event handling

From: Gavin Shan
Date: Wed Sep 23 2020 - 02:31:50 EST


Hi Anshuman,

On 9/21/20 10:05 PM, Anshuman Khandual wrote:
This enables MEM_OFFLINE memory event handling. It will help intercept any
possible error condition such as if boot memory somehow still got offlined
even after an explicit notifier failure, potentially by a future change in
generic hot plug framework. This would help detect such scenarios and help
debug further.

Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>
Cc: Steve Capper <steve.capper@xxxxxxx>
Cc: Mark Brown <broonie@xxxxxxxxxx>
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
---

I'm not sure this makes sense, since MEM_OFFLINE won't be triggered
after NOTIFY_BAD is returned for MEM_GOING_OFFLINE. NOTIFY_BAD means
the whole offline process is stopped. That is guaranteed by the generic
framework from a semantic standpoint.

However, this would be useful if MEM_OFFLINE were ever triggered without
a preceding MEM_GOING_OFFLINE, though that would be a bug in the generic
framework.
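
To illustrate the ordering argument, here is a simplified userspace model
of the offline sequence. It is only a sketch: model_offline(), the two
notifiers, the seen[] counters and the enum values are hypothetical
stand-ins, and the MEM_CANCEL_OFFLINE step reflects my understanding of
the generic code rather than anything stated in this patch.

```c
#include <stdio.h>

/* Hypothetical stand-ins; the values are NOT the kernel's. */
enum { NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD };
enum mem_event { MEM_GOING_OFFLINE, MEM_CANCEL_OFFLINE, MEM_OFFLINE };

typedef int (*notifier_fn)(enum mem_event ev);

/* Counts how often each event was delivered to a notifier. */
static int seen[3];

/* Models a notifier that vetoes offlining, as the arm64 one does for boot memory. */
static int veto_notifier(enum mem_event ev)
{
	seen[ev]++;
	return ev == MEM_GOING_OFFLINE ? NOTIFY_BAD : NOTIFY_OK;
}

/* Models a notifier that lets offlining proceed. */
static int allow_notifier(enum mem_event ev)
{
	seen[ev]++;
	return NOTIFY_OK;
}

/*
 * Simplified model of the generic offline path (assumption, not kernel code).
 * Returns 0 if the range was offlined, -1 if a notifier vetoed it.
 */
static int model_offline(notifier_fn nb)
{
	if (nb(MEM_GOING_OFFLINE) == NOTIFY_BAD) {
		/* Veto: undo and stop, MEM_OFFLINE is never sent. */
		nb(MEM_CANCEL_OFFLINE);
		fprintf(stderr, "offline vetoed by notifier\n");
		return -1;
	}

	/* ... pages would be isolated and migrated here ... */

	/* The return value is not acted upon for MEM_OFFLINE. */
	(void)nb(MEM_OFFLINE);
	return 0;
}
```

In this model a notifier can only see MEM_OFFLINE for a range it did not
veto at MEM_GOING_OFFLINE, which is why the new check could only fire if
the generic code itself were broken.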

arch/arm64/mm/mmu.c | 37 ++++++++++++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index df3b7415b128..6b171bd88bcf 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1482,13 +1482,40 @@ static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
 	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
 	unsigned long pfn = arg->start_pfn;
-	if (action != MEM_GOING_OFFLINE)
+	if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
 		return NOTIFY_OK;
-	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
-		ms = __pfn_to_section(pfn);
-		if (early_section(ms))
-			return NOTIFY_BAD;
+	if (action == MEM_GOING_OFFLINE) {
+		for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+			ms = __pfn_to_section(pfn);
+			if (early_section(ms)) {
+				pr_warn("Boot memory offlining attempted\n");
+				return NOTIFY_BAD;
+			}
+		}
+	} else if (action == MEM_OFFLINE) {
+		for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+			ms = __pfn_to_section(pfn);
+			if (early_section(ms)) {
+
+				/*
+				 * This should have never happened. Boot memory
+				 * offlining should have been prevented by this
+				 * very notifier. Probably some memory removal
+				 * procedure might have changed which would then
+				 * require further debug.
+				 */
+				pr_err("Boot memory offlined\n");

+				/*
+				 * Core memory hotplug does not process a return
+				 * code from the notifier for MEM_OFFLINE event.
+				 * Error condition has been reported. Report as
+				 * ignored.
+				 */
+				return NOTIFY_DONE;
+			}
+		}
 	}
 	return NOTIFY_OK;
 }


This comment is pretty much irrelevant if the patch doesn't make sense:
the logical block for MEM_GOING_OFFLINE could be reused by MEM_OFFLINE,
as they look similar except for the return value and error message :)
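
A rough, untested sketch of that reuse, written as a userspace program so
it can be compiled on its own. The enum values, PAGES_PER_SECTION, the
trimmed struct memory_notify, pfn_is_early() and the function name
bootmem_remove_notifier() are all hypothetical stand-ins; in the kernel the
check would stay __pfn_to_section() plus early_section(), and the messages
would go through pr_warn() and pr_err().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; the values are NOT the kernel's. */
enum { NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD };
enum { MEM_GOING_OFFLINE = 1, MEM_OFFLINE };
#define PAGES_PER_SECTION 4UL

/* Trimmed-down stand-in for the kernel structure of the same name. */
struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Stand-in for early_section(): pretend the first two sections are boot memory. */
static bool pfn_is_early(unsigned long pfn)
{
	return pfn < 2 * PAGES_PER_SECTION;
}

static int bootmem_remove_notifier(unsigned long action,
				   const struct memory_notify *arg)
{
	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
	unsigned long pfn;

	if (action != MEM_GOING_OFFLINE && action != MEM_OFFLINE)
		return NOTIFY_OK;

	/* One walk over the sections serves both events. */
	for (pfn = arg->start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
		if (!pfn_is_early(pfn))
			continue;

		if (action == MEM_GOING_OFFLINE) {
			/* Veto the request before anything is offlined. */
			fprintf(stderr, "Boot memory offlining attempted\n");
			return NOTIFY_BAD;
		}

		/* MEM_OFFLINE: too late to veto, so only report it. */
		fprintf(stderr, "Boot memory offlined\n");
		return NOTIFY_DONE;
	}

	return NOTIFY_OK;
}
```

Only the message and the return value differ per event, so the section
walk is written once.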

Cheers,
Gavin