Re: [PATCH] mm/sparsemem: fix race in accessing memory_section->usage

From: Pavan Kondeti
Date: Mon Oct 16 2023 - 06:34:20 EST


On Fri, Oct 13, 2023 at 06:34:27PM +0530, Charan Teja Kalla wrote:
> The race below is observed on a PFN that falls into the device memory
> region on a system whose memory layout is
> [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL]. Since the normal zone's start and
> end PFNs span the device memory PFNs as well, compaction also walks the
> device memory PFNs, though this ends up as a
> NOP (because pfn_to_online_page() returns NULL for ZONE_DEVICE memory
> sections). When another core removes the section mappings for the
> ZONE_DEVICE region that the PFN in question belongs to while compaction
> is operating on that PFN, the kernel crashes with
> CONFIG_SPARSEMEM_VMEMAP enabled.
>
> compact_zone()                    memunmap_pages
> -------------                     ---------------
> __pageblock_pfn_to_page
>    ......
>  (a)pfn_valid():
>      valid_section()//return true
>                                   (b)__remove_pages()->
>                                       sparse_remove_section()->
>                                         section_deactivate():
>                                         [Free the array ms->usage and set
>                                          ms->usage = NULL]
>      pfn_section_valid()
>      [Access ms->usage which
>      is NULL]
>
> NOTE: From the above, the race reduces to one between
> pfn_valid()/pfn_section_valid() and section deactivation with
> SPARSEMEM_VMEMAP enabled.
>
> The commit b943f045a9af("mm/sparse: fix kernel crash with
> pfn_section_valid check") tried to address the same problem by clearing
> SECTION_HAS_MEM_MAP, with the expectation that valid_section() returns
> false and thus ms->usage is not accessed.
>
> Fix this issue by the below steps:
> a) Clear SECTION_HAS_MEM_MAP before freeing the ->usage.
> b) An RCU-protected read-side critical section will either return NULL
> when SECTION_HAS_MEM_MAP is cleared or successfully access ->usage.
> c) Synchronize RCU on the write side and free ->usage. No
> attempt will be made to access ->usage after this, as
> SECTION_HAS_MEM_MAP is cleared and thus valid_section() returns false.
>
> Since section_deactivate() is a rare operation that runs in the
> hot-remove path, the impact of synchronize_rcu() should be negligible.

struct mem_section_usage has other fields, such as pageblock_flags. Do we
need to protect their readers with RCU as well? Also, can we annotate the
usage field in struct mem_section with __rcu and use RCU accessors like
rcu_dereference() when accessing the mem_section::usage field?
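
To illustrate the reader side I have in mind, here is a rough userspace
sketch, not kernel code. The rcu_read_lock()/rcu_read_unlock()/
rcu_dereference() macros are stand-ins built on C11 atomics,
pfn_section_valid_sketch() and SUBSECTIONS_PER_SECTION are made-up names,
and the bitmap is simplified to a single 64-bit word. The point is only
that the reader loads ->usage once and NULL-checks it:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel RCU primitives (assumptions only). */
#define rcu_read_lock()    ((void)0)
#define rcu_read_unlock()  ((void)0)
#define rcu_dereference(p) atomic_load_explicit(&(p), memory_order_acquire)

#define SUBSECTIONS_PER_SECTION 64	/* illustrative value */

struct mem_section_usage {
	unsigned long long subsection_map;	/* simplified: one 64-bit bitmap */
};

struct mem_section {
	/* In the kernel this would be "struct mem_section_usage __rcu *usage". */
	struct mem_section_usage *_Atomic usage;
};

/* Hypothetical helper: returns 1 if subsection idx is populated, else 0. */
static int pfn_section_valid_sketch(struct mem_section *ms, unsigned int idx)
{
	struct mem_section_usage *usage;
	int ret = 0;

	assert(idx < SUBSECTIONS_PER_SECTION);

	rcu_read_lock();
	/* Load the pointer exactly once; the writer may set it to NULL. */
	usage = rcu_dereference(ms->usage);
	if (usage)
		ret = (int)((usage->subsection_map >> idx) & 1ULL);
	rcu_read_unlock();

	return ret;
}
```

With something like this, a reader that races with removal sees either the
old array or NULL, and never dereferences a NULL ->usage.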

>
> Fixes: f46edbd1b151 ("mm/sparsemem: add helpers track active portions of a section at boot")
> Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
> ---
> include/linux/mmzone.h | 11 +++++++++--
> mm/sparse.c | 14 ++++++++------
> 2 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4106fbc..c877396 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1987,6 +1987,7 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> static inline int pfn_valid(unsigned long pfn)
> {
> struct mem_section *ms;
> + int ret;
>
> /*
> * Ensure the upper PAGE_SHIFT bits are clear in the
> @@ -2000,13 +2001,19 @@ static inline int pfn_valid(unsigned long pfn)
> if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> return 0;
> ms = __pfn_to_section(pfn);
> - if (!valid_section(ms))
> + rcu_read_lock();
> + if (!valid_section(ms)) {
> + rcu_read_unlock();
> return 0;
> + }
> /*
> * Traditionally early sections always returned pfn_valid() for
> * the entire section-sized span.
> */
> - return early_section(ms) || pfn_section_valid(ms, pfn);
> + ret = early_section(ms) || pfn_section_valid(ms, pfn);
> + rcu_read_unlock();
> +
> + return ret;
> }
> #endif
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 77d91e5..ca7dbe1 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -792,6 +792,13 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> /*
> + * Mark the section invalid so that valid_section()
> + * return false. This prevents code from dereferencing
> + * ms->usage array.
> + */
> + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> +

This trick may not be needed if we add proper NULL checks around ms->usage. We are anyway
introducing a new rule that this check must be done under the RCU read lock, so why not revisit it?

> + /*
> * When removing an early section, the usage map is kept (as the
> * usage maps of other sections fall into the same page). It
> * will be re-used when re-adding the section - which is then no
> @@ -799,16 +806,11 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> * was allocated during boot.
> */
> if (!PageReserved(virt_to_page(ms->usage))) {
> + synchronize_rcu();
> kfree(ms->usage);
> ms->usage = NULL;
> }

If we add NULL checks around ms->usage, this becomes

tmp = rcu_replace_pointer(ms->usage, NULL, hotplug_locked());
synchronize_rcu();
kfree(tmp);

Btw, do we come here with any global locks held? If yes, synchronize_rcu() can delay releasing
the lock. In that case we may have to go for an asynchronous RCU free.
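
For the asynchronous variant, a rough userspace sketch of the idea follows.
It is not kernel code: rcu_head_sketch, call_rcu_sketch() and
grace_period_elapsed_sketch() are made-up stand-ins that only mimic the
"unpublish now, free after a grace period" ordering, and the rcu member
added to struct mem_section_usage is an assumption. In the kernel,
something like kfree_rcu() with an rcu_head embedded in the object would
be the natural candidate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct rcu_head (assumption, not the kernel type). */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

struct mem_section_usage {
	struct rcu_head_sketch rcu;	/* first member, so head == object */
	unsigned long long subsection_map;
};

struct mem_section {
	struct mem_section_usage *_Atomic usage;
};

static struct rcu_head_sketch *pending;	/* callbacks waiting for a grace period */
static int freed_count;			/* bookkeeping for the example only */

/* Queue a callback; nothing is freed yet. */
static void call_rcu_sketch(struct rcu_head_sketch *head,
			    void (*func)(struct rcu_head_sketch *head))
{
	assert(func != NULL);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Pretend all pre-existing readers have finished: run queued callbacks. */
static void grace_period_elapsed_sketch(void)
{
	while (pending) {
		struct rcu_head_sketch *head = pending;

		pending = head->next;
		head->func(head);
	}
}

static void free_usage_cb(struct rcu_head_sketch *head)
{
	/* Valid cast because rcu is the first member of the struct. */
	free((struct mem_section_usage *)head);
	freed_count++;
}

/* Writer: unpublish the pointer first, defer the free, do not block. */
static void section_deactivate_sketch(struct mem_section *ms)
{
	struct mem_section_usage *old =
		atomic_exchange(&ms->usage, (struct mem_section_usage *)NULL);

	if (old)
		call_rcu_sketch(&old->rcu, free_usage_cb);
}
```

The writer never waits here, so any lock held across section_deactivate()
is not kept for the duration of a grace period.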

> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> - /*
> - * Mark the section invalid so that valid_section()
> - * return false. This prevents code from dereferencing
> - * ms->usage array.
> - */
> - ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> }
>
> /*
>

Thanks,
Pavan