Re: [PATCH V3] mm: fix use-after free of page_ext after race with memory-offline
From: Charan Teja Kalla
Date: Tue Aug 16 2022 - 06:40:59 EST
Thanks Michal!!
On 8/15/2022 8:36 PM, Michal Hocko wrote:
> On Tue 09-08-22 20:16:43, Charan Teja Kalla wrote:
> [...]
>> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
>> index fabb2e1..0e259da 100644
>> --- a/include/linux/page_ext.h
>> +++ b/include/linux/page_ext.h
> [...]
>> @@ -87,5 +83,14 @@ static inline void page_ext_init_flatmem_late(void)
>> static inline void page_ext_init_flatmem(void)
>> {
>> }
>> +
>> +static inline struct page *page_ext_get(struct page *page)
> struct page_ext *
>
oops!! It didn't get caught as this is in !CONFIG_PAGE_EXTENSION.
>> +{
>> + return NULL;
>> +}
>> +
>> +static inline void page_ext_put(void)
>> +{
>> +}
>> #endif /* CONFIG_PAGE_EXTENSION */
>> #endif /* __LINUX_PAGE_EXT_H */
> [...]
>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>> index 3dc715d..91d7bd2 100644
>> --- a/mm/page_ext.c
>> +++ b/mm/page_ext.c
>> @@ -9,6 +9,7 @@
>> #include <linux/page_owner.h>
>> #include <linux/page_idle.h>
>> #include <linux/page_table_check.h>
>> +#include <linux/rcupdate.h>
>>
>> /*
>> * struct page extension
>> @@ -59,6 +60,10 @@
>> * can utilize this callback to initialize the state of it correctly.
>> */
>>
>> +#ifdef CONFIG_SPARSEMEM
>> +#define PAGE_EXT_INVALID (0x1)
>> +#endif
>> +
>> #if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
>> static bool need_page_idle(void)
>> {
>> @@ -84,6 +89,7 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
>> unsigned long page_ext_size = sizeof(struct page_ext);
>>
>> static unsigned long total_usage;
>> +static struct page_ext *lookup_page_ext(const struct page *page);
>>
>> static bool __init invoke_need_callbacks(void)
>> {
>> @@ -125,6 +131,37 @@ static inline struct page_ext *get_entry(void *base, unsigned long index)
>> return base + page_ext_size * index;
>> }
>>
>> +/*
>> + * This function gives proper page_ext of a memory section
>> + * during race with the offline operation on a memory block
>> + * this section falls into. Not using this function to get
>> + * page_ext of a page, in code paths where extra refcount
>> + * is not taken on that page eg: pfn walking, can lead to
>> + * use-after-free access of page_ext.
> I do not think this is really useful comment, it goes into way too much
> detail about memory hotplug yet not enough to actually understand the
> interaction because there are no references to the actual
> synchronization scheme. I would go with something like:
>
> /*
> * Get a page_ext associated with the given page. Returns NULL if
> * no such page_ext exists otherwise ensures that the page_ext will
> * stay alive until page_ext_put is called.
> * This implies a non-sleeping context.
> */
Will update as per the Matthew input @
https://lore.kernel.org/all/YvplthTjM8Ez5DIq@xxxxxxxxxxxxxxxxxxxx/
>> + */
>> +struct page_ext *page_ext_get(struct page *page)
>> +{
>> + struct page_ext *page_ext;
>> +
>> + rcu_read_lock();
>> + page_ext = lookup_page_ext(page);
>> + if (!page_ext) {
>> + rcu_read_unlock();
>> + return NULL;
>> + }
>> +
>> + return page_ext;
>> +}
>> +
>> +/*
>> + * Must be called after work is done with the page_ext received
>> + * with page_ext_get().
>> + */
>> +
>> +void page_ext_put(void)
>> +{
>> + rcu_read_unlock();
>> +}
> Thinking about this some more I am not sure this is a good interface. It
> doesn't have any reference to the actual object this is called for. This
> is nicely visible in __folio_copy_owner which just calles page_ext_put()
> twice because there are 2 page_exts and I can already see how somebody
> might get confused this is just an error and send a patch to drop one of
> them.
>
> I do understand why you went this way because having a parameter which
> is not used will likely lead to the same situation. On the other hand it
> could be annotated to not raise warnings. One potential way to
> workaround that would be
>
> void page_ext_put(struct page_ext *page_ext)
> {
> if (unlikely(!page_ext))
> return;
>
> rcu_read_unlock();
> }
>
> which would help to make the api slightly more robust in case somebody
> does page_ext_put in a branch where page_ext_get returns NULL.
>
Looks better. Will change this accordingly.
> No strong opinion on that though. WDYI?
>
>> #ifndef CONFIG_SPARSEMEM
>>
>>
> [...]
>> @@ -183,19 +184,26 @@ static inline void __set_page_owner_handle(struct page_ext *page_ext,
>> noinline void __set_page_owner(struct page *page, unsigned short order,
>> gfp_t gfp_mask)
>> {
>> - struct page_ext *page_ext = lookup_page_ext(page);
>> + struct page_ext *page_ext = page_ext_get(page);
>> depot_stack_handle_t handle;
>>
>> if (unlikely(!page_ext))
>> return;
> Either add a comment like this
> /* save_stack can sleep in general so we have to page_ext_put */
Vlastimil suggested to go for save stack first since !page_ext is mostly
unlikely. Snip from his comments:
Why not simply do the save_stack() first and then page_ext_get() just
once? It should be really rare that it's NULL, so I don't think we save
much by avoiding an unnecessary save_stack(), while the overhead of
doing two get/put instead of one will affect every call.
https://lore.kernel.org/all/f5fd4942-b03e-1d1c-213b-9cd5283ced91@xxxxxxx/
>> + page_ext_put();
>>
>> handle = save_stack(gfp_mask);
> or just drop the initial page_ext_get altogether. This function is
> called only when page_ext is supposed to be initialized and !page_ext
> case above should be very unlikely. Or is there any reason to keep this?
>
>> +
>> + /* Ensure page_ext is valid after page_ext_put() above */
>> + page_ext = page_ext_get(page);
>> + if (unlikely(!page_ext))
>> + return;
>> __set_page_owner_handle(page_ext, handle, order, gfp_mask);
>> + page_ext_put();
>> }
>>
> [...]
>> @@ -558,13 +590,17 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>> */
>> handle = READ_ONCE(page_owner->handle);
>> if (!handle)
>> - continue;
>> + goto loop;
>>
>> /* Record the next PFN to read in the file offset */
>> *ppos = (pfn - min_low_pfn) + 1;
>>
>> + memcpy(&page_owner_tmp, page_owner, sizeof(struct page_owner));
>> + page_ext_put();
> why not
> page_owner_tmp = *page_owner;
Done!!
>
>> return print_page_owner(buf, count, pfn, page,
>> - page_owner, handle);
>> + &page_owner_tmp, handle);
>> +loop:
>> + page_ext_put();
>> }
>>
>> return 0;