Re: amusing SLUB compaction bug when CC_OPTIMIZE_FOR_SIZE

From: Vlastimil Babka
Date: Wed Oct 26 2022 - 06:52:08 EST


On 10/25/22 16:08, Vlastimil Babka wrote:
> On 10/25/22 15:47, Hyeonggon Yoo wrote:
>> On Mon, Oct 24, 2022 at 04:35:04PM +0200, Vlastimil Babka wrote:
>>
>> [...]
>>
>>> diff --git a/mm/slab.c b/mm/slab.c
>>> index 59c8e28f7b6a..219beb48588e 100644
>>> --- a/mm/slab.c
>>> +++ b/mm/slab.c
>>> @@ -1370,6 +1370,8 @@ static struct slab *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
>>>
>>>          account_slab(slab, cachep->gfporder, cachep, flags);
>>>          __folio_set_slab(folio);
>>> +        /* Make the flag visible before any changes to folio->mapping */
>>> +        smp_wmb();
>>>          /* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
>>>          if (sk_memalloc_socks() && page_is_pfmemalloc(folio_page(folio, 0)))
>>>                  slab_set_pfmemalloc(slab);
>>> @@ -1387,9 +1389,11 @@ static void kmem_freepages(struct kmem_cache *cachep, struct slab *slab)
>>>
>>>          BUG_ON(!folio_test_slab(folio));
>>>          __slab_clear_pfmemalloc(slab);
>>> -        __folio_clear_slab(folio);
>>>          page_mapcount_reset(folio_page(folio, 0));
>>>          folio->mapping = NULL;
>>> +        /* Make the mapping reset visible before clearing the flag */
>>> +        smp_wmb();
>>> +        __folio_clear_slab(folio);
>>>
>>>          if (current->reclaim_state)
>>>                  current->reclaim_state->reclaimed_slab += 1 << order;
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index 157527d7101b..6dc17cb915c5 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -1800,6 +1800,8 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
>>>
>>>          slab = folio_slab(folio);
>>>          __folio_set_slab(folio);
>>> +        /* Make the flag visible before any changes to folio->mapping */
>>> +        smp_wmb();
>>>          if (page_is_pfmemalloc(folio_page(folio, 0)))
>>>                  slab_set_pfmemalloc(slab);
>>>
>>> @@ -2008,8 +2010,10 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
>>>          }
>>>
>>>          __slab_clear_pfmemalloc(slab);
>>> -        __folio_clear_slab(folio);
>>>          folio->mapping = NULL;
>>> +        /* Make the mapping reset visible before clearing the flag */
>>> +        smp_wmb();
>>> +        __folio_clear_slab(folio);
>>>          if (current->reclaim_state)
>>>                  current->reclaim_state->reclaimed_slab += pages;
>>>          unaccount_slab(slab, order, s);
>>> --
>>> 2.38.0
>>
>> Do we need to go with this memory barrier approach before frozen refcount lands?
>
> There was IIRC an unresolved issue with frozen refcount tripping the page
> isolation code, so I didn't want to depend on that.
>
>> It's quite complicated, and IIUC there is still a theoretical race:
>>
>> At isolate_movable_page:          At slab alloc:                      At slab free:
>>                                   folio = alloc_pages(flags, order)
>>
>> folio_try_get()
>> folio_test_slab() == false
>>                                   __folio_set_slab(folio)
>>                                   smp_wmb()
>>
>>                                                                       call_rcu(&slab->rcu_head, rcu_free_slab);
>>
>> smp_rmb()
>> __folio_test_movable() == true
>>
>>                                                                       folio->mapping = NULL;
>>                                                                       smp_wmb()
>>                                                                       __folio_clear_slab(folio);
>> smp_rmb()
>> folio_test_slab() == false
>>
>> folio_trylock()
>
> There's also this code between the above and the below:
>
>         if (!PageMovable(page) || PageIsolated(page))
>                 goto out_no_isolated;
>
>         mops = page_movable_ops(page);
>
> If we put another smp_rmb() before the PageMovable test, would that have
> helped? It would ensure we observe the folio->mapping = NULL from the "slab
> free" side?
>
> But yeah, it's getting ridiculous. Maybe there's a simpler way to check two
> bits in two different bytes atomically. Or maybe it's just an impossible
> task; I feel I just dunno computers at this point.
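
(Spelled out, that suggestion would have amounted to roughly this, right
after the second folio_test_slab() check in the scenario above -- a sketch
only:

        smp_rmb();      /* make sure we'd see folio->mapping = NULL from slab free */
        if (!PageMovable(page) || PageIsolated(page))
                goto out_no_isolated;

        mops = page_movable_ops(page);

i.e. a third read barrier just to pin down the mapping reset.)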

After more thought, I think I just made a mistake by doing two
folio_test_slab() tests around a single __folio_test_movable(). What I was
supposed to do was two __folio_test_movable() tests around a single
folio_test_slab()... I hope. That should take care of your scenario, or do
you see another one? Thanks.
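
Concretely, the order of checks in isolate_movable_page() I have in mind is
something like this (a sketch only; the patch below may differ in details):

        if (unlikely(!__folio_test_movable(folio)))
                goto out_no_isolated;
        /* Pairs with the smp_wmb()s added on the slab alloc and free sides */
        smp_rmb();
        if (unlikely(folio_test_slab(folio)))
                goto out_no_isolated;
        smp_rmb();
        if (unlikely(!__folio_test_movable(folio)))
                goto out_no_isolated;

The idea is that we then never conclude "movable and not slab" while a
concurrent slab alloc or free is rewriting folio->mapping under us.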

----8<----