Re: amusing SLUB compaction bug when CC_OPTIMIZE_FOR_SIZE

From: Vlastimil Babka
Date: Wed Sep 28 2022 - 12:22:23 EST


On 9/28/22 15:48, Joel Fernandes wrote:
On Wed, Sep 28, 2022 at 02:49:02PM +0900, Hyeonggon Yoo wrote:
On Tue, Sep 27, 2022 at 10:16:35PM -0700, Hugh Dickins wrote:
It's a bug in linux-next, but taking me too long to identify which
commit is "to blame", so let me throw it over to you without more
delay: I think __PageMovable() now needs to check !PageSlab().

When I tried that, the result wasn't really nice:

https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@xxxxxxx/

And what if another conflicting page "type" shows up later? Or the debugging variant of rcu_head in struct page itself? __PageMovable() is just too fragile.

I had made a small experimental change somewhere, rebuilt and rebooted,
was not surprised to crash once swapping and compaction came in,
but was surprised to find the crash in isolate_movable_page(),
called by compaction's isolate_migratepages_block().

page->mapping was ffffffff811303aa, which qualifies as __PageMovable(),
which expects struct movable_operations at page->mapping minus low bits.
But ffffffff811303aa was the address of SLUB's rcu_free_slab(): I have
CONFIG_CC_OPTIMIZE_FOR_SIZE=y, so function addresses may have low bits set.
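
To illustrate why that address passes the check, here is a minimal user-space sketch in C. The constant values mirror what I believe include/linux/page-flags.h defines (PAGE_MAPPING_ANON = 0x1, PAGE_MAPPING_MOVABLE = 0x2). The helper names looks_movable() and strip_mapping_flags() are made up for this example and are not kernel functions.

```c
#include <stdint.h>

/* Low-bit tags stored in page->mapping (values assumed from page-flags.h). */
#define PAGE_MAPPING_ANON    0x1ULL
#define PAGE_MAPPING_MOVABLE 0x2ULL
#define PAGE_MAPPING_FLAGS   (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

/* Hypothetical stand-in for the __PageMovable() test: only the two
 * low bits of the mapping word are inspected, nothing else. */
static int looks_movable(uint64_t mapping)
{
    return (mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_MOVABLE;
}

/* Hypothetical helper: "page->mapping minus low bits", i.e. the value
 * that would be treated as a struct movable_operations pointer. */
static uint64_t strip_mapping_flags(uint64_t mapping)
{
    return mapping & ~PAGE_MAPPING_FLAGS;
}
```

With mapping = 0xffffffff811303aa the low two bits are 0b10, so looks_movable() returns true and strip_mapping_flags() yields 0xffffffff811303a8. A function address that is only 2-byte aligned is therefore indistinguishable from a tagged movable_operations pointer.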

Over to you! Thanks,
Hugh

Wow, didn't expect this.
Thank you for the report!

That should be due to commit 65505d1f2338e7
("mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head")
as rcu_head can now use some bits that it shares with mapping.

Hmm IMO we have two choices...

1. simply drop the commit as it's only for debugging (RCU folks may not like [1])

Yeah definitely don't like this option as patches are out that depend on
this (not yet merged though). :-)

But we'll have to do that for now and postpone to 6.2, I'm afraid, as the merge window for 6.1 is too close to have confidence in any solution that we come up with at this moment.

2. make __PageMovable() use a true page flag, with approach [2]

What are the drawbacks of making it a true flag?

Even if we free PageSlab, I'm sure there will be better uses of a free page flag than __PageMovable.

3. With frozen page allocation
https://lore.kernel.org/all/20220809171854.3725722-1-willy@xxxxxxxxxxxxx/

slab pages will have refcount 0 and compaction will skip them for that reason. But it had other unresolved issues with page isolation code IIRC.
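
The "refcount 0 means skip" idea can be sketched in user-space C as below. This only illustrates the increment-unless-zero pattern (the kernel has get_page_unless_zero() for this purpose). struct fake_page and try_get_unless_zero() are hypothetical names, and C11 atomics stand in for the kernel's own atomic primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page, reduced to its refcount. */
struct fake_page {
    atomic_int refcount;
};

/* Take a reference only if the count is currently non-zero.
 * A frozen page (refcount 0) is never pinned, so the caller skips it. */
static bool try_get_unless_zero(struct fake_page *p)
{
    int old = atomic_load(&p->refcount);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
            return true;
    }
    return false;
}
```

An isolation path that starts with such a check would bail out on a frozen slab page before ever looking at page->mapping, which is why the low-bit collision would not matter in that scheme.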

thanks,

- Joel




[1] https://lore.kernel.org/all/85afd876-d8bb-0804-b2c5-48ed3055e702@xxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-mm/20220919125708.276864-1-42.hyeyoo@xxxxxxxxx/

Thanks!

--
Thanks,
Hyeonggon