Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

From: Huang, Ying
Date: Wed Sep 23 2020 - 23:51:30 EST

Rafael Aquini <aquini@xxxxxxxxxx> writes:
> The bug here is quite simple: split_swap_cluster() misses checking for
> lock_cluster() returning NULL before committing to change cluster_info->flags.

I don't think so. We shouldn't run into this situation in the first
place. So the "fix" hides the real bug instead of fixing it. It is the
same reason we call VM_BUG_ON_PAGE(!PageLocked(head), head) in
split_huge_page_to_list() instead of silently returning if
!PageLocked(head).
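
To make the distinction concrete, here is a minimal user-space sketch. It
is not kernel code: struct model_cluster, model_lock_cluster() and both
model_split_*() helpers are hypothetical names invented for illustration,
and the "violated" flag only stands in for what a VM_BUG_ON-style check
would report. The silent variant returns success when the lookup yields
NULL, so the caller never learns that an invariant was broken; the checked
variant surfaces it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a swap cluster; not the kernel's struct. */
struct model_cluster {
	unsigned int flags;
};

#define MODEL_CLUSTER_HUGE 0x1u

/* Model of a lookup that yields NULL when no cluster array exists. */
static struct model_cluster *model_lock_cluster(struct model_cluster *array,
						size_t idx)
{
	return array ? &array[idx] : NULL;
}

/* Variant A: tolerate NULL silently; the broken invariant goes unnoticed. */
static int model_split_silent(struct model_cluster *array, size_t idx)
{
	struct model_cluster *ci = model_lock_cluster(array, idx);

	if (!ci)
		return 0;
	ci->flags &= ~MODEL_CLUSTER_HUGE;
	return 0;
}

/* Variant B: treat NULL as a violated invariant and report it. */
static int model_split_checked(struct model_cluster *array, size_t idx,
			       bool *violated)
{
	struct model_cluster *ci = model_lock_cluster(array, idx);

	*violated = (ci == NULL);
	if (*violated)
		return -1;
	ci->flags &= ~MODEL_CLUSTER_HUGE;
	return 0;
}
```

With a NULL array, model_split_silent() reports success while
model_split_checked() flags the problem, which is the behavior I would
want to keep until the real cause is understood.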

> The fundamental problem has nothing to do with allocating, or not allocating
> a swap cluster, but it has to do with the fact that the THP deferred split scan
> can transiently race with swapcache insertion, and the fact that when you run
> your swap area on rotational storage cluster_info is _always_ NULL.
> split_swap_cluster() needs to check for lock_cluster() returning NULL because
> that's one possible case, and it clearly fails to do so.

If there's a race, we should fix the race. But the code path for
swapcache insertion is,

get_swap_page() /* Return if fails to allocate */

While the code path to split THP is,

if PageSwapCache()

Both code paths are protected by the page lock. So there must be some
other reason that triggers the bug.
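
As a rough illustration of why I don't expect a race here, consider the
following user-space model. It is only a sketch under my assumptions:
struct model_page, its fields, and both functions are hypothetical, the
"locked" field merely stands in for the page lock, and the steps are
reduced to the two properties stated above (insertion returns early if
allocation fails, and the split path checks the swapcache flag).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page; field names are illustrative only. */
struct model_page {
	bool locked;      /* stands in for the page lock */
	bool swapcache;   /* stands in for PageSwapCache() */
	bool has_cluster; /* true once swap space was allocated */
};

/* Insertion path: runs under the lock and bails out if allocation fails. */
static bool model_add_to_swap(struct model_page *p, bool alloc_ok)
{
	assert(p->locked);
	if (!alloc_ok)
		return false; /* the swapcache flag stays clear */
	p->has_cluster = true;
	p->swapcache = true;
	return true;
}

/*
 * Split path: also runs under the lock. Returns true when the state it
 * observes is consistent, i.e. the flag is never set without a cluster.
 */
static bool model_split(struct model_page *p)
{
	assert(p->locked);
	if (p->swapcache)
		return p->has_cluster;
	return true;
}
```

In this model the split path can only see the flag set after allocation
has succeeded, because both steps happen under the same lock. If the real
code behaves differently, that difference is the bug to find.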

And again, for HDD, a THP shouldn't have PageSwapCache() set in the
first place. If it does, the bug is that the flag is set, and we should
fix the code that sets it.

> Run a workload that causes multiple THP COW, and add a memory hogger to create
> memory pressure so you'll force the reclaimers to kick the registered
> shrinkers. The trigger is not heavy swapping, and that's probably why
> most swap test cases don't hit it. The window is tight, but you will get the
> NULL pointer dereference.

Do you have a script to reproduce the bug?

> Regardless of whether you find further bugs or not, this patch is needed to
> correct a blunt coding mistake.

As explained above, I don't agree with that.

Best Regards,
Huang, Ying