Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference
From: Huang, Ying
Date: Wed Sep 23 2020 - 23:51:30 EST
Rafael Aquini <aquini@xxxxxxxxxx> writes:
> The bug here is quite simple: split_swap_cluster() misses checking for
> lock_cluster() returning NULL before committing to change cluster_info->flags.
I don't think so. We shouldn't run into this situation in the first
place, so the "fix" hides the real bug instead of fixing it. It's the
same reason we call VM_BUG_ON_PAGE(!PageLocked(head), head) in
split_huge_page_to_list() instead of silently returning if
!PageLocked(head).
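To make the distinction concrete, here is a minimal userspace sketch of the
two styles. It is not kernel code: struct cluster, model_lock_cluster(),
split_tolerant() and split_strict() are hypothetical names invented for the
illustration, and assert() merely stands in for a VM_BUG_ON-style check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-cluster state; not the kernel's struct. */
struct cluster {
	unsigned int flags;
};

#define CLUSTER_FLAG_HUGE 0x1u

/* Models a lock_cluster()-like helper: NULL when no cluster array exists. */
static struct cluster *model_lock_cluster(struct cluster *info, size_t idx)
{
	return info ? &info[idx] : NULL;
}

/*
 * Tolerant style: quietly skip the unexpected state.
 * Returns false when the split was skipped, so the caller that got us
 * here by mistake is never exposed.
 */
static bool split_tolerant(struct cluster *info, size_t idx)
{
	struct cluster *ci = model_lock_cluster(info, idx);

	if (!ci)
		return false;
	ci->flags &= ~CLUSTER_FLAG_HUGE;
	return true;
}

/*
 * Strict style: treat the unexpected state as an invariant violation.
 * The assertion fires at the point where the broken assumption is
 * first observable.
 */
static void split_strict(struct cluster *info, size_t idx)
{
	struct cluster *ci = model_lock_cluster(info, idx);

	assert(ci != NULL);
	ci->flags &= ~CLUSTER_FLAG_HUGE;
}
```

With a valid array both variants clear the flag. With a NULL array,
split_tolerant() returns false and carries on, while split_strict() aborts
and points straight at the caller that should never have reached it.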
> The fundamental problem has nothing to do with allocating, or not allocating
> a swap cluster, but it has to do with the fact that the THP deferred split scan
> can transiently race with swapcache insertion, and the fact that when you run
> your swap area on rotational storage cluster_info is _always_ NULL.
> split_swap_cluster() needs to check for lock_cluster() returning NULL because
> that's one possible case, and it clearly fails to do so.
If there's a race, we should fix the race. But the code path for
swapcache insertion is,
  add_to_swap()
    get_swap_page() /* Return if fails to allocate */
    add_to_swap_cache()
      SetPageSwapCache()
While the code path to split THP is,
  split_huge_page_to_list()
    if PageSwapCache()
      split_swap_cluster()
Both code paths are protected by the page lock, so they cannot race with
each other. Something else must be triggering the bug.
And again, for HDD, a THP shouldn't have PageSwapCache() set in the
first place. If it does, the bug is that the flag gets set, and we
should fix the code that sets it.
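The argument above can be sketched as a small single-threaded userspace
model. Everything here is an assumption made for illustration: struct
model_page, the model_*() functions and the can_alloc_cluster parameter are
hypothetical and only mirror the shape of the two call chains, not the real
kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model: just the state the argument depends on. */
struct model_page {
	bool locked;             /* stands in for the page lock */
	bool swapcache;          /* stands in for PageSwapCache() */
	int split_cluster_calls; /* how often the split path reached the cluster */
};

/* Models get_swap_page(): fails when no swap cluster can be allocated. */
static bool model_get_swap_page(bool can_alloc_cluster)
{
	return can_alloc_cluster;
}

/* Models add_to_swap(): the flag is set only after a successful allocation. */
static bool model_add_to_swap(struct model_page *p, bool can_alloc_cluster)
{
	assert(p->locked); /* caller must hold the page lock */
	if (!model_get_swap_page(can_alloc_cluster))
		return false; /* return early, flag stays clear */
	p->swapcache = true;
	return true;
}

/* Models split_huge_page_to_list(): touches the cluster only if flagged. */
static void model_split_huge_page(struct model_page *p)
{
	assert(p->locked); /* same lock as the insertion path */
	if (p->swapcache)
		p->split_cluster_calls++;
}
```

In this model, a page whose allocation failed never has the flag set, so the
split path never reaches the cluster code. If that invariant is broken in
practice, the place to look is whatever set the flag.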
> Run a workload that causes multiple THP COW, and add a memory hogger to create
> memory pressure so you'll force the reclaimers to kick the registered
> shrinkers. The trigger is not heavy swapping, and that's probably why
> most swap test cases don't hit it. The window is tight, but you will get the
> NULL pointer dereference.
Do you have a script to reproduce the bug?
> Regardless of whether you find further bugs or not, this patch is needed to correct a
> blunt coding mistake.
As above. I don't agree with that.
Best Regards,
Huang, Ying