Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

From: Michal Hocko
Date: Fri Feb 15 2019 - 08:11:28 EST


On Fri 15-02-19 15:08:36, Huang, Ying wrote:
> Michal Hocko <mhocko@xxxxxxxxxx> writes:
>
> > On Mon 11-02-19 16:38:46, Huang, Ying wrote:
> >> From: Huang Ying <ying.huang@xxxxxxxxx>
> >>
> >> When swapin is performed, after getting the swap entry information from
> >> the page table, system will swap in the swap entry, without any lock held
> >> to prevent the swap device from being swapoff. This may cause the race
> >> like below,
> >>
> >> CPU 1 CPU 2
> >> ----- -----
> >> do_swap_page
> >> swapin_readahead
> >> __read_swap_cache_async
> >> swapoff swapcache_prepare
> >> p->swap_map = NULL __swap_duplicate
> >> p->swap_map[?] /* !!! NULL pointer access */
> >>
> >> Because swapoff is usually done when system shutdown only, the race may
> >> not hit many people in practice. But it is still a race need to be fixed.
> >>
> >> To fix the race, get_swap_device() is added to check whether the specified
> >> swap entry is valid in its swap device. If so, it will keep the swap
> >> entry valid via preventing the swap device from being swapoff, until
> >> put_swap_device() is called.
> >>
> >> Because swapoff() is very rare code path, to make the normal path runs as
> >> fast as possible, disabling preemption + stop_machine() instead of
> >> reference count is used to implement get/put_swap_device(). From
> >> get_swap_device() to put_swap_device(), the preemption is disabled, so
> >> stop_machine() in swapoff() will wait until put_swap_device() is called.
> >>
> >> In addition to swap_map, cluster_info, etc. data structure in the struct
> >> swap_info_struct, the swap cache radix tree will be freed after swapoff,
> >> so this patch fixes the race between swap cache looking up and swapoff
> >> too.
> >>
> >> Races between some other swap cache usages protected via disabling
> >> preemption and swapoff are fixed too via calling stop_machine() between
> >> clearing PageSwapCache() and freeing swap cache data structure.
> >>
> >> Alternative implementation could be replacing disable preemption with
> >> rcu_read_lock_sched and stop_machine() with synchronize_sched().
> >
> > using stop_machine is generally discouraged. It is a gross
> > synchronization.
> >
> > Besides that, since when do we have this problem?
>
> For problem, you mean the race between swapoff and the page fault
> handler?

yes

> The problem is introduced in v4.11 when we avoid to replace
> swap_info_struct->lock with swap_cluster_info->lock in
> __swap_duplicate() if possible to improve the scalability of swap
> operations. But because swapoff is a really rare operation, I don't
> think it's necessary to backport the fix.

Well, a lack of any bug reports would support your theory that this is
unlikely to hit in practice. Fixes tag would be nice to have regardless
though.

Thanks!
--
Michal Hocko
SUSE Labs