Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

From: Huang, Ying
Date: Sun Apr 11 2021 - 21:49:41 EST


Miaohe Lin <linmiaohe@xxxxxxxxxx> writes:

> On 2021/4/10 1:17, Tim Chen wrote:
>>
>>
>> On 4/9/21 1:42 AM, Miaohe Lin wrote:
>>> On 2021/4/9 5:34, Tim Chen wrote:
>>>>
>>>>
>>>> On 4/8/21 6:08 AM, Miaohe Lin wrote:
>>>>> When I was investigating the swap code, I found the below possible race
>>>>> window:
>>>>>
>>>>> CPU 1 CPU 2
>>>>> ----- -----
>>>>> do_swap_page
>>>>> synchronous swap_readpage
>>>>> alloc_page_vma
>>>>> swapoff
>>>>> release swap_file, bdev, or ...
>>>>
>>>
>>> Many thanks for quick review and reply!
>>>
>>>> Perhaps I'm missing something. The release of swap_file, bdev etc
>>>> happens after we have cleared the SWP_VALID bit in si->flags in destroy_swap_extents
>>>> if I read the swapoff code correctly.
>>> Agree. Let's look this more close:
>>> CPU1 CPU2
>>> ----- -----
>>> swap_readpage
>>> if (data_race(sis->flags & SWP_FS_OPS)) {
>>> swapoff
>>> p->swap_file = NULL;
>>> struct file *swap_file = sis->swap_file;
>>> struct address_space *mapping = swap_file->f_mapping;[oops!]
>>> ...
>>> p->flags = 0;
>>> ...
>>>
>>> Does this make sense for you?
>>
>> p->swapfile = NULL happens after the
>> p->flags &= ~SWP_VALID, synchronize_rcu(), destroy_swap_extents() sequence in swapoff().
>>
>> So I don't think the sequence you illustrated on CPU2 is in the right order.
>> That said, without get_swap_device/put_swap_device in swap_readpage, you could
>> potentially blow pass synchronize_rcu() on CPU2 and causes a problem. so I think
>> the problematic race looks something like the following:
>>
>>
>> CPU1 CPU2
>> ----- -----
>> swap_readpage
>> if (data_race(sis->flags & SWP_FS_OPS)) {
>> swapoff
>> p->flags = &= ~SWP_VALID;
>> ..
>> synchronize_rcu();
>> ..
>> p->swap_file = NULL;
>> struct file *swap_file = sis->swap_file;
>> struct address_space *mapping = swap_file->f_mapping;[oops!]
>> ...
>> ...
>>
>
> Agree. This is also what I meant to illustrate. And you provide a better one. Many thanks!

For the pages that are swapped in through swap cache. That isn't an
issue. Because the page is locked, the swap entry will be marked with
SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been
unlocked.

So the race is for the fast path as follows,

if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
__swap_count(entry) == 1)

I found it in your original patch description. But please make it more
explicit to reduce the potential confusing.

Best Regards,
Huang, Ying