Re: [PATCH -V2 2/5] swap, __read_swap_cache_async(): enlarge get/put_swap_device protection range

From: Huang, Ying
Date: Mon May 22 2023 - 20:53:13 EST


David Hildenbrand <david@xxxxxxxxxx> writes:

> On 22.05.23 09:09, Huang Ying wrote:
>> This makes the function a little easier to be understood because we
>> don't need to consider swapoff. And this makes it possible to remove
>> get/put_swap_device() calling in some functions called by
>> __read_swap_cache_async().
>> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
>> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
>> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxxx>
>> Cc: Minchan Kim <minchan@xxxxxxxxxx>
>> Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
>> Cc: Yang Shi <shy828301@xxxxxxxxx>
>> Cc: Yu Zhao <yuzhao@xxxxxxxxxx>
>> ---
>> mm/swap_state.c | 33 ++++++++++++++++++++++-----------
>> 1 file changed, 22 insertions(+), 11 deletions(-)
>> diff --git a/mm/swap_state.c b/mm/swap_state.c
>> index b76a65ac28b3..a1028fe7214e 100644
>> --- a/mm/swap_state.c
>> +++ b/mm/swap_state.c
>> @@ -417,9 +417,13 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>> {
>> struct swap_info_struct *si;
>> struct folio *folio;
>> + struct page *page;
>> void *shadow = NULL;
>> *new_page_allocated = false;
>> + si = get_swap_device(entry);
>> + if (!si)
>> + return NULL;
>> for (;;) {
>> int err;
>> @@ -428,14 +432,12 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>> * called after swap_cache_get_folio() failed, re-calling
>> * that would confuse statistics.
>> */
>> - si = get_swap_device(entry);
>> - if (!si)
>> - return NULL;
>> folio = filemap_get_folio(swap_address_space(entry),
>> swp_offset(entry));
>> - put_swap_device(si);
>> - if (!IS_ERR(folio))
>> - return folio_file_page(folio, swp_offset(entry));
>> + if (!IS_ERR(folio)) {
>> + page = folio_file_page(folio, swp_offset(entry));
>> + goto got_page;
>> + }
>> /*
>> * Just skip read ahead for unused swap slot.
>> @@ -445,8 +447,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>> * as SWAP_HAS_CACHE. That's done in later part of code or
>> * else swap_off will be aborted if we return NULL.
>> */
>> - if (!__swp_swapcount(entry) && swap_slot_cache_enabled)
>> - return NULL;
>> + if (!swap_swapcount(si, entry) && swap_slot_cache_enabled)
>> + goto fail;
>> /*
>> * Get a new page to read into from swap. Allocate it now,
>> @@ -455,7 +457,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>> */
>> folio = vma_alloc_folio(gfp_mask, 0, vma, addr, false);
>> if (!folio)
>> - return NULL;
>> + goto fail;
>> /*
>> * Swap entry may have been freed since our caller observed it.
>> @@ -466,7 +468,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>> folio_put(folio);
>> if (err != -EEXIST)
>> - return NULL;
>> + goto fail;
>> /*
>> * We might race against __delete_from_swap_cache(), and
>> @@ -500,12 +502,17 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>> /* Caller will initiate read into locked folio */
>> folio_add_lru(folio);
>> *new_page_allocated = true;
>> - return &folio->page;
>> + page = &folio->page;
>> +got_page:
>> + put_swap_device(si);
>> + return page;
>> fail_unlock:
>> put_swap_folio(folio, entry);
>> folio_unlock(folio);
>> folio_put(folio);
>> +fail:
>
> Maybe better "fail_put_swap".
>
> We now hold the swap device ref longer than we used to, prevent
> swapoff over the whole operation. I guess there is no way we can
> deadlock this way?

I think that we are safe. In swapoff() syscall, we call
percpu_ref_kill() after all pages are swapped in (via try_to_unuse()).

> In general, looks good to me.

Thanks!

Best Regards,
Huang, Ying