Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff()

From: Chengming Zhou
Date: Thu Jan 25 2024 - 04:39:41 EST


On 2024/1/25 17:26, Yosry Ahmed wrote:
> On Thu, Jan 25, 2024 at 1:22 AM Chengming Zhou
> <zhouchengming@xxxxxxxxxxxxx> wrote:
>>
>> On 2024/1/25 17:03, Yosry Ahmed wrote:
>>>>>>>> The second difference is the handling of lru entry, which is easy that we
>>>>>>>> just zswap_lru_del() in tree lock.
>>>>>>>
>>>>>>> Why do we need zswap_lru_del() at all? We should have already isolated
>>>>>>> the entry at that point IIUC.
>>>>>>
>>>>>> I was thinking how to handle the "zswap_lru_putback()" if not writeback,
>>>>>> in which case we can't use the entry actually since we haven't got reference
>>>>>> of it. So we can don't isolate at the entry, and only zswap_lru_del() when
>>>>>> we are going to writeback actually.
>>>>>
>>>>> Why not just call zswap_lru_putback() before we unlock the folio?
>>>>
>>>> When early return because __read_swap_cache_async() return NULL or !folio_was_allocated,
>>>> we don't have a locked folio yet. The entry maybe invalidated and freed concurrently.
>>>
>>> Oh, that path, right.
>>>
>>> If we don't isolate the entry straightaway, concurrent reclaimers will
>>> see the same entry, call __read_swap_cache_async(), find the folio
>>> already in the swapcache and stop shrinking. This is because usually
>>> this means we are racing with swapin and hitting the warmer part of
>>> the zswap LRU.
>>>
>>> I am not sure if this would matter in practice, maybe Nhat knows
>>> better. Perhaps we can rotate the entry in the LRU before calling
>>> __read_swap_cache_async() to minimize the chances of such a race? Or
>>> we can serialize the calls to __read_swap_cache_async() but this may
>>> be an overkill.
>>
>> Also, not sure, rotate the entry maybe good IMHO since we will zswap_lru_del()
>> once we checked the invalidate race.
>
> Not sure what you mean. If we rotate first, we won't have anything to
> do in the failure case (if the folio is not locked). We will have to
> do zswap_lru_del() if actually writeback, yes, but in this case no
> synchronization is needed because the folio is locked, right?

Right, sorry for my confusing expression. We rotate first in lru lock,
and only zswap_lru_del() later if actually writeback.

What I want to mean is that the possibility of seeing the entry on lru list
by another reclaimer is very low, since we rotate and the timing between
__read_swap_cache_async() and zswap_lru_del() should be short.