Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback_entry

From: Zhongkun He
Date: Tue Jan 16 2024 - 08:40:25 EST


> > > >
> > > > Unless some page flag/readahead expert can confirm that the first
> > > > option is safe, my vote is on this option. I mean, it's fairly minimal
> > > > codewise, no? Just a bunch of plumbing. We can also keep the other
> > > > call sites intact if we just rename the old versions - something along
> > > > the line of:
> > > >
> > > > __read_swap_cache_async_head(..., bool add_to_lru_head)
> > > > {
> > > > ...
> > > > if (add_to_lru_head)
> > > > folio_add_lru(folio)
> > > > else
> > > > folio_add_lru_tail(folio);
> > > > }
> > > >
> > > > __read_swap_cache_async(...)
> > > > {
> > > > return __read_swap_cache_async_tail(..., true);
> > > > }
> > > >
> > > > A bit boilerplate? Sure. But this seems safer, and I doubt it's *that*
> > > > much more work.
> > > >
> > >
> > > Yes, agree. I will try it again.
> >
> > Look forward to seeing it! Thanks for your patience and for working on this.

Please forgive me for adding additional information about this patch.

I have finished the option of introducing folio_add_lru_tail(), but it
raises several problems:
1) A new page can only be added to the LRU by lru_add_fn, so
folio_add_lru_tail() cannot add new pages to the LRU because of the
following check in folio_batch_move_lru(), which was added by Alex Shi
to serialize memcg changes in pagevec_lru_move_fn[1]:

    /* block memcg migration while the folio moves between lru */
    if (move_fn != lru_add_fn && !folio_test_clear_lru(folio))
            continue;

To achieve the goal, we would need a new move function for
folio_add_lru_tail() that, like lru_add_fn, does not require the LRU
flag to be set, and we would have to exempt it from the check as well:

+    if (move_fn != lru_add_fn && move_fn != lru_move_tail_fn_new &&
+        !folio_test_clear_lru(folio))

2) __read_swap_cache_async() already takes six parameters, so there is
no room to add another one, add_to_lru_head.

So, for the reasons above, this seems a bit hacky for what is just a
special case.
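
To illustrate point 1), here is a small userspace model of that check.
It is only a sketch, not kernel code: struct folio is reduced to two
booleans, the move functions take a simplified signature, and both
batch_move_one() and its exempt_tail_fn parameter are made up for this
example. lru_move_tail_fn_new is the hypothetical function mentioned
above; it does not exist in the tree.

```c
#include <stdbool.h>

/* Toy stand-in for struct folio: only the state this check cares about. */
struct folio {
	bool lru;	/* models the LRU page flag */
	bool at_tail;	/* where the folio ended up on the imaginary list */
};

/* Simplified signature; the kernel's move functions also take a lruvec. */
typedef void (*move_fn_t)(struct folio *folio);

/* Models lru_add_fn(): a new folio goes to the head of the list. */
static void lru_add_fn(struct folio *folio)
{
	folio->at_tail = false;
}

/* Hypothetical tail variant from this thread; not an existing function. */
static void lru_move_tail_fn_new(struct folio *folio)
{
	folio->at_tail = true;
}

/* Returns the old value of the flag and clears it. */
static bool folio_test_clear_lru(struct folio *folio)
{
	bool was_set = folio->lru;

	folio->lru = false;
	return was_set;
}

/*
 * Models one loop iteration of folio_batch_move_lru().
 * exempt_tail_fn selects between the current check (false) and the
 * extended check from the '+' lines above (true).
 * Returns true if move_fn was actually applied to the folio.
 */
static bool batch_move_one(struct folio *folio, move_fn_t move_fn,
			   bool exempt_tail_fn)
{
	bool exempt = move_fn == lru_add_fn ||
		      (exempt_tail_fn && move_fn == lru_move_tail_fn_new);

	/* A new folio has no LRU flag, so a non-exempt move_fn is skipped. */
	if (!exempt && !folio_test_clear_lru(folio))
		return false;

	move_fn(folio);
	folio->lru = true;	/* the flag is set after the move */
	return true;
}
```

With exempt_tail_fn == false, a new folio (lru == false) passed with
lru_move_tail_fn_new is skipped and never reaches the list. With
exempt_tail_fn == true it is placed at the tail and gets the flag.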

Going back to the beginning, lru_add_drain() is the simplest option, and
calling it after __read_swap_cache_async() is already a common pattern.
Please see swap_cluster_readahead() and swap_vma_readahead(); of course,
there it is called once for a whole batch of pages.
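
As a rough userspace sketch of why the drain helps (again not kernel
code): a newly added folio sits in a per-CPU batch without the LRU flag
until the batch fills up or is drained. The model_* names, the
single-batch argument and the BATCH_SIZE value are assumptions made
only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 15	/* assumed batch capacity for this model */

/* Toy stand-in for struct folio: only the LRU flag is modelled. */
struct folio {
	bool lru;
};

/* Toy stand-in for one CPU's folio batch. */
struct folio_batch {
	size_t nr;
	struct folio *folios[BATCH_SIZE];
};

/* Models lru_add_drain() for one CPU: flush everything that is queued. */
static void model_lru_add_drain(struct folio_batch *fbatch)
{
	for (size_t i = 0; i < fbatch->nr; i++)
		fbatch->folios[i]->lru = true;	/* now really on the LRU */
	fbatch->nr = 0;
}

/* Models folio_add_lru(): queue the folio, flush only when the batch is full. */
static void model_folio_add_lru(struct folio_batch *fbatch, struct folio *folio)
{
	fbatch->folios[fbatch->nr++] = folio;
	if (fbatch->nr == BATCH_SIZE)
		model_lru_add_drain(fbatch);
}
```

After a single model_folio_add_lru() the folio still has lru == false;
it only gets the flag once model_lru_add_drain() runs. That is the
window the writeback path would close by draining right after
__read_swap_cache_async().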

Or we could leave this problem alone until zswap can write back in
batches.

Thanks again.