Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
From: Johannes Weiner
Date: Fri Dec 08 2023 - 11:35:03 EST
On Thu, Dec 07, 2023 at 05:12:13PM -0800, Yosry Ahmed wrote:
> On Thu, Dec 7, 2023 at 5:03 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> > On Thu, Dec 7, 2023 at 4:19 PM Chris Li <chrisl@xxxxxxxxxx> wrote:
> > > I am wondering about the status of "memory.swap.tiers" proof of concept patch?
> > > Are we still on board to have this two patch merge together somehow so
> > > we can have
> > > "memory.swap.tiers" == "all" and "memory.swap.tiers" == "zswap" cover the
> > > memory.zswap.writeback == 1 and memory.zswap.writeback == 0 case?
> > >
> > > Thanks
> > >
> > > Chris
> > >
> >
> > Hi Chris,
> >
> > I briefly summarized my recent discussion with Johannes here:
> >
> > https://lore.kernel.org/all/CAKEwX=NwGGRAtXoNPfq63YnNLBCF0ZDOdLVRsvzUmYhK4jxzHA@xxxxxxxxxxxxxx/
> >
> > TL;DR is we acknowledge the potential usefulness of swap.tiers
> > interface, but the use case is not quite there yet, so it does not
> > make too much sense to build up that heavy machinery now.
> > zswap.writeback is a more urgent need, and does not prevent swap.tiers
> > if we do decide to implement it.
>
> I am honestly not convinced by this. There is no heavy machinery here.
> The interface is more generic and extensible, but the implementation
> is roughly the same. Unless we have a reason to think a swap.tiers
> interface may make it difficult to extend this later or will not
> support some use cases, I think we should go ahead with it. If we are
> worried that "tiers" may not accurately describe future use cases, we
> can be more generic and call it swap.types or something.
I have to disagree. The generic swap types or tiers ideas actually
look pretty far-fetched to me, and there is a lack of convincing
explanation for why this is even a probable direction for swap.
For example,
1. What are the other backends? Where you seem to see a multitude of
backends and arbitrary hierarchies of them, I see compression and
flash, and really not much else. And there is only one reasonable
direction in which to combine those two.
The IOPs and latencies of HDDs and network compared to modern
memory sizes and compute speeds make them for the most part
impractical as paging backends.
So I don't see a common third swap backend, let alone a fourth or a
fifth, or a multitude of meaningful ways of combining them...
2. Even if the usecases were there, enabling this would be a ton of
work and open interface questions:
1) There is no generic code to transfer pages between arbitrary
backends.
2) There is no accepted indirection model where a swap pte can refer
to backends dynamically, in a way that makes migration feasible
at scale.
3) Arbitrary global strings are somewhat unlikely to be accepted as
a way to configure these hierarchies.
4) Backend file paths in a global sysfs file don't work well with
namespacing. The swapfile could be in a container
namespace. Containers are not guaranteed to see /sys.
5) Fixed keywords like "zswap" might not be good enough - what about
compression and backend parameters?
None of these are insurmountable. My point is that this would be a
huge amount of prerequisite code and effort for what seems would be a
fringe usecase at best right now.
And there could be a lot of curve balls in both the software design as
well as the hardware development between now and then that could make
your proposals moot. Is a per-cgroup string file really going to be
the right way to configure arbitrary hierarchies if they materialize?
This strikes me as premature and speculative, for what could be, some
day.
We don't even do it for *internal API*. There is a review rule to
introduce a function in the same patch as its first caller, to make
sure it's the right abstraction and a good fit for the usecase. There
is no way we can have a lower bar than that for permanent ABI.
The patch here integrates with what zswap is NOW and always has been:
a compressing writeback cache for swap.
Should multiple swap tiers overcome all the above and actually become
real, this knob here would be the least of our worries. It would be
easy to just ignore, automatically override, or deprecate.
So I don't think you made a reasonable proposal for an alternative, or
gave convincing reasons to hold off this one.