Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

From: Yosry Ahmed
Date: Thu Jul 20 2023 - 14:31:53 EST


On Thu, Jul 20, 2023 at 4:34 AM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
>
> On Thu, Jul 20, 2023 at 4:55 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > On Thu, Jul 20, 2023 at 12:18 AM Sergey Senozhatsky
> > <senozhatsky@xxxxxxxxxxxx> wrote:
> > >
> > > On (23/07/13 13:20), Hyeonggon Yoo wrote:
> > > > The purpose of this series is to define own memory descriptor for zsmalloc,
> > > > instead of re-using various fields of struct page. This is a part of the
> > > > effort to reduce the size of struct page to unsigned long and enable
> > > > dynamic allocation of memory descriptors.
> > > >
> > > > While [1] outlines this ultimate objective, the current use of struct page
> > > > is highly dependent on its definition, making it challenging to separately
> > > > allocate memory descriptors.
> > >
> > > I glanced through the series and it all looks pretty straight forward to
> > > me. I'll have a closer look. And we definitely need Minchan to ACK it.
> > >
> > > > Therefore, this series introduces new descriptor for zsmalloc, called
> > > > zsdesc. It overlays struct page for now, but will eventually be allocated
> > > > independently in the future.
> > >
> > > So I don't expect zsmalloc memory usage increase. On one hand for each
> > > physical page that zspage consists of we will allocate zsdesc (extra bytes),
> > > but at the same time struct page gets slimmer. So we should be even, or
> > > am I wrong?
> >
> > Well, it depends. Here is my understanding (which may be completely wrong):
> >
> > The end goal would be to have an 8-byte memdesc for each order-0 page,
> > and then allocate a specialized struct per-folio according to the use
> > case. In this case, we would have a memdesc and a zsdesc for each
> > order-0 page. If sizeof(zsdesc) is 64 bytes (on 64-bit), then it's a
> > net loss. The savings only start kicking in with higher order folios.
> > As of now, zsmalloc only uses order-0 pages as far as I can tell, so
> > the usage would increase if I understand correctly.
>
> I partially agree with you that the point of memdesc stuff is
> allocating a use-case specific
> descriptor per folio. but I thought the primary gain from memdesc was
> from anon and file pages
> (where high order pages are more usable), rather than zsmalloc.
>
> And I believe enabling a memory descriptor per folio would be
> impossible (or inefficient)
> if zsmalloc and other subsystems are using struct page in the current
> way (or please tell me I'm wrong?)
>
> So I expect the primary gain would be from high-order anon/file folios,
> while this series is a prerequisite for them to work sanely.

Right, I agree with that, sorry if I wasn't clear. I meant that
generally speaking, we see gains from memdesc from higher order
folios, so for zsmalloc specifically we probably won't see seeing any
savings, and *might* see some extra usage (which I might be wrong
about, see below).

>
> > It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
> > 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> > current size of struct page. If that's true, then there is no loss,
>
> Yeah, zsdesc would be 56 bytes on 64 bit CPUs as memcg_data field is
> not used in zsmalloc.
> More fields in the current struct page might not be needed in the
> future, although it's hard to say at the moment.
> but it's not a loss.

Is page->memcg_data something that we can drop? Aren't there code
paths that will check page->memcg_data even for kernel pages (e.g.
__folio_put() -> __folio_put_small() -> mem_cgroup_uncharge() ) ?

>
> > and there's potential gain if we start using higher order folios in
> > zsmalloc in the future.
>
> AFAICS zsmalloc should work even when the system memory is fragmented,
> so we may implement fallback allocation (as currently discussed in
> large anon folios thread).

Of course, any usage of higher order folios in zsmalloc must have a
fallback logic, although it might be simpler for zsmalloc than anon
folios. I agree that's off topic here.

>
> It might work, but IMHO the purpose of this series is to enable memdesc
> for large anon/file folios, rather than seeing a large gain in zsmalloc itself.
> (But even in zsmalloc, it's not a loss)
>
> > (That is of course unless we want to maintain cache line alignment for
> > the zsdescs, then we might end up using 64 bytes anyway).
>
> we already don't require cache line alignment for struct page. the current
> alignment requirement is due to SLUB's cmpxchg128 operation, not cache
> line alignment.

I thought we want struct page to be cache line aligned (to avoid
having to fetch two cache lines for one struct page), but I can easily
be wrong.

>
> I might be wrong in some aspects, so please tell me if I am.
> And thank you and Sergey for taking a look at this!

Thanks to you for doing the work!

> --
> Hyeonggon