Re: [PATCH v4 01/22] mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool
From: Vishal Moola
Date: Fri Aug 02 2024 - 14:52:20 EST
On Mon, Jul 29, 2024 at 07:25:13PM +0800, alexs@xxxxxxxxxx wrote:
> From: Alex Shi <alexs@xxxxxxxxxx>
I've been busy with other things, so I haven't been able to review this
until now. Thanks to both you and Hyeonggon for working on this memdesc :)
> The 1st patch introduces new memory decriptor zpdesc and rename
> zspage.first_page to zspage.first_zpdesc, no functional change.
>
> We removed PG_owner_priv_1 since it was moved to zspage after
> commit a41ec880aa7b ("zsmalloc: move huge compressed obj from
> page to zspage").
>
> And keep the memcg_data member, since as Yosry pointed out:
> "When the pages are freed, put_page() -> folio_put() -> __folio_put() will call
> mem_cgroup_uncharge(). The latter will call folio_memcg() (which reads
> folio->memcg_data) to figure out if uncharging needs to be done.
>
> There are also other similar code paths that will check
> folio->memcg_data. It is currently expected to be present for all
> folios. So until we have custom code paths per-folio type for
> allocation/freeing/etc, we need to keep folio->memcg_data present and
> properly initialized."
>
> Originally-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
> ---
> mm/zpdesc.h | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
> mm/zsmalloc.c | 21 ++++++++--------
> 2 files changed, 76 insertions(+), 11 deletions(-)
> create mode 100644 mm/zpdesc.h
>
> diff --git a/mm/zpdesc.h b/mm/zpdesc.h
> new file mode 100644
> index 000000000000..2dbef231f616
> --- /dev/null
> +++ b/mm/zpdesc.h
> @@ -0,0 +1,66 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* zpdesc.h: zswap.zpool memory descriptor
> + *
> + * Written by Alex Shi <alexs@xxxxxxxxxx>
> + * Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> + */
> +#ifndef __MM_ZPDESC_H__
> +#define __MM_ZPDESC_H__
> +
> +/*
> + * struct zpdesc - Memory descriptor for zpool memory, now is for zsmalloc
> + * @flags: Page flags, PG_private: identifies the first component page
> + * @lru: Indirectly used by page migration
> + * @mops: Used by page migration
> + * @next: Next zpdesc in a zspage in zsmalloc zpool
> + * @handle: For huge zspage in zsmalloc zpool
> + * @zspage: Pointer to zspage in zsmalloc
> + * @memcg_data: Memory Control Group data.
> + *
I think its a good idea to include comments for the padding (namely what
aliases with it in struct page) here as well. It doesn't hurt, and will
make them easier to remove in the future.
> + * This struct overlays struct page for now. Do not modify without a good
> + * understanding of the issues.
> + */
> +struct zpdesc {
> + unsigned long flags;
> + struct list_head lru;
> + struct movable_operations *mops;
> + union {
> + /* Next zpdescs in a zspage in zsmalloc zpool */
> + struct zpdesc *next;
> + /* For huge zspage in zsmalloc zpool */
> + unsigned long handle;
> + };
> + struct zspage *zspage;
I like using pointers here, although I think the comments should be more
precise about what the purpose of the pointer is. Maybe something like
"Points to the zspage this zpdesc is a part of" or something.
> + unsigned long _zp_pad_1;
> +#ifdef CONFIG_MEMCG
> + unsigned long memcg_data;
> +#endif
> +};
You should definitely fold your additions to the struct from patch 17
into this patch. It makes it easier to review, and better for anyone
looking at the commit log in the future.
> +#define ZPDESC_MATCH(pg, zp) \
> + static_assert(offsetof(struct page, pg) == offsetof(struct zpdesc, zp))
> +
> +ZPDESC_MATCH(flags, flags);
> +ZPDESC_MATCH(lru, lru);
> +ZPDESC_MATCH(mapping, mops);
> +ZPDESC_MATCH(index, next);
> +ZPDESC_MATCH(index, handle);
> +ZPDESC_MATCH(private, zspage);
> +#ifdef CONFIG_MEMCG
> +ZPDESC_MATCH(memcg_data, memcg_data);
> +#endif
> +#undef ZPDESC_MATCH
> +static_assert(sizeof(struct zpdesc) <= sizeof(struct page));
> +
> +#define zpdesc_page(zp) (_Generic((zp), \
> + const struct zpdesc *: (const struct page *)(zp), \
> + struct zpdesc *: (struct page *)(zp)))
> +
> +#define zpdesc_folio(zp) (_Generic((zp), \
> + const struct zpdesc *: (const struct folio *)(zp), \
> + struct zpdesc *: (struct folio *)(zp)))
> +
> +#define page_zpdesc(p) (_Generic((p), \
> + const struct page *: (const struct zpdesc *)(p), \
> + struct page *: (struct zpdesc *)(p)))
> +
> +#endif
I'm don't think we need both page and folio cast functions for zpdescs.
Sticking to pages will probably suffice (and be easiest) since all APIs
zsmalloc cares about are already defined.
We can stick to 1 "middle-man" descriptor for zpdescs since zsmalloc
uses those pages as space to track zspages and nothing more. We'll likely
end up completely removing it from zsmalloc once we can allocate
memdescs on their own: It seems most (if not all) of the "indirect" members
of zpdesc are used as indicators to the rest of core-mm telling them not to
mess with that memory.