Re: [PATCH v4 01/22] mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool

From: Alex Shi
Date: Mon Aug 05 2024 - 00:06:43 EST




On 8/3/24 2:52 AM, Vishal Moola wrote:
> On Mon, Jul 29, 2024 at 07:25:13PM +0800, alexs@xxxxxxxxxx wrote:
>> From: Alex Shi <alexs@xxxxxxxxxx>
>
> I've been busy with other things, so I haven't been able to review this
> until now. Thanks to both you and Hyeonggon for working on this memdesc :)

Hi Vishal,

Thank a lot for your comments!

My pleasure! :)

>
>> The 1st patch introduces new memory decriptor zpdesc and rename
>> zspage.first_page to zspage.first_zpdesc, no functional change.
>>
>> We removed PG_owner_priv_1 since it was moved to zspage after
>> commit a41ec880aa7b ("zsmalloc: move huge compressed obj from
>> page to zspage").
>>
>> And keep the memcg_data member, since as Yosry pointed out:
>> "When the pages are freed, put_page() -> folio_put() -> __folio_put() will call
>> mem_cgroup_uncharge(). The latter will call folio_memcg() (which reads
>> folio->memcg_data) to figure out if uncharging needs to be done.
>>
>> There are also other similar code paths that will check
>> folio->memcg_data. It is currently expected to be present for all
>> folios. So until we have custom code paths per-folio type for
>> allocation/freeing/etc, we need to keep folio->memcg_data present and
>> properly initialized."
>>
>> Originally-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
>> Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
>> ---
>> mm/zpdesc.h | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> mm/zsmalloc.c | 21 ++++++++--------
>> 2 files changed, 76 insertions(+), 11 deletions(-)
>> create mode 100644 mm/zpdesc.h
>>
>> diff --git a/mm/zpdesc.h b/mm/zpdesc.h
>> new file mode 100644
>> index 000000000000..2dbef231f616
>> --- /dev/null
>> +++ b/mm/zpdesc.h
>> @@ -0,0 +1,66 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* zpdesc.h: zswap.zpool memory descriptor
>> + *
>> + * Written by Alex Shi <alexs@xxxxxxxxxx>
>> + * Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
>> + */
>> +#ifndef __MM_ZPDESC_H__
>> +#define __MM_ZPDESC_H__
>> +
>> +/*
>> + * struct zpdesc - Memory descriptor for zpool memory, now is for zsmalloc
>> + * @flags: Page flags, PG_private: identifies the first component page
>> + * @lru: Indirectly used by page migration
>> + * @mops: Used by page migration
>> + * @next: Next zpdesc in a zspage in zsmalloc zpool
>> + * @handle: For huge zspage in zsmalloc zpool
>> + * @zspage: Pointer to zspage in zsmalloc
>> + * @memcg_data: Memory Control Group data.
>> + *
>
> I think its a good idea to include comments for the padding (namely what
> aliases with it in struct page) here as well. It doesn't hurt, and will
> make them easier to remove in the future.
>
>> + * This struct overlays struct page for now. Do not modify without a good
>> + * understanding of the issues.
>> + */
>> +struct zpdesc {
>> + unsigned long flags;
>> + struct list_head lru;
>> + struct movable_operations *mops;
>> + union {
>> + /* Next zpdescs in a zspage in zsmalloc zpool */
>> + struct zpdesc *next;
>> + /* For huge zspage in zsmalloc zpool */
>> + unsigned long handle;
>> + };
>> + struct zspage *zspage;
>
> I like using pointers here, although I think the comments should be more
> precise about what the purpose of the pointer is. Maybe something like
> "Points to the zspage this zpdesc is a part of" or something.

I will change the comments for this member. Thanks!

>
>> + unsigned long _zp_pad_1;
>> +#ifdef CONFIG_MEMCG
>> + unsigned long memcg_data;
>> +#endif
>> +};
>
> You should definitely fold your additions to the struct from patch 17
> into this patch. It makes it easier to review, and better for anyone
> looking at the commit log in the future.

Thanks! I will move the struct part from patch 17 here.

>
>> +#define ZPDESC_MATCH(pg, zp) \
>> + static_assert(offsetof(struct page, pg) == offsetof(struct zpdesc, zp))
>> +
>> +ZPDESC_MATCH(flags, flags);
>> +ZPDESC_MATCH(lru, lru);
>> +ZPDESC_MATCH(mapping, mops);
>> +ZPDESC_MATCH(index, next);
>> +ZPDESC_MATCH(index, handle);
>> +ZPDESC_MATCH(private, zspage);
>> +#ifdef CONFIG_MEMCG
>> +ZPDESC_MATCH(memcg_data, memcg_data);
>> +#endif
>> +#undef ZPDESC_MATCH
>> +static_assert(sizeof(struct zpdesc) <= sizeof(struct page));
>> +
>> +#define zpdesc_page(zp) (_Generic((zp), \
>> + const struct zpdesc *: (const struct page *)(zp), \
>> + struct zpdesc *: (struct page *)(zp)))
>> +
>> +#define zpdesc_folio(zp) (_Generic((zp), \
>> + const struct zpdesc *: (const struct folio *)(zp), \
>> + struct zpdesc *: (struct folio *)(zp)))
>> +
>> +#define page_zpdesc(p) (_Generic((p), \
>> + const struct page *: (const struct zpdesc *)(p), \
>> + struct page *: (struct zpdesc *)(p)))
>> +
>> +#endif
>
> I'm don't think we need both page and folio cast functions for zpdescs.
> Sticking to pages will probably suffice (and be easiest) since all APIs
> zsmalloc cares about are already defined.
>
> We can stick to 1 "middle-man" descriptor for zpdescs since zsmalloc
> uses those pages as space to track zspages and nothing more. We'll likely
> end up completely removing it from zsmalloc once we can allocate
> memdescs on their own: It seems most (if not all) of the "indirect" members
> of zpdesc are used as indicators to the rest of core-mm telling them not to
> mess with that memory.

Yes, that is also my first attempt to skip folio part, but I found we could got
6.3% object size reduced on zsmalloc.o file, from 37.2KB to 34.9KB, if we use
folio series lock and folio_get/put functions. That saving come from compound_head
check skipping.
So I wrapped them carefully in zpdesc series functions in zpdesc.h file.
They should be easy replaced when we use memdescs in the future. Could we keep them
a while, or ?

Thanks
Alex