Re: [PATCH] zsmalloc: do not use bit_spin_lock

From: tiantao (H)
Date: Wed Dec 23 2020 - 07:45:00 EST



在 2020/12/23 8:11, Vitaly Wool 写道:
On Tue, 22 Dec 2020, 22:06 Song Bao Hua (Barry Song),
<song.bao.hua@xxxxxxxxxxxxx> wrote:


-----Original Message-----
From: Vitaly Wool [mailto:vitaly.wool@xxxxxxxxxxxx]
Sent: Tuesday, December 22, 2020 10:44 PM
To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>; Minchan Kim <minchan@xxxxxxxxxx>; Mike
Galbraith <efault@xxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; linux-mm
<linux-mm@xxxxxxxxx>; Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>;
NitinGupta <ngupta@xxxxxxxxxx>; Sergey Senozhatsky
<sergey.senozhatsky.work@xxxxxxxxx>; Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx>; tiantao (H) <tiantao6@xxxxxxxxxxxxx>
Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock

On Tue, 22 Dec 2020, 03:11 Song Bao Hua (Barry Song),
<song.bao.hua@xxxxxxxxxxxxx> wrote:


-----Original Message-----
From: Song Bao Hua (Barry Song)
Sent: Tuesday, December 22, 2020 3:03 PM
To: 'Vitaly Wool' <vitaly.wool@xxxxxxxxxxxx>
Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>; Minchan Kim <minchan@xxxxxxxxxx>;
Mike
Galbraith <efault@xxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; linux-mm
<linux-mm@xxxxxxxxx>; Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>;
NitinGupta <ngupta@xxxxxxxxxx>; Sergey Senozhatsky
<sergey.senozhatsky.work@xxxxxxxxx>; Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx>; tiantao (H) <tiantao6@xxxxxxxxxxxxx>
Subject: RE: [PATCH] zsmalloc: do not use bit_spin_lock


I'm still not convinced. Will kmap what, src? At this point src might
become
just a bogus pointer.

As long as the memory is still there, we can kmap it by its page struct.
But
if
it is not there anymore, we have no way.

Why couldn't the object have been moved somewhere else (due to the compaction
mechanism for instance)
at the time DMA kicks in?
So zs_map_object() will guarantee the src won't be moved by holding those
preemption-disabled lock?
If so, it seems we have to drop the MOVABLE gfp in zswap for zsmalloc case?

Or we can do get_page() to avoid the movement of the page.

I would like to discuss this more in zswap context than zsmalloc's.
Since zsmalloc does not implement reclaim callback, using it in zswap
is a corner case anyway.
I see. But it seems we still need a solution for the compatibility
of zsmalloc and zswap? this will require change in either zsmalloc
or zswap.
or do you want to make zswap depend on !ZSMALLOC?
No, I really don't think we should go that far. What if we add a flag
to zpool, named like "can_sleep_mapped", and have it set for
zbud/z3fold?
Then zswap could go the current path if the flag is set; and if it's
not set, and mutex_trylock fails, copy data from src to a temporary
buffer, then unmap the handle, take the mutex, process the buffer
instead of src. Not the nicest thing to do but at least it won't break
anything.

write the following patch according to your idea, what do you think ?

--- a/mm/zswap.c

+++ b/mm/zswap.c
@@ -1235,7 +1235,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
        struct zswap_entry *entry;
        struct scatterlist input, output;
        struct crypto_acomp_ctx *acomp_ctx;
-       u8 *src, *dst;
+       u8 *src, *dst, *tmp;
        unsigned int dlen;
        int ret;

@@ -1262,16 +1262,26 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
        if (zpool_evictable(entry->pool->zpool))
                src += sizeof(struct zswap_header);

+       if (!zpool_can_sleep_mapped(entry->pool->zpool) && !mutex_trylock(acomp_ctx->mutex)) {
+               tmp = kmemdup(src, entry->length, GFP_ATOMIC);
+               zpool_unmap_handle(entry->pool->zpool, entry->handle); ???
+               if (!tmp)
+                       goto freeentry;
+       }
        acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
        mutex_lock(acomp_ctx->mutex);
-       sg_init_one(&input, src, entry->length);
+       sg_init_one(&input, zpool_can_sleep_mapped(entry->pool->zpool) ? src : tmp, entry->length);
        sg_init_table(&output, 1);
        sg_set_page(&output, page, PAGE_SIZE, 0);
        acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
        ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
        mutex_unlock(acomp_ctx->mutex);

-       zpool_unmap_handle(entry->pool->zpool, entry->handle);
+       if (zpool_can_sleep_mapped(entry->pool->zpool))
+               zpool_unmap_handle(entry->pool->zpool, entry->handle);
+       else
+               kfree(tmp);
+
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -440,6 +440,7 @@ static u64 zs_zpool_total_size(void *pool)

 static struct zpool_driver zs_zpool_driver = {
        .type =                   "zsmalloc",
+       .sleep_mapped =           false,
        .owner =                  THIS_MODULE,
        .create =                 zs_zpool_create,
        .destroy =                zs_zpool_destroy,


~Vitaly

zswap, on the other hand, may be dealing with some new backends in
future which have more chances to become mainstream. Imagine typical
NUMA-like cases, i. e. a zswap pool allocated in some kind SRAM, or in
unused video memory. In such a case if you try to use a pointer to an
invalidated zpool mapping, you are on the way to thrash the system.
So: no assumptions that the zswap pool is in regular linear RAM should
be made.

~Vitaly
Thanks
Barry