Re: [PATCH v3 2/3] mm/zswap: bugfix: memory leak when invalidate andreclaim occur concurrently

From: Minchan Kim
Date: Mon Sep 23 2013 - 21:02:53 EST


On Mon, Sep 23, 2013 at 04:21:49PM +0800, Weijie Yang wrote:
> Consider the following scenario:
> thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
> thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
> finished, entry x and its zbud is not freed as its refcount != 0
> now, the swap_map[x] = 0
> thread 0: now call zswap_get_swap_cache_page
> swapcache_prepare return -ENOENT because entry x is not used any more
> zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
> zswap_writeback_entry do nothing except put refcount
> Now, the memory of zswap_entry x and its zpage leak.
>
> Modify:
> - check the refcount in fail path, free memory if it is not referenced.

Hmm, I don't like this because zswap refcount routine is already mess for me.
I'm not sure why it was designed from the beginning. I hope we should fix it first.

1. zswap_rb_serach could include zswap_entry_get semantic if it founds a entry from
the tree. Of course, we should ranme it as find_get_zswap_entry like find_get_page.
2. zswap_entry_put could hide resource free function like zswap_free_entry so that
all of caller can use it easily following pattern.

find_get_zswap_entry
...
...
zswap_entry_put

Of course, zswap_entry_put have to check the entry is in the tree or not
so if someone already removes it from the tree, it should avoid double remove.

One of the concern I can think is that approach extends critical section
but I think it would be no problem because more bottleneck would be [de]compress
functions. If it were really problem, we can mitigate a problem with moving
unnecessary functions out of zswap_free_entry because it seem to be rather
over-enginnering.

> - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
> can be not only caused by nomem but also by invalidate.
>
> Signed-off-by: Weijie Yang <weijie.yang@xxxxxxxxxxx>
> Reviewed-by: Bob Liu <bob.liu@xxxxxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Acked-by: Seth Jennings <sjenning@xxxxxxxxxxxxxxxxxx>
> ---
> mm/zswap.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index cbd9578..1be7b90 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -387,7 +387,7 @@ static void zswap_free_entry(struct zswap_tree *tree, struct zswap_entry *entry)
> enum zswap_get_swap_ret {
> ZSWAP_SWAPCACHE_NEW,
> ZSWAP_SWAPCACHE_EXIST,
> - ZSWAP_SWAPCACHE_NOMEM
> + ZSWAP_SWAPCACHE_FAIL,
> };
>
> /*
> @@ -401,9 +401,9 @@ enum zswap_get_swap_ret {
> * added to the swap cache, and returned in retpage.
> *
> * If success, the swap cache page is returned in retpage
> - * Returns 0 if page was already in the swap cache, page is not locked
> - * Returns 1 if the new page needs to be populated, page is locked
> - * Returns <0 on error
> + * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
> + * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated, page is locked
> + * Returns ZSWAP_SWAPCACHE_FAIL on error
> */
> static int zswap_get_swap_cache_page(swp_entry_t entry,
> struct page **retpage)
> @@ -475,7 +475,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
> if (new_page)
> page_cache_release(new_page);
> if (!found_page)
> - return ZSWAP_SWAPCACHE_NOMEM;
> + return ZSWAP_SWAPCACHE_FAIL;
> *retpage = found_page;
> return ZSWAP_SWAPCACHE_EXIST;
> }
> @@ -529,11 +529,11 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>
> /* try to allocate swap cache page */
> switch (zswap_get_swap_cache_page(swpentry, &page)) {
> - case ZSWAP_SWAPCACHE_NOMEM: /* no memory */
> + case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */
> ret = -ENOMEM;
> goto fail;
>
> - case ZSWAP_SWAPCACHE_EXIST: /* page is unlocked */
> + case ZSWAP_SWAPCACHE_EXIST:
> /* page is already in the swap cache, ignore for now */
> page_cache_release(page);
> ret = -EEXIST;
> @@ -591,7 +591,12 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
>
> fail:
> spin_lock(&tree->lock);
> - zswap_entry_put(entry);
> + refcount = zswap_entry_put(entry);
> + if (refcount <= 0) {
> + /* invalidate happened, consider writeback as success */
> + zswap_free_entry(tree, entry);
> + ret = 0;
> + }
> spin_unlock(&tree->lock);
> return ret;
> }
> --
> 1.7.10.4
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/