RE: [PATCH 7/8] zswap: add to mm/

From: Dan Magenheimer
Date: Fri Jan 04 2013 - 17:47:42 EST

> From: Seth Jennings [mailto:sjenning@xxxxxxxxxxxxxxxxxx]
> Subject: Re: [PATCH 7/8] zswap: add to mm/
> On 01/03/2013 04:33 PM, Dan Magenheimer wrote:
> >> From: Seth Jennings [mailto:sjenning@xxxxxxxxxxxxxxxxxx]
> >>
> >> However, once the flushing code was introduced and could free an entry
> >> from the zswap_fs_store() path, it became necessary to add a per-entry
> >> refcount to make sure that the entry isn't freed while another code
> >> path was operating on it.
> >
> > Hmmm... doesn't the refcount at least need to be an atomic_t?
> An entry's refcount is only ever changed under the tree lock, so
> making them atomic_t would be redundantly atomic.

Maybe I'm still missing something, but then I think you also
need to evaluate and act on the refcount (not just read it) while
the tree lock is held. I.e., in:

> + /* page is already in the swap cache, ignore for now */
> + spin_lock(&tree->lock);
> + refcount = zswap_entry_put(entry);
> + spin_unlock(&tree->lock);
> +
> + if (likely(refcount))
> + return 0;
> +
> + /* if the refcount is zero, invalidate must have come in */
> + /* free */
> + zs_free(tree->pool, entry->handle);
> + zswap_entry_cache_free(entry);
> + atomic_dec(&zswap_stored_pages);

the entry's refcount may be changed by another processor
immediately after the unlock, and then the "if (likely(refcount))"
test is evaluating a stale value and you will get (I think) a memory leak.
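A minimal userspace sketch of "evaluate and act on the refcount while the lock is held", assuming a pthread mutex as a stand-in for tree->lock and a single slot as a stand-in for the rbtree. The names entry_put, freed_entries and the struct layouts are hypothetical, not zswap's. The point is that the decision to free and the unlink happen in the same critical section, so the result of the test cannot go stale:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model: a pthread mutex stands in for
 * tree->lock and a single slot stands in for the rbtree. */
struct entry {
	int refcount;            /* only modified with tree->lock held */
	void *data;              /* stands in for the zsmalloc handle */
};

struct tree {
	pthread_mutex_t lock;
	struct entry *slot;      /* NULL once the entry is unlinked */
};

/* Counts completed frees so a caller can detect leaks or double frees. */
static atomic_int freed_entries;

/* Drop one reference. The entry is unlinked in the same critical
 * section in which its count reaches zero, so no lookup can take a
 * new reference afterwards and the "last" result cannot go stale. */
static void entry_put(struct tree *tree, struct entry *entry)
{
	bool last;

	pthread_mutex_lock(&tree->lock);
	assert(entry->refcount > 0);
	last = (--entry->refcount == 0);
	if (last && tree->slot == entry)
		tree->slot = NULL;
	pthread_mutex_unlock(&tree->lock);

	/* Exactly one caller sees last == true, and the entry is no
	 * longer reachable, so freeing outside the lock is safe. */
	if (last) {
		free(entry->data);
		free(entry);
		atomic_fetch_add(&freed_entries, 1);
	}
}
```

Freeing after the unlock keeps the (possibly slow) free path out of the critical section while still leaving exactly one owner of the free.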

There is similar racy code in zswap_fs_invalidate_page which,
I think, could lead to a double free. There's another one, I think,
in zswap_fs_load. And the refcount is also decremented on one path
inside zswap_fs_store, which may race with the above.
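One way to close the double-free window on the invalidate side is to do the lookup, the unlink and the drop of the tree's reference under a single lock hold, so a racing second invalidate finds nothing. A minimal userspace sketch, assuming the same kind of hypothetical model (mutex for tree->lock, one slot for the rbtree); entry_invalidate and the structs are illustrative names only:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model: mutex for tree->lock, one slot for
 * the rbtree. The tree itself holds one reference on a stored entry. */
struct entry {
	int refcount;
	void *data;
};

struct tree {
	pthread_mutex_t lock;
	struct entry *slot;
};

static atomic_int freed_entries;

/* Look up, unlink and drop the tree's reference in one critical
 * section. A second invalidate racing with this one finds the slot
 * empty and returns false, instead of dropping the same reference
 * again and freeing the entry twice. */
static bool entry_invalidate(struct tree *tree)
{
	struct entry *entry;
	bool last;

	pthread_mutex_lock(&tree->lock);
	entry = tree->slot;
	if (!entry) {
		pthread_mutex_unlock(&tree->lock);
		return false;
	}
	tree->slot = NULL;
	assert(entry->refcount > 0);
	last = (--entry->refcount == 0);
	pthread_mutex_unlock(&tree->lock);

	/* If a load still holds a reference, last is false and that path
	 * is responsible for freeing when it drops its own reference. */
	if (last) {
		free(entry->data);
		free(entry);
		atomic_fetch_add(&freed_entries, 1);
	}
	return true;
}
```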

When flushing multiple zpages to free a page frame, you may
need to test the refcounts of all the entries while holding the lock.
If so, this is one place where the high-density storage will make
things messy, especially if page boundaries are crossed.
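A minimal sketch of one possible policy for that case: check every entry in the frame under one lock hold and back off entirely if any of them is in use, otherwise unlink them all before unlocking. This assumes a hypothetical userspace model (mutex for tree->lock, an entry count for the rbtree, a fixed ZPAGES_PER_FRAME bound), treats refcount == 1 as "only the tree's reference", and omits writeback to the swap device; frame_try_reclaim and the structs are illustrative names, not zswap code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define ZPAGES_PER_FRAME 4   /* hypothetical upper bound */

struct entry {
	int refcount;            /* 1 == only the tree's reference */
	bool in_tree;
	void *data;
};

struct tree {
	pthread_mutex_t lock;
	int nr_entries;          /* stands in for the rbtree contents */
};

/* Hypothetical: the compressed pages sharing one page frame. */
struct frame {
	struct entry *entries[ZPAGES_PER_FRAME];
	int nr;
};

/* All-or-nothing reclaim: every refcount is checked inside one
 * critical section. If any entry is busy, nothing is changed and 0 is
 * returned; otherwise all entries are unlinked before the unlock and
 * the number freed is returned. */
static int frame_try_reclaim(struct tree *tree, struct frame *f)
{
	int i, freed;

	pthread_mutex_lock(&tree->lock);
	for (i = 0; i < f->nr; i++) {
		if (f->entries[i]->refcount != 1) {
			pthread_mutex_unlock(&tree->lock);
			return 0;
		}
	}
	for (i = 0; i < f->nr; i++) {
		assert(f->entries[i]->in_tree);
		f->entries[i]->refcount = 0;
		f->entries[i]->in_tree = false;
		tree->nr_entries--;
	}
	pthread_mutex_unlock(&tree->lock);

	/* Every entry is now unreachable, so freeing without the lock is safe. */
	for (i = 0; i < f->nr; i++) {
		free(f->entries[i]->data);
		free(f->entries[i]);
		f->entries[i] = NULL;
	}
	freed = f->nr;
	f->nr = 0;
	return freed;
}
```

Backing off keeps the lock hold short, at the cost of a frame possibly staying busy for a while; a real allocator whose zpages cross page boundaries would need to extend the same check to every entry touching the frame.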

A nit: even I, steeped in tmem terminology, was confused by
your use of "fs"; nearly all readers will read it as
"filesystem", which is mystifying. Just spell it out as
"frontswap", even if it causes a few lines to be wrapped.

Have a good weekend!