Re: [PATCH 7/8] zswap: add to mm/

From: Dave Hansen
Date: Wed Jan 02 2013 - 10:56:05 EST

Next message: Christoph Lameter: "Re: [PATCH 1/2] tmpfs mempolicy: fix /proc/mounts corruptingmemory"
Previous message: Dave Jones: "memory corruption, possibly caused by i915"
In reply to: Seth Jennings: "Re: [PATCH 7/8] zswap: add to mm/"
Next in thread: Dan Magenheimer: "RE: [PATCH 7/8] zswap: add to mm/"

On 01/01/2013 09:52 AM, Seth Jennings wrote:
> On 12/31/2012 05:06 PM, Dan Magenheimer wrote:
>> A second related issue that concerns me is that, although you
>> are now, like zcache2, using an LRU queue for compressed pages
>> (aka "zpages"), there is no relationship between that queue and
>> physical pageframes. In other words, you may free up 100 zpages
>> out of zswap via zswap_flush_entries, but not free up a single
>> pageframe. This seems like a significant design issue. Or am
>> I misunderstanding the code?
>
> You understand correctly. There is room for optimization here and it
> is something I'm working on right now.

It's the same "design issue" that the slab shrinkers have, and their
objects are likely to be consistently and substantially smaller.
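
To put rough numbers on the zpage-versus-pageframe point, here is a toy
model. It is only a sketch and not zswap or slab code: the function name,
the two-zpages-per-frame packing, and the frame counts are all assumptions
made up for illustration. It evicts the oldest entries from an LRU list
and counts how many backing pageframes end up completely empty.

```python
def frames_freed_by_lru_eviction(zpage_frames, evict_count):
    """Count pageframes that become empty after LRU eviction.

    zpage_frames: the pageframe index backing each zpage, oldest first.
    evict_count:  how many of the oldest zpages to evict.
    (Hypothetical helper for illustration only.)
    """
    # Number of live zpages currently stored in each pageframe.
    live = {}
    for frame in zpage_frames:
        live[frame] = live.get(frame, 0) + 1

    freed = 0
    for frame in zpage_frames[:evict_count]:
        live[frame] -= 1
        if live[frame] == 0:
            # Every zpage in this frame is gone, so the frame can be freed.
            freed += 1
    return freed


# Assumed layout: 200 pageframes, 2 zpages per frame.
# "Scattered": LRU age is unrelated to placement, so the 100 oldest
# zpages sit in 100 different frames, each still holding a newer zpage.
scattered = list(range(200)) + list(range(200))

# "Clustered": zpages of similar age share a frame.
clustered = [i // 2 for i in range(400)]

print(frames_freed_by_lru_eviction(scattered, 100))
print(frames_freed_by_lru_eviction(clustered, 100))
```

In the scattered layout, evicting 100 zpages frees no pageframe at all,
while the clustered layout frees 50. Any allocator that packs several
objects per page and reclaims them by object age has this property.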

>> A third concern is about scalability... the locking seems very
>> coarse-grained. In zcache, you personally observed and fixed
>> hashbucket contention (see https://lkml.org/lkml/2011/9/29/215).
>> Doesn't zswap's tree_lock essentially use a single tree (per
>> swaptype), i.e. no scalability?
>
> The reason the coarse lock isn't a problem for zswap like the hash
> bucket locks were in zcache is that the lock is not held for long
> periods of time as it is in zcache. It is only held while operating on
> the tree, not during compression/decompression and larger memory
> operations.

Lock hold times don't often dominate lock cost these days. The limiting
factor tends to be the cost of atomic operations to bring the cacheline
over to the CPUs acquiring the lock.
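
A back-of-the-envelope model of that argument follows. It is a sketch
under stated assumptions, not a measurement: the function name and the
20 ns hold time and 100 ns cacheline transfer cost are invented purely
for illustration. It charges one transfer each time the lock is taken
by a different CPU than the previous holder.

```python
def lock_cost_ns(acquirers, hold_ns, transfer_ns):
    """Split total lock cost into hold time and cacheline transfer time.

    acquirers:   sequence of CPU ids, in the order they take the lock.
    hold_ns:     assumed time spent holding the lock per acquisition.
    transfer_ns: assumed cost of pulling the lock's cacheline from
                 another CPU (charged only when the holder changes).
    (Hypothetical model for illustration only.)
    """
    hold_total = 0
    transfer_total = 0
    owner = None
    for cpu in acquirers:
        if owner is not None and cpu != owner:
            # The cacheline has to move to the acquiring CPU.
            transfer_total += transfer_ns
        hold_total += hold_ns
        owner = cpu
    return hold_total, transfer_total


# Four CPUs taking the lock in turn, 1000 acquisitions in total.
shared = lock_cost_ns([0, 1, 2, 3] * 250, hold_ns=20, transfer_ns=100)

# The same 1000 acquisitions, all from one CPU.
local = lock_cost_ns([0] * 1000, hold_ns=20, transfer_ns=100)

print(shared, local)
```

With these assumed numbers the hold time is identical in both runs, yet
the shared case spends several times more on moving the cacheline than
on holding the lock. Shortening the hold time does nothing to that term.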

> Also, I've done some lockstat checks and the zswap tree lock is way
> down on the list contributing <1% of the lock contention wait time on
> a 4-core system. The anon_vma lock is the primary bottleneck.

4 cores these days is awfully small. Some of our colleagues at
IBM might be a _bit_ concerned if we told them that we were using a
4-core non-NUMA system and extrapolating lock contention from there. :)

It's curious that you chose the anon_vma lock, though. It can only
possibly show _contention_ when you've got a bunch of CPUs beating on
the related VMAs. That contention disappears in workloads that aren't
threaded, so it seems at least a bit imprecise to say anon_vma lock is
the primary bottleneck.
