RE: [RFC] mm: add support for zsmalloc and zcache
From: Dan Magenheimer
Date: Thu Sep 06 2012 - 16:38:29 EST
In response to this RFC for zcache promotion, I've been asked to summarize
the concerns and objections which led me to NACK the previous zcache
promotion request. While I see great potential in zcache, I think some
significant design challenges exist, many of which are already resolved in
the new codebase ("zcache2"). These design issues include:
A) Andrea Arcangeli pointed out and, after some deep thinking, I came
to agree that zcache _must_ have some "backdoor exit" for frontswap
pages [2], else bad things will eventually happen in many workloads.
This requires some kind of reaper of frontswap'ed zpages[1] which "evicts"
the data to the actual swap disk. This reaper must ensure it can reclaim
_full_ pageframes (not just zpages) or it has little value. Further, the
reaper should determine which pageframes to reap based on an LRU-ish
(not random) approach.
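To make the requirement in (A) concrete, here is a minimal userspace sketch of the selection logic only. It is not code from zcache or zcache2; struct zframe, pick_victim() and reap_one_frame() are hypothetical names, and the actual writeback to the swap disk is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one pageframe holding compressed frontswap zpages. */
struct zframe {
    unsigned long last_store;  /* logical time of the most recent store */
    unsigned int nr_zpages;    /* zpages currently packed in this frame */
};

/*
 * Pick the in-use frame whose contents are oldest (LRU-ish, not random).
 * Returns the index of the victim, or -1 if no frame holds any zpage.
 */
static int pick_victim(const struct zframe *frames, size_t n)
{
    int victim = -1;

    for (size_t i = 0; i < n; i++) {
        if (frames[i].nr_zpages == 0)
            continue;  /* nothing to reap in this frame */
        if (victim < 0 || frames[i].last_store < frames[victim].last_store)
            victim = (int)i;
    }
    return victim;
}

/*
 * Reap one whole pageframe: every zpage in it is evicted (here only
 * counted) so that the frame itself becomes free. Returns the number of
 * zpages evicted, or 0 if there was nothing to reap.
 */
static unsigned int reap_one_frame(struct zframe *frames, size_t n)
{
    int v = pick_victim(frames, n);
    unsigned int evicted;

    if (v < 0)
        return 0;
    evicted = frames[v].nr_zpages;
    assert(evicted > 0);
    frames[v].nr_zpages = 0;  /* the full pageframe is now reclaimable */
    return evicted;
}
```

The point of the sketch is that the unit of reclaim is the pageframe, not the zpage: evicting only one of two zpages in a frame would free no memory at all.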
B) Zsmalloc has potentially far superior density vs zbud because zsmalloc can
pack more zpages into each pageframe and allows for zpages that cross pageframe
boundaries. But, (i) this is very data dependent... the average compression
for LZO is about 2x. The frontswap'ed pages in the kernel compile benchmark
compress to about 4x, which is impressive but probably not representative of
a wide range of zpages and workloads. And (ii) there are many historical
discussions going back to Knuth and mainframes about tight packing of data...
high density has some advantages but also brings many disadvantages related to
fragmentation and compaction. Zbud is much less aggressive (max two zpages
per pageframe) but has a similar density on average data, without the
disadvantages of high density.
So zsmalloc may blow zbud away on a kernel compile benchmark but, if both were
runners, zsmalloc is a sprinter and zbud is a marathoner. Perhaps the best
solution is to offer both?
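The density argument can be illustrated with some simple arithmetic. The sketch below is an idealized model, not either allocator's real behavior: it assumes 4 KiB pageframes and uniformly sized zpages, and it ignores metadata and zsmalloc's size classes. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_SIZE 4096UL  /* assumed 4 KiB pageframes */

/* zbud-style: at most two zpages per pageframe, and only if both fit. */
static size_t frames_two_per_frame(size_t nr_zpages, size_t zsize)
{
    assert(zsize > 0 && zsize <= FRAME_SIZE);
    if (2 * zsize <= FRAME_SIZE)
        return (nr_zpages + 1) / 2;  /* pair them up, round up */
    return nr_zpages;                /* no buddy fits: one per frame */
}

/* Idealized tight packing: zpages back to back, may span frame boundaries. */
static size_t frames_tight(size_t nr_zpages, size_t zsize)
{
    assert(zsize > 0 && zsize <= FRAME_SIZE);
    return (nr_zpages * zsize + FRAME_SIZE - 1) / FRAME_SIZE;
}
```

Under these assumptions, 1000 zpages at 2x compression (2048 bytes each) need 500 pageframes either way, while at 4x (1024 bytes each) the two-per-frame scheme still needs 500 and tight packing needs 250. That is the sense in which the kernel compile numbers flatter zsmalloc.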
Further, back to (A), reaping is much easier with zbud because (i) zsmalloc
is currently unable to deal with pointers to zpages from tmem data structures
which may be dereferenced concurrently, (ii) there may be many more such
pointers, and (iii) zpages stored by zsmalloc may cross pageframe boundaries.
The locking issues that arise with zsmalloc for reaping even a single pageframe
are complex; though they might eventually be solved with zsmalloc, this is
likely a very big project.
C) Zcache uses zbud(v1) for cleancache pages and includes a shrinker which
reclaims pairs of zpages to release whole pageframes, but there is
no attempt to shrink/reclaim cleancache pageframes in LRU order.
It would also be nice if single-cleancache-pageframe reclaim could
be implemented.
D) Ramster is built on top of zcache, but required a handful of changes
(on the order of 100 lines). Due to various circumstances, ramster was
submitted as a fork of zcache with the intent to unfork as soon as
possible. The proposal to promote the older zcache perpetuates that fork,
requiring fixes in multiple places, whereas the new codebase supports
ramster and provides clearly defined boundaries between the two.
The new codebase (zcache2) just submitted as part of drivers/staging/ramster
resolves these problems (though (A) is admittedly still a work in progress).
Before other key mm maintainers read and comment on zcache, I think
it would be most wise to move to a codebase which resolves the known design
problems or, at least, to thoroughly discuss and debunk the design issues
described above. OR... it may be possible to identify and pursue some
compromise plan. In any case, I believe the promotion proposal is premature.
Unfortunately, I will again be away from email for a few days, but
will be happy to respond after I return if clarification or more detailed
discussion is needed.
Dan
Footnotes:
[1] zpage is shorthand for a compressed PAGE_SIZE-sized page.
[2] frontswap, since it uses the tmem architecture, has always had a "frontdoor
bouncer"... any frontswap page can be rejected by zcache for any reason,
such as if there are no non-emergency pageframes available or if any individual
page (or long sequence of pages) compresses poorly.
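As a rough sketch of the "frontdoor bouncer" in [2], the accept/reject decision might look like the following. This is illustrative only: struct bouncer_policy, accept_store() and the threshold fields are hypothetical and do not correspond to actual zcache code, and only the per-page check is modeled, not the "long sequence of pages" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy knobs; any values used are illustrative only. */
struct bouncer_policy {
    size_t emergency_reserve;  /* frames that stores may never consume */
    size_t max_zsize;          /* largest compressed size worth keeping */
};

/*
 * Decide whether to accept a frontswap store. Returning false means the
 * page is rejected and takes the normal path to the swap device.
 */
static bool accept_store(const struct bouncer_policy *p,
                         size_t free_frames, size_t zsize)
{
    assert(p != NULL);
    if (free_frames <= p->emergency_reserve)
        return false;  /* no non-emergency pageframes available */
    if (zsize > p->max_zsize)
        return false;  /* this page compresses poorly */
    return true;
}
```

A rejected store costs nothing beyond the compression attempt, which is why the front door was never the problem; the missing piece is the back door described in (A).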