Re: [PATCH v2] mm/zswap: change zswap to writethrough cache

From: Dan Streetman
Date: Thu Dec 12 2013 - 21:59:17 EST


On Wed, Dec 11, 2013 at 4:02 AM, Bob Liu <lliubbo@xxxxxxxxx> wrote:
> Hi Dan & Seth,
>
> On Wed, Nov 27, 2013 at 9:28 AM, Dan Streetman <ddstreet@xxxxxxxx> wrote:
>> On Mon, Nov 25, 2013 at 1:00 PM, Seth Jennings <sjennings@xxxxxxxxxxxxxx> wrote:
>>> On Fri, Nov 22, 2013 at 11:29:16AM -0600, Seth Jennings wrote:
>>>> On Wed, Nov 20, 2013 at 02:49:33PM -0500, Dan Streetman wrote:
>>>> > Currently, zswap is a writeback cache: stored pages are not sent
>>>> > to the swap disk, and when zswap wants to evict old pages it must
>>>> > first write them back to the swap cache/disk itself. This avoids
>>>> > swap-out disk I/O up front, but only defers that disk I/O to the
>>>> > writeback path (for pages that are evicted), adds the overhead of
>>>> > uncompressing each evicted page, and requires an additional free
>>>> > page (to hold the uncompressed page) at a time of likely high
>>>> > memory pressure. Being writeback also adds complexity to zswap,
>>>> > which has to perform the writeback on page eviction.
>>>> >
>>>> > This changes zswap to a writethrough cache by enabling
>>>> > frontswap_writethrough() before registering, so that any
>>>> > successful page store is also written to the swap disk. All the
>>>> > writeback code is removed since it is no longer needed; the only
>>>> > work during page eviction is now to remove the entry from the
>>>> > tree and free it.
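
[The change described above would reduce, roughly, to the sketch below.
This is not the actual patch; frontswap_writethrough() and
frontswap_register_ops() are the frontswap API as it existed around
this time, and the surrounding init code is elided.]

```c
/* Rough sketch of the described change, not the actual patch.
 * frontswap_writethrough() tells frontswap to also issue the normal
 * swap-disk write after a successful store, so zswap no longer needs
 * its own writeback path. */
static int __init init_zswap(void)
{
	...
	/* Enable writethrough before registering, so every page that
	 * zswap accepts is also written to the swap device. */
	frontswap_writethrough(true);
	frontswap_register_ops(&zswap_frontswap_ops);
	...
}
```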
>>>>
>>>> I like it. It gets rid of a lot of nasty writeback code in zswap.
>>>>
>>>> I'll have to test before I ack, hopefully by the end of the day.
>>>>
>>>> Yes, this will increase writes to the swap device over the delayed
>>>> writeback approach. I think it is a good thing though. I think it
>>>> makes the difference between zswap and zram, both in operation and
>>>> in application, more apparent: zram is the better choice for
>>>> embedded systems where write wear is a concern, and zswap is better
>>>> if you need more flexibility to dynamically manage the compressed
>>>> pool.
>>>
>>> One thing I realized while doing my testing was that making zswap
>>> writethrough also impacts synchronous reclaim. Zswap, as it is now,
>>> makes the swapcache page clean during swap_writepage() which allows
>>> shrink_page_list() to immediately reclaim it. Making zswap writethrough
>>> eliminates this advantage and swapcache pages must be scanned again
>>> before they can be reclaimed, as is the case with normal swapping.
>>
>> Yep, I thought about that as well, and it is true, but only while
>> zswap is not full. With writeback, once zswap fills up, page stores
>> frequently have to reclaim space by writing compressed pages out to
>> disk. With writethrough, zbud reclaim should be quick, since it only
>> has to drop the evicted pages, not write them to disk. So, compared
>> to the no-zswap case, writeback should speed up swap_writepage()
>> while zswap is not full but (theoretically) slow it down once zswap
>> is full, while writethrough should slow down swap_writepage()
>> slightly (by the time it takes to compress and store the page) but
>> consistently, by almost the same amount before and after it fills
>> up. Theoretically :-) Definitely something to think about and test
>> for.
>>
>
> Have you gotten any further benchmark result?

Yes, and sorry for the delay.

The initial numbers I got on a relatively low-end (laptop) system seem
to indicate that writethrough does reduce performance in the beginning,
while zswap isn't full, but improves performance once zswap fills up.
I'm working on setting up a higher-end, server-class system to test as
well, and on getting a larger sample size of test runs (each SPECjbb
run takes quite a long time).

At this point, based on those results and Weijie's suggestion to make
it configurable, I'm thinking it is probably better to keep the
writeback code and allow selection of writeback or writethrough. That
would allow *possibly* switching from writeback to writethrough based
on how full zswap is; at the least, it would let users select which to
use. I still think keeping both makes zswap more complex, but moving
completely to writethrough may not be best for all situations. I
suspect that systems with relatively slow disk I/O and fast processors
would benefit from writeback, while systems with relatively fast disk
I/O (e.g. SSD swap) and slower processors would benefit from
writethrough.

So, any opinions on keeping both writeback and writethrough, with (for
now) a module param to select? I'll send an updated patch if that
sounds agreeable to all...
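
[For reference, the selector could be as simple as the sketch below;
the parameter name and variable are only suggestions, not part of any
posted patch.]

```c
/* Sketch of a possible selector (name is a suggestion only).
 * Read-only permissions (0444) keep the mode from being switched
 * while compressed pages are in flight. */
static bool zswap_writethrough;
module_param_named(writethrough, zswap_writethrough, bool, 0444);
```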

The specific SPECjbb numbers (on a dual-core 2.4GHz system with 4GB
RAM) I got were:

writeback
heap (MB)   bops
3000       38887
3500       39260
4000       38113
4500       15686
5000       10978
5500        1445
6000        1827

writethrough
heap (MB)   bops
3000       39021
3500       35998
4000       36223
4500        7222
5000        7717
5500        2304
6000        2455