RE: [PATCHv5 1/8] zsmalloc: add to mm/

From: Dan Magenheimer
Date: Mon Feb 25 2013 - 19:23:57 EST

Next message: tip-bot for Matt Fleming: "[tip:x86/urgent] x86, efi: Mark disable_runtime as __initdata"
Previous message: Lai Jiangshan: "Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design ofPer-CPU Reader-Writer Locks"
In reply to: Seth Jennings: "Re: [PATCHv5 1/8] zsmalloc: add to mm/"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> From: Seth Jennings [mailto:sjenning@xxxxxxxxxxxxxxxxxx]
> Subject: Re: [PATCHv5 1/8] zsmalloc: add to mm/
>
> On 02/25/2013 11:05 AM, Dan Magenheimer wrote:
> >> From: Seth Jennings [mailto:sjenning@xxxxxxxxxxxxxxxxxx]
> >> Sent: Friday, February 22, 2013 1:04 PM
> >> To: Joonsoo Kim
> >> Subject: Re: [PATCHv5 1/8] zsmalloc: add to mm/
> >>
> >> On 02/22/2013 03:24 AM, Joonsoo Kim wrote:
> >>>
> >>> It's my quick thought. So there is no concrete idea.
> >>> As Seth said, with a FULL list, zsmalloc always access all zspage.
> >>> So, if we want to know what pages are for zsmalloc, we can know it.
> >>> The EMPTY list can be used for pool of zsmalloc itself. With it, we don't
> >>> need to free zspage directly, we can keep zspages, so can reduce
> >>> alloc/free overhead. But, I'm not sure whether it is useful.
> >>
> >> I think it's a good idea. zswap actually does this "keeping some free
> >> pages around for later allocations" outside zsmalloc in a mempool that
> >> zswap manages. Minchan once mentioned bringing that inside zsmalloc
> >> and this would be a way we could do it.
> >
> > I think it's a very bad idea. If I understand, the suggestion will
> > hide away some quantity (possibly a very large quantity) of pages
> > for the sole purpose of zswap, in case zswap gets around to using them
> > sometime in the future. In the meantime, those pages are not available
> > for use by any other kernel subsystems or by userland processes.
> > An idle page is a wasted page.
> >
> > While you might defend the mempool use for a handful of pages,
> > frontswap writes/reads thousands of pages in a bursty way,
> > and then can go idle for a very long time. This may not be
> > readily apparent with artificially-created memory pressure
> > from kernbench with -jN (high N). Leaving thousands
> > of pages in zswap's personal free list may cause memory pressure
> > that would otherwise never have existed.
>
> I experimentally determined that this pool increased allocation
> success rate and, therefore, reduced the number of pages going to the
> swap device.

Of course it does. But you can't experimentally determine how
pre-allocating pages will impact overall performance, especially
over a wide range of workloads that may or may not even swap.

> The zswap mempool has a target size of 256 pages. This places an
> upper bound on the number of pages held in reserve for zswap. So we
> aren't talking about "thousands of pages".

I used "thousands of pages" in reference to Joonsoo's idea about
releasing pages to the EMPTY list, not about mempool. I wasn't
aware that mempool has a target size of 256 pages, but even
that smaller amount seems wrong to me, especially if the
mempool user (zswap) will blithely use many thousands of pages
in a burst.

> And yes, the pool does remove up to 1MB of memory (on a 4k PAGE_SIZE)
> from general use, which causes the reclaim to start very slightly earlier.
>
> >
> >> Just want to be clear that I'd be in favor of looking at this after
> >> the merge.
> >
> > I disagree... I think this is exactly the kind of fundamental
> > MM interaction that should be well understood and resolved
> > BEFORE anything gets merged.
>
> While there is discussion to be had here, I don't agree that it's
> "fundamental" and should not block merging.
>
> The mempool does serve a purpose and adds measurable benefit. However,
> if it is determined at some future time that having a reserved pool of
> any size in zswap is bad practice, it can be removed trivially.

We had this argument last summer... Sure, _any_ cache can be shown
to add some measurable benefit under the right circumstances.
Any cacheing solution also has some often subtle side effects
on some workloads, that can have more costs than benefits.
Our cache (zcache and/or zswap and/or zram) is not just a cache
but it is also stealing capacity from its backing store (total
kernel RAM), so is even more likely to have side effects.

While I firmly believe -- intuitively -- that compression should
eventually be a feature of the MM subsystem, I don't think we
have enough understanding yet of the interactions or the
policies needed to control those interactions or the impact
of those policies on the underlying zram/zcache/zswap design choices.
For example, zswap completely ignores the pagecache... is that
a good thing or a bad thing, and why?

Seth, I'm not trying to piss you off, but last summer you seemed
hell-bent on promoting zcache out of staging, and it didn't
happen so you hacked off the part of zcache you were most interested
in, forked it and rewrote it, and are now trying to merge that
fork into MM ASAP. Why? Why can't we work together on solving
the hard problems instead of infighting and reimplementing the wheel?

Dan "shakes head"
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: tip-bot for Matt Fleming: "[tip:x86/urgent] x86, efi: Mark disable_runtime as __initdata"
Previous message: Lai Jiangshan: "Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design ofPer-CPU Reader-Writer Locks"
In reply to: Seth Jennings: "Re: [PATCHv5 1/8] zsmalloc: add to mm/"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]