RE: [PATCHv11 3/4] zswap: add to mm/

From: Dan Magenheimer
Date: Tue May 14 2013 - 16:57:28 EST


> From: Seth Jennings [mailto:sjenning@xxxxxxxxxxxxxxxxxx]
> Subject: Re: [PATCHv11 3/4] zswap: add to mm/
>
> On Tue, May 14, 2013 at 09:37:08AM -0700, Dan Magenheimer wrote:
> > > From: Seth Jennings [mailto:sjenning@xxxxxxxxxxxxxxxxxx]
> > > Subject: Re: [PATCHv11 3/4] zswap: add to mm/
> > >
> > > On Tue, May 14, 2013 at 05:19:19PM +0800, Bob Liu wrote:
> > > > Hi Seth,
> > >
> > > Hi Bob, thanks for the review!
> > >
> > > >
> > > > > +	/* reclaim space if needed */
> > > > > +	if (zswap_is_full()) {
> > > > > +		zswap_pool_limit_hit++;
> > > > > +		if (zbud_reclaim_page(tree->pool, 8)) {
> > > >
> > > > My idea is to wake up a kernel thread here to do the reclaim.
> > > > Once zswap is full (currently 20% of total memory), the kernel
> > > > thread should reclaim pages from it. It should not reclaim just
> > > > one page; how much it reclaims should depend on the current
> > > > memory pressure.
> > > > And then the API in zbud may like this:
> > > > zbud_reclaim_page(pool, nr_pages_to_reclaim, nr_retry);
> > >
> > > So kswapd for zswap. I'm not opposed to the idea if a case can be
> > > made for the complexity. I must say, I don't see that case though.
> > >
> > > The policy can evolve as deficiencies are demonstrated and solutions are
> > > found.
> >
> > Hmmm... it is fairly easy to demonstrate the deficiency if
> > one tries. I actually first saw it occur on a real (though
> > early) EL6 system which started some graphics-related service
> > that caused a very brief swapstorm that was invisible during
> > normal boot but clogged up RAM with compressed pages, which
> > later caused reduced and erratic benchmark performance.
>
> Without any specifics, I'm not sure what I can do with this.

Well, I think it's customary for the author of a patch to know
its limitations. I suggest you synthesize a workload that
attempts to measure the worst case. That's exactly what I did a
year ago, and it led me to the realization that zcache needed to
solve some issues before it was ready to be promoted out of
staging.
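
To be concrete: something like the following (purely hypothetical,
and sized by hand for the machine under test) reproduces the shape
of what I saw. A brief swapstorm pushes pages into the compressed
pool, the process then holds them idle, and a benchmark run
afterward pays the price:

	/* hypothetical worst-case sketch, not a real test program */
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		/* size this above physical RAM to force swapping */
		size_t sz = 8UL << 30;	/* e.g. 8GB on a 4GB box */
		char *p = malloc(sz);

		if (!p)
			return 1;

		/* faulting in every page pushes earlier ones to swap */
		memset(p, 0xaa, sz);

		/*
		 * Hold the mapping but never touch it again; the
		 * swapped-out pages now sit compressed and idle.
		 * Run the real benchmark while this sleeps and
		 * compare against a clean boot.
		 */
		pause();
		return 0;
	}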

> I'm hearing you say that the source of the benchmark degradation
> is the idle pages in zswap. In that case, the periodic writeback
> patches I have in the wings should address this.
>
> I think we are on the same page without realizing it. Right now
> zswap supports a kind of "direct reclaim" model at allocation time.
> The periodic writeback patches will handle the proactive writeback
> part to free up the zswap pool when it has idle pages in it.

I don't think we are on the same page, though maybe you are
heading in the same direction now. I won't repeat the comments
from the previous email.
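
To spell out what I (and, I think, Bob) have in mind: a zswap-side
analog of kswapd that gets kicked when the pool crosses its limit
and writes back until the pool drops below some low watermark,
instead of reclaiming a single page inline in the store path. A
rough sketch (zswap_below_low_watermark() and the wait queue are
made-up names here, not anything in the posted patches):

	#include <linux/kthread.h>
	#include <linux/sched.h>
	#include <linux/wait.h>

	static DECLARE_WAIT_QUEUE_HEAD(zswap_reclaimd_wait);

	static int zswap_reclaimd(void *data)
	{
		struct zbud_pool *pool = data;

		while (!kthread_should_stop()) {
			wait_event_interruptible(zswap_reclaimd_wait,
					zswap_is_full() ||
					kthread_should_stop());

			/* write back until under the low watermark */
			while (!zswap_below_low_watermark()) {
				if (zbud_reclaim_page(pool, 8))
					break;	/* retries exhausted */
				cond_resched();
			}
		}
		return 0;
	}

The store path would then just wake_up(&zswap_reclaimd_wait) when
zswap_is_full(), and how aggressively the thread writes back could
be scaled with memory pressure, as Bob suggests.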

> > I think Mel's unpredictability concern applies equally here...
> > this may be a "long-term source of bugs and strange memory
> > management behavior."
> >
> > > Can I get your ack on this pending the other changes?
> >
> > I'd like to hear Mel's feedback about this, but perhaps
> > a compromise to allow for zswap merging would be to add
> > something like the following to zswap's Kconfig comment:
> >
> > "Zswap reclaim policy is still primitive. Until it improves,
> > zswap should be considered experimental and is not recommended
> > for production use."
>
> Just for the record, an "experimental" tag in the Kconfig won't
> work for me.
>
> The reclaim policy for zswap is not primitive; it's simple. There
> is a difference. Plus, zswap is already runtime-disabled by default.
> If distros/customers enable it, it is because they purposely
> chose to enable it.

Hmmm... I think you are proposing to users/distros the following
usage model: "If zswap works for you, turn it on. If it sucks,
turn it off. I can't tell you in advance whether it will work
or suck for your distro/workload, but it will probably work so
please try it."

That sounds awfully experimental to me.

The problem is not simple. Your solution is simple because
you are simply pretending that the harder parts of the problem
don't exist.

Dan