Re: [PATCH v18 00/32] per memcg lru_lock

From: Daniel Jordan
Date: Mon Aug 24 2020 - 21:48:13 EST

In reply to: Hugh Dickins: "Re: [PATCH v18 00/32] per memcg lru_lock"
Next in thread: Alex Shi: "Re: [PATCH v18 00/32] per memcg lru_lock"

On Mon, Aug 24, 2020 at 01:24:20PM -0700, Hugh Dickins wrote:
> On Mon, 24 Aug 2020, Andrew Morton wrote:
> > On Mon, 24 Aug 2020 20:54:33 +0800 Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx> wrote:
> Andrew demurred on version 17 for lack of review. Alexander Duyck has
> been doing a lot on that front since then. I have intended to do so,
> but it's a mirage that moves away from me as I move towards it: I have

Same, I haven't been able to keep up with the versions or the recent review
feedback. I got through about half of v17 last week and hope to have more time
for the rest this week and beyond.

> > > Following Daniel Jordan's suggestion, I have run 208 'dd' on 104
> > > containers on a 2s * 26cores * HT box with a modified case:

Alex, do you have a pointer to the modified readtwice case?

Even better would be a description of the problem you're having in production
with lru_lock. We might be able to create at least a simulation of it to show
what the expected improvement of your real workload is.

> > > https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice
> > > With this patchset, the readtwice performance increased about 80%
> > > in concurrent containers.
> >
> > That's rather a slight amount of performance testing for a huge
> > performance patchset!
>
> Indeed. And I see that clause about readtwice performance increased 80%
> going back eight months to v6: a lot of fundamental bugs have been fixed
> in it since then, so I do think it needs refreshing. It could be faster
> now: v16 or v17 fixed the last bug I knew of, which had been slowing
> down reclaim considerably.
>
> When I last timed my repetitive swapping loads (not loads anyone sensible
> would be running with), across only two memcgs, Alex's patchset was
> slightly faster than without: it really did make a difference. But
> I tend to think that for all patchsets, there exists at least one
> test that shows it faster, and another that shows it slower.
>
> > Is more detailed testing planned?
>
> Not by me, performance testing is not something I trust myself with,
> just get lost in the numbers: Alex, this is what we hoped for months
> ago, please make a more convincing case, I hope Daniel and others
> can make more suggestions. But my own evidence suggests it's good.

I ran a few benchmarks on v17 last week (sysbench oltp readonly, kerndevel from
mmtests, a memcg-ized version of the readtwice case I cooked up) and then today
discovered there's a chance I wasn't running the right kernels, so I'm redoing
them on v18. Plan to look into what other, more "macro" tests would be
sensitive to these changes.
