Re: [PATCH v18 00/32] per memcg lru_lock

From: Daniel Jordan
Date: Thu Aug 27 2020 - 21:32:06 EST


On Wed, Aug 26, 2020 at 04:59:28PM +0800, Alex Shi wrote:
> I cleaned up my testing and made it reproducible with a Dockerfile and a test
> case patch, which are attached.

Ok, I'll give that a shot once I've taken care of sysbench.

> >>> Even better would be a description of the problem you're having in production
> >>> with lru_lock. We might be able to create at least a simulation of it to show
> >>> what the expected improvement of your real workload is.
> >>
> >> We are using thousands of memcgs on a machine, but as a simulation, I guess
> >> the above case could be helpful to show the problem.
> >
> > Using thousands of memcgs to do what? Any particulars about the type of
> > workload? Surely it's more complicated than page cache reads :)
>
> Yes, the workloads are quite different across businesses: some use a lot of
> CPU, some use a lot of memory, and some are mixed.

That's pretty vague, but I don't suppose I could do much better describing what
all runs on our systems :-/
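
For concreteness, the kind of simulation I picture from that description is
something like the sketch below: one reader process per memcg, each pulling its
own file through the page cache so that memcg's LRU lists see traffic. This is
just my guess at the shape of it, not your attached case; the cgroup mount
point, memcg count, and per-memcg test files are all placeholders, it assumes
cgroup v2, and it needs root.

#!/usr/bin/env python3
# Rough sketch (not the attached case): create many memory cgroups and do
# buffered file reads from within each, so every memcg gets its own LRU pages.
# Assumes cgroup v2 mounted at /sys/fs/cgroup, root privileges, and a set of
# prebuilt test files /var/tmp/lru_test/file_<N> (one per memcg, so the pages
# are charged to that memcg rather than shared).
import os

NR_MEMCGS = 1000                      # "thousands of memcgs", scale as needed
CGROOT = "/sys/fs/cgroup"             # assumed cgroup v2 mount point
TESTDIR = "/var/tmp/lru_test"         # hypothetical directory of test files

def read_in_memcg(idx):
    cg = os.path.join(CGROOT, f"lru_test_{idx}")
    os.makedirs(cg, exist_ok=True)
    pid = os.fork()
    if pid == 0:
        # Move the child into its own memcg, then fault the file in through
        # the page cache so its pages land on that memcg's LRU lists.
        with open(os.path.join(cg, "cgroup.procs"), "w") as f:
            f.write(str(os.getpid()))
        with open(os.path.join(TESTDIR, f"file_{idx}"), "rb") as f:
            while f.read(1 << 20):
                pass
        os._exit(0)
    return pid

pids = [read_in_memcg(i) for i in range(NR_MEMCGS)]
for pid in pids:
    os.waitpid(pid, 0)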

I went back to your v1 post to see what motivated you originally, and you had
some results from aim9 but nothing about where this reared its head in the
first place. How did you discover the bottleneck? I'm just curious about how
lru_lock hurts in practice.
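
For what it's worth, when I go looking for that kind of contention I'd reach
for lock_stat first. A minimal sketch, assuming a kernel built with
CONFIG_LOCK_STAT=y:

#!/usr/bin/env python3
# Minimal sketch: dump the /proc/lock_stat entries for lru_lock, assuming the
# kernel was built with CONFIG_LOCK_STAT=y. The counters can be cleared between
# runs by writing 0 to /proc/lock_stat.
with open("/proc/lock_stat") as f:
    for line in f:
        if "lru_lock" in line:
            print(line, end="")

The contention and wait-time columns there before/after a run are the sort of
numbers I'd find convincing alongside the benchmark results.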

> > Neither kernel compile nor git checkout in the root cgroup changed much, just
> > 0.31% slower on elapsed time for the compile, so no significant regressions
> > there. Now for sysbench again.

Still working on getting repeatable sysbench runs, no luck so far. The numbers
have stayed fairly consistent with your series but vary a lot on the base
kernel; I'm not sure why yet.
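
In case it's useful, this is roughly the harness I mean when I talk about
repeatable runs (just a sketch; variance.py is a made-up name and the benchmark
invocation on the command line is whatever's under test):

#!/usr/bin/env python3
# Sketch of a harness for quantifying run-to-run variance: run the given
# benchmark command N times and report mean and standard deviation of the
# elapsed wall-clock time. Invoke as, e.g.:
#   ./variance.py 10 sysbench cpu run
import statistics
import subprocess
import sys
import time

runs = int(sys.argv[1])
cmd = sys.argv[2:]

elapsed = []
for _ in range(runs):
    start = time.monotonic()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    elapsed.append(time.monotonic() - start)

print(f"runs={runs} mean={statistics.mean(elapsed):.3f}s "
      f"stdev={statistics.stdev(elapsed):.3f}s")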