Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

From: Eric W. Biederman
Date: Fri Apr 27 2018 - 14:05:51 EST


Michal Hocko <mhocko@xxxxxxxxxx> writes:

> On Thu 26-04-18 21:28:18, Michal Hocko wrote:
>> On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
>> > Michal Hocko <mhocko@xxxxxxxxxx> writes:
>> >
>> > > I've had a patch to remove owner few years back. It needed some work
>> > > to finish but maybe that would be a better than try to make
>> > > non-scalable thing suck less.
>> >
>> > I have a question. Would it be reasonable to just have a mm->memcg?
>> > That would appear to be the simplest solution to the problem.
>>
>> I do not remember details. Have to re-read the whole thing again. Hope
>> to get to this soon but with the current jet lag and backlog from the
>> LSFMM I rather not promis anything. Going with mm->memcg would be the
>> most simple of course but I have a very vague recollection that it was
>> not possible. Maybe I misremember...
>
> Just for the record, the last version where I've tried to remove owner
> was posted here: http://lkml.kernel.org/r/1436358472-29137-1-git-send-email-mhocko@xxxxxxxxxx
>
> I didn't get to remember details yet, but the primary problem was the
> task migration between cgroups and the nasty case when different thread
> grounds share the mm. At some point I just suggested to not care
> about semantic of these weird threads all that much. We can either
> migrate all tasks sharing the mm struct or just keep the inconsistency.
>
> Anyway, removing this ugliness would be so cool!

I suspect the only common user of CLONE_VM today is vfork. And I do
think it is crazy to migrate a process that has called vfork before
calling exec. Other useses of CLONE_VM seem even crazier.

I think the easiest change to make in mem_cgroup_can_attach would
be just to change the test for when charges are migrated. AKA

from:

if (mm->owner == p) {
....
}

to
if (mem_cgroup_from_task(p) == mm->memcg) {
...
}

That allows using mm->memcg with no new limitations on when migration
can be called. In crazy cases that has the potential to change which
memcgroup the charges are accounted to, but the choice is already
somewhat arbitrary so I don't think that will be a problem. Especially
given that mm_update_next_owner does not migrate charges if the next
owner is in a different memory cgroup. A mm with tasks using it in
two different cgroups is already questionable if not outright
problematic.


Kirill Tkhai do you think you would be able adapt Michal Hoko's old
patch at https://marc.info/?l=linux-kernel&m=143635857131756&w=2
that replaces mm->owner with mm->memcg?

We probably want to outlaw migrating an mm where we are not migrating
all of the mm->users eventually. Just because that case is crazy.
But it doesn't look like we need to do that to fix the memory control
group data structures.

Eric