Re: [PATCH] memcg: Replace mm->owner with mm->memcg

From: Eric W. Biederman
Date: Mon May 07 2018 - 23:15:43 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>>
>> > On 05/04, Eric W. Biederman wrote:
>> >>
>> >> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>> >>
>> >> > I'd vote for the change in exec_mmap(). This way mm_init_memcg() can just
>> >> > nullify mm->memcg.
>> >>
>> >> There is at least one common path where we need the memory control group
>> >> properly initialized so memory allocations don't escape the memory
>> >> control group.
>> >>
>> >> do_execveat_common
>> >> copy_strings
>> >> get_arg_page
>> >> get_user_pages_remote
>> >> __get_user_pages_locked
>> >> __get_user_pages
>> >> faultin_page
>> >> handle_mm_fault
>> >> count_memcg_event_mm
>> >> __handle_mm_fault
>> >> handle_pte_fault
>> >> do_anonymous_page
>> >> mem_cgroup_try_charge
>
> Ah yes, indeed.
>
>> mm_init_memcg is at the same point as mm_init_owner. So my change did
>> not introduce any logic changes on when the memory control group became
>> valid.
>
> Not sure, but perhaps I am all confused....

So. In exec actions that are initiated while a process is calling exec
can either logcially happen either before or after the exec. That is
the way these races are always resolved. I don't see any reason
for the memory control group to be any different.

Which means it is perfectly valid for a migration that technically
happens in the middle of copy_strings to not show up until some time
later in exec.

This works because there are no user visible effects until exec
completes. If there are user visible effects they are bugs.

That is all it takes to understand why my patch fixes things.

It might make more sense to simply fail migration if task->in_execve.
While perhaps the best solution that might introduce a user visible
effect that we don't care about right now.

> before your patch get_mem_cgroup_from_mm() looks at mm->owner == current
> (in this case) and mem_cgroup_from_task() should return the correct memcg
> even if execing task migrates after bprm_mm_init(). At least in the common
> case when the old mm is not shared.
>
> After your patch the memory allocations in copy_strings() won't be accounted
> correctly, bprm->mm->memcg is wrong if this task migrates. And iiuc your recent
> "[PATCH 2/2] memcg: Close the race between migration and installing bprm->mm as mm"
> doesn't fix the problem.
>
> No?

The patch does solve the issue. There should be nothing a userspace
process can observe that should tell it where in the middle of exec
such a migration happend so placing the migration at what from the
kernel's perspective might be technically later should not be a problem.

If it is a problem the issue is that there is a way to observe the
difference.

> Perhaps we can change get_mem_cgroup_from_mm() to use
> mem_cgroup_from_css(current, memory_cgrp_id) if mm->memcg == NULL?

Please God no. Having any unnecessary special case is just going to
confuse people and cause bugs.

Plus as I have pointed out above that is not an issue.

Eric