Re: [PATCH] userns: Default to 'yes' when CONFIG_MEMCG option is enabled

From: Javier Martinez Canillas
Date: Fri May 24 2024 - 08:33:40 EST


Christian Brauner <brauner@xxxxxxxxxx> writes:

Hello Christian,

Thanks a lot for your feedback.

> On Fri, May 24, 2024 at 10:24:16AM +0200, Javier Martinez Canillas wrote:
>> The default value for the CONFIG_USER_NS Kconfig symbol changed over time.
>>
>> When first was introduced by commit acce292c82d4 ("user namespace: add the
>> framework"), the default was 'no'. But then it was changed to 'yes' if the
>> CONFIG_NAMESPACES option was enabled, by commit 17a6d4411a4d ("namespaces:
>> default all the namespaces to 'yes' when CONFIG_NAMESPACES is selected").
>>
>> Then, commit 5673a94c1457 ("userns: Add a Kconfig option to enforce strict
>> kuid and kgid type checks") changed the default to 'no' again and selected
>> the (now defunct) UIDGID_STRICT_TYPE_CHECKS option.
>>
>> This selected option was removed by commit 261000a56b63 ("userns: Remove
>> UIDGID_STRICT_TYPE_CHECKS"), but CONFIG_USER_NS default was left to 'no'.
>>
>> Finally, the commit e11f0ae388f2 ("userns: Recommend use of memory control
>> groups") added to the Kconfig symbol's help text a recommendation that the
>> memory control groups should be used, to limit the amount of memory that a
>> user who can create user namespaces can consume.
>>
>> Looking at the changes' history, a default to 'yes' when the CONFIG_MEMCG
>> option is enabled seems like a sane thing to do. Specially since systemd
>> requires user namespaces support for services that use the PrivateUsers=
>> property in their unit files (e.g: the UPower daemon).
>
> Fyi, user namespaces are an entirely optional feature in systemd and it
> gracefully falls back if they are not available with PrivateUsers= set.
> If that isn't the case then it's a bug in systemd with PrivateUsers=
> handling and should be reported.
>

Interesting, it definitely failed for me:

$ systemctl status upower
● upower.service - Daemon for power management
Loaded: loaded (/lib/systemd/system/upower.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2024-05-24 12:23:49 UTC; 34s ago
Docs: man:upowerd(8)
Process: 390 ExecStart=/usr/libexec/upowerd (code=exited, status=217/USER)
Main PID: 390 (code=exited, status=217/USER)
CPU: 122ms

May 24 12:23:49 igep systemd[1]: upower.service: Scheduled restart job, restart counter is at 5.
May 24 12:23:49 igep systemd[1]: Stopped Daemon for power management.
May 24 12:23:49 igep systemd[1]: upower.service: Start request repeated too quickly.
May 24 12:23:49 igep systemd[1]: upower.service: Failed with result 'exit-code'.
May 24 12:23:49 igep systemd[1]: Failed to start Daemon for power management.

$ journalctl -u upower
May 24 12:23:49 igep systemd[1]: Starting Daemon for power management...
May 24 12:23:49 igep systemd[404]: upower.service: Failed to set up user namespacing: Invalid argument
May 24 12:23:49 igep systemd[404]: upower.service: Failed at step USER spawning /usr/libexec/upowerd: Invalid argument
May 24 12:23:49 igep systemd[1]: upower.service: Main process exited, code=exited, status=217/USER
May 24 12:23:49 igep systemd[1]: upower.service: Failed with result 'exit-code'.
May 24 12:23:49 igep systemd[1]: Failed to start Daemon for power management.
May 24 12:23:49 igep systemd[1]: upower.service: Scheduled restart job, restart counter is at 1.
May 24 12:23:49 igep systemd[1]: Stopped Daemon for power management.

That lead me to https://gitlab.freedesktop.org/upower/upower/-/issues/104
and finally to systemd's README:

https://github.com/systemd/systemd/blob/main/README#L89C22-L89C34

But I'll investigate more if is upower or systemd to be blamed here...

> But specifically to you change, afair CONFIG_MEMCG and userns are
> unrelated so tying them together like this in the kconfig seems
> misguided.
>

Yes, but the config USER_NS help text already tieds them toghether:

help
This allows containers, i.e. vservers, to use user namespaces
to provide different user info for different servers.

When user namespaces are enabled in the kernel it is
recommended that the MEMCG option also be enabled and that
user-space use the memory control groups to limit the amount
of memory a memory unprivileged users can use.

And as mentioned in the commit message, it seems to be the reason why the
default for this Kconfig symbol is no. Maybe I misunderstood though or do
you think that could be switched unconditionally to 'default y' ?

Or is there a reason to be the only namespace to be default to no instead
of yes? Specially since important system services are trying to use it.

--
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat