Re: [PATCH] namespaces: add transparent user namespaces

From: Jann Horn
Date: Thu Jun 23 2016 - 16:02:33 EST


On Thu, Jun 23, 2016 at 8:50 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Jann Horn <jannh@xxxxxxxxxx> writes:
>
>> This allows the admin of a user namespace to mark the namespace as
>> transparent. All other namespaces, by default, are opaque.
>>
>> While the current behavior of user namespaces is appropriate for use in
>> containers, there are many programs that only use user namespaces because
>> doing so enables them to do other things (e.g. unsharing the mount or
>> network namespace) that require namespaced capabilities. For them, the
>> inability to see the real UIDs and GIDs of things from inside the user
>> namespace can be very annoying.
>>
>> In a transparent namespace, all UIDs and GIDs that are mapped into its
>> first opaque ancestor are visible and are not remapped. This means that if
>> a process e.g. stat()s the real root directory in a namespace, it will
>> still see it as owned by UID 0.
>>
>> Traditionally, any UID or GID that was visible in a user namespace was also
>> mapped into the namespace, giving the namespace admin full access to it.
>> This patch introduces a distinction: In a transparent namespace, UIDs and
>> GIDs can be visible without being mapped. Non-mapped, visible UIDs can be
>> passed from the kernel to userspace, but userspace can't send them back to
>> the kernel. In order to be able to fully use specific UIDs/GIDs and gain
>> privileges over them, mappings need to be set up in the usual way -
>> however, to avoid aliasing problems, only identity mappings are permitted.
>>
>> I have gone through all callers of from_kuid() and from_kgid(), and as far
>> as I can tell, kuid_has_mapping() and kgid_has_mapping() were the only
>> functions that used them for privilege checks. (The keys subsystem uses
>> them in an insecure way, and that issue has been known for a while, but my
>> patch doesn't make that any more vulnerable than it already is.
>
> Perhaps it has been known for a while but no one has stopped and
> mentioned it to me. What questionable thing is the keys subsystem
> doing?

Not just questionable, completely wrong. The gist is that there is a
*global* name -> key mapping for accessing keys by name, and user
keyrings are stored in there under the name "_uid.%u", where %u
refers to the *namespaced* UID. (See install_user_keyrings().)
The result is that, if e.g. the user with UID 1000 has no running
processes, a local attacker can enter a new user namespace, map UID
1000 in the namespace to some KUID he controls, do
setresuid(1000, 1000, 1000), and now he owns user 1000's keyring.
This ends up permitting the attacker to dump the contents of KUID
1000's keys after KUID 1000 signs in. I discovered this while going
through the kuid->uid conversions when I thought about writing this
feature the first time.

(I think you have security list access, right? If so, you can see my PoC
and the discussion in the "namespace handling in security/keys is
broken" thread from 5. January 2016.)

> This is a bigish change in semantics and I am going to have to digest
> this before I can give this an ok.
>
> Quite frankly at the base it scares me.
>
> If this is just about presentation and allowing some information from
> the parent user namespace

Yes, that's exactly my intent.

> I would be much happier if it was not
> from_kuid but that you modified, but if you instead you had a function
> say from_kuid_transparent, that performed the transformation you need
> and was only used in those places it is safe.
>
> I think I could reason about that.
>
> As your patchset sits I can not reason about the change in semantics,
> because without a large grep of the source I don't know what you are
> changing.
>
> And you are dramatically changing the semantics the semantics of
> from_kuid to the point I do believe we need to inspepect all of the call
> sites. As such I really don't think it makes sense to reuse the
> existing name for your new semantics.

Sure, that makes sense. I'll make a v2 with from_*uid_transparent() or
so.