Re: [LSF TOPIC] beyond uidmapping, & towards a better security model

From: Stéphane Graber
Date: Tue Feb 20 2024 - 19:56:58 EST


Hey there,

Sorry, I don't have time to go through all the details in this post
and give it an adequate response, so I'm adding Aleksandr, who may be
able to provide more details on what we've been up to (what James
alluded to).

Our proposal is effectively bumping the in-kernel kuid_t/kgid_t from
uint32 to uint64, which allows for individual user namespaces to get a
full usable uint32 uid/gid range in the kernel. Obviously any kind of
data persistence needs some mapping (VFS idmap) and there are a bunch
of other corner cases as to how this is all exposed to userspace.
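
To make the idea concrete, here is a minimal sketch of one way a 64-bit
id could be split. The layout (namespace id in the upper 32 bits, the
in-namespace uid in the lower 32 bits) and the names kuid64_t,
make_kuid64(), kuid64_userns() and kuid64_uid() are assumptions made
for illustration only; they are not taken from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit kernel id (illustrative layout, not the
 * patchset's): the upper 32 bits identify the owning user namespace,
 * the lower 32 bits hold the uid as seen inside that namespace.
 */
typedef uint64_t kuid64_t;
static_assert(sizeof(kuid64_t) == 8, "kuid64_t must be 64 bits wide");

/* Combine a namespace id and an in-namespace uid into one kernel id. */
static inline kuid64_t make_kuid64(uint32_t userns_id, uint32_t uid)
{
    return ((uint64_t)userns_id << 32) | uid;
}

/* Extract the namespace id (upper 32 bits). */
static inline uint32_t kuid64_userns(kuid64_t k)
{
    return (uint32_t)(k >> 32);
}

/* Extract the in-namespace uid (lower 32 bits). */
static inline uint32_t kuid64_uid(kuid64_t k)
{
    return (uint32_t)(k & 0xffffffffu);
}
```

With a split like this, uid 1000 in two different namespaces yields two
distinct kernel ids, and neither namespace has to carve its range out
of a shared 32-bit space.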

The idea started at Plumbers / Kernel Summit back in 2019 and has been
refined on and off ever since.
We now have a functional patchset and example userspace code at:
- https://github.com/mihalicyn/isolated-userns
- https://github.com/mihalicyn/linux/commits/isolated_userns

If you don't mind watching a video, we have a reasonably detailed talk
on the topic as well as demo and useful audience questions and
feedback from FOSDEM here: https://www.youtube.com/watch?v=mOLzSzpVwHU

Having now talked about this with folks at a number of LPC / Kernel
Summit / FOSDEM events, our next step is going to be an RFC patchset.
I think we just want the cgroupfs issue sorted out before sending that
out.

I'll try to set aside some time to go through your full e-mail later this
week if Alex doesn't get to it first!

Stéphane

On Tue, Feb 20, 2024 at 7:26 PM Kent Overstreet
<kent.overstreet@xxxxxxxxx> wrote:
>
> On Mon, Feb 19, 2024 at 09:26:25AM -0500, James Bottomley wrote:
> > On Sat, 2024-02-17 at 15:56 -0500, Kent Overstreet wrote:
> > > AKA - integer identifiers considered harmful
> > >
> > > Any time you've got a namespace that's just integers, if you ever end
> > > up needing to subdivide it you're going to have a bad time.
> > >
> > > This comes up all over the place - for another example, consider
> > > ioctl numbering, where keeping them organized and collision free is a
> > > major headache.
> > >
> > > For UIDs, we need to be able to subdivide the UID namespace for e.g.
> > > containers and mounting filesystems as an unprivileged user - but
> > > since we just have an integer identifier, this requires complicated
> > > remapping and updating and maintaining a global table.
> > >
> > > Subdividing a UID to create new permissions domains should be a
> > > cheap, easy operation, and it's not.
> > >
> > > The solution (originally from plan9, of course) is - UIDs shouldn't
> > > be numbers, they should be strings; and additionally, the strings
> > > should be paths.
> > >
> > > Then, if 'alice' is a user, 'alice.foo' and 'alice.bar' would be
> > > subusers, created by alice without any privileged operations or
> > > mucking with outside system state, and 'alice' would be superuser
> > > w.r.t. 'alice.foo' and 'alice.bar'.
> > >
> > > What's this get us?
> >
> > I would have to say that changing kuid for a string doesn't really buy
> > us anything except a load of complexity for no very real gain.
> > However, since the current kuid is u32 and exposed uid is u16 and there
> > is already a proposal to make use of this somewhat in the way you
> > envision,
>
> Got a link to that proposal?
>
> > there might be a possibility to re-express kuid as an array
> > of u16s without much disruption. Each adjacent pair could represent
> > the owner at the top and the userns assigned uid underneath. That
> > would neatly solve the nesting problem the current upper 16 bits
> > proposal has.
>
> At a high level, there's no real difference between a variable length
> integer, or a variable length array of integers, or a string.
>
> But there's real advantages to getting rid of the string <-> integer
> identifier mapping and plumbing strings all the way through:
>
> - creating a new sub-user can be done with nothing more than the new
> username version of setuid(); IOW, we can start a new named subuser
> for e.g. firefox without mucking with _any_ system state or tables
>
> - sharing filesystems between machines is always a pita because
> usernames might be the same but uids never are - let's kill that off,
> please
>
> Doing anything as big as an array of integers is going to be a major
> compatibility break anyways, so we might as well do it right.
>
> Either way we're going to need a mapping to 16 bit uids for
> compatibility; doing this right gives userspace an incentive to get
> _off_ that compatibility layer so we're not dealing with that impedance
> mismatch forever.
>
> > However, neither proposal would get us out of the problem of mount
> > mapping because we'd have to keep the filesystem permission check on
> > the owning uid unless told otherwise.
>
> Not sure I follow?
>
> We're always going to need mount mapping, but if the mount mapping is
> just "usernames here get mapped to this subtree of the system username
> namespace", then that potentially simplifies things quite a bit - the
> mount mapping is no longer a _table_.
>
> And it wouldn't have to be administrator assigned. Some administrator
> assignment might be required for the username <-> 16 bit uid mapping,
> but if those mappings are ephemeral (i.e. if we get filesystems
> persistently storing usernames, which is easy enough with xattrs) then
> that just becomes "reserve x range of the 16 bit uid space for ephemeral
> translations".
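
For reference, the 'alice' / 'alice.foo' relationship described in the
quoted text can be sketched as a plain prefix check on dot-separated
names. This is only an illustration of the hierarchy rule as stated
there; the function name user_controls() and the choice of '.' as the
separator handling are assumptions, not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `actor` is the same user as
 * `target` or an ancestor of it in a dot-separated name hierarchy,
 * e.g. "alice" controls "alice.foo" but not "alice2" or "bob".
 */
static bool user_controls(const char *actor, const char *target)
{
    assert(actor != NULL && target != NULL);

    size_t n = strlen(actor);

    /* An empty name controls nothing; otherwise actor must be a prefix. */
    if (n == 0 || strncmp(actor, target, n) != 0)
        return false;

    /* The prefix must end on a component boundary. */
    return target[n] == '\0' || target[n] == '.';
}
```

The boundary test is what keeps 'alice' from being treated as a
superuser of an unrelated 'alice2'; no table lookup is involved, only
the two names.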


