Re: [rfc] fcntl: Add F_GETOWNER_UIDS option
From: Serge Hallyn
Date: Fri Mar 30 2012 - 10:12:31 EST
Quoting Cyrill Gorcunov (gorcunov@xxxxxxxxxx):
> On Thu, Mar 29, 2012 at 02:30:53AM +0000, Serge E. Hallyn wrote:
> > Quoting Cyrill Gorcunov (gorcunov@xxxxxxxxxx):
> > > On Wed, Mar 28, 2012 at 04:30:44PM -0500, Serge Hallyn wrote:
> > > > Quoting Oleg Nesterov (oleg@xxxxxxxxxx):
> > > > > On 03/28, Serge E. Hallyn wrote:
> > > > > >
> > > > > > If you want to
> > > > > > just add the struct cred to the f_owner and do proper uid conversion,
> > > > > > I'll support that too. (Just grab a ref to the cred in
> > > > > > fs/fcntl.c:f_modown(), and drop the ref in fs/file_table.c:__fput() ).
> > > > >
> > > > > In this case f_owner.*uid should go away, I guess.
> > > >
> > > > Yup.
> > > >
> > > > Which I guess is all the more reason *not* to do this unless we end up
> > > > not going with Eric's userns mapping patchset (which is unlikely).
> > > >
> > > > > And sigio_perm()
> > > > > should be unified with kill_ok_by_cred() somehow (modulo
> > > > > security_file_send_sigiotask).
> > > > >
> > > > > Right?
> > > >
> > > > Maybe, but other differences include current being the signal sender in
> > > > one and recipient in the other, and CAP_KILL being relevant in only
> > > > one.
> > >
> > > Hi Serge, thanks a lot for the comments! Replying to the previous
> > > email: I skipped the cred part intentionally. I guess we need to
> > > wait until Eric's patches hit LKML (if I understand correctly),
> > > then I'll expand the patch. I'll think a bit more tomorrow, ok?
> > Sure.
> > Thinking about it, the cred being stored right now is the cred in the
> > container. That's what you want for checkpoint, right? So if someone
> Hi Serge, sorry for the delay. The stored creds are the ones a task
> has at checkpoint time (we parse /proc/pid/status), and the
> dumper/restorer works with root privileges, so it should be able to
> change the creds back to the former values during restore.
> > with the privs to do it checkpoints a task in a child userns, and restarts
> > that without doing so in a child user ns, he should be allowed to do so.
> I think so. Basically we require both the checkpointer and the
> restorer to have admin rights before they do c/r (this might be
> relaxed in the future), and actually I think we're more oriented
> toward achieving stable c/r from the init namespace first (once
> that is accomplished, c/r from inside nested namespaces could be
> considered).
> > So what I'm saying is that it's not indefensible to just not change
> > anything in your original patch until we can discuss Eric's set.
> Yes, I want to take a look at Eric's set first just to get the right
> "picture" of everything. And I wanted to find a minimal solution
> with the current kernel code base which could be extended in the
> future. That said, I guess the current init-ns-only approach should
> do the trick for a while. And (thanks for pointing it out) I need to
> add a test that a caller which tries to obtain the uids has enough
> credentials for that (probably CAP_FOWNER), right?
Sorry, I'm not sure which caller you mean. Neither f_setown nor
f_getown requires privilege right now. Oh, you mean at restart?
f_setown to someone else's uid/pid means you may cause a signal to
be sent to them. So CAP_KILL might be good? Through that signal you
do get *some* info about the file writes, though not the contents.
So yeah, maybe (CAP_KILL|CAP_FOWNER).
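
For reference, a minimal userspace sketch of the unprivileged case as
it stands today: a task makes itself the owner of a pipe's read end
with F_SETOWN and reads the owner back with F_GETOWN. The helper name
owner_roundtrip() is made up for illustration; only the fcntl()
commands themselves are real. It says nothing about what a restorer
setting someone else's pid should be required to hold.

```c
#define _XOPEN_SOURCE 700	/* expose F_SETOWN/F_GETOWN under strict -std */
#include <assert.h>		/* only for the usage check mentioned below */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the calling process the owner of a pipe's
 * read end, then read the owner back. Returns the pid reported by
 * F_GETOWN, or -1 on error. Neither call needs a capability here.
 */
static pid_t owner_roundtrip(void)
{
	int fds[2];
	pid_t owner = -1;

	if (pipe(fds) < 0)
		return -1;

	/* F_SETOWN: SIGIO/SIGURG for this file would go to this pid. */
	if (fcntl(fds[0], F_SETOWN, getpid()) == 0)
		owner = fcntl(fds[0], F_GETOWN);

	close(fds[0]);
	close(fds[1]);
	return owner;
}
```

Something like assert(owner_roundtrip() == getpid()); should hold for
any ordinary task.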
> > If we were to *not* go with Eric's set, then when using your proposed
> > patch for debugging purposes, would we want to show a list of uids,
> > starting with the uid in the reader's user namespace, up to the
> > container being investigated? So for instance if init_user_ns spawned
> > userns1, and that spawned userns2, and root in userns1 is seeking this
> > info for a f_owner in userns2, then he should see two userids, the one
> > mapped into userns1, and the one in userns2.
> > In Eric's set, we may want to show only the kuid (since the mapped
> > userid can be found other ways), or for convenience we may want to show
> > both the kuid and the mapped uid.
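
To make the "list of uids" idea above concrete, here is a toy
userspace model, not kernel code. It assumes each namespace maps its
uids onto its parent through a single contiguous range (in the spirit
of one uid_map line); struct toy_userns and both helpers are invented
for this sketch. It walks up from the owner's namespace to the
reader's and reports the uid as seen at each level, reader first.

```c
#include <assert.h>		/* only for the usage check mentioned below */
#include <stddef.h>

/* Toy namespace: one contiguous range mapped onto the parent. */
struct toy_userns {
	const struct toy_userns *parent;	/* NULL for the initial ns */
	unsigned int first;		/* first uid inside this ns */
	unsigned int parent_first;	/* what it maps to in the parent */
	unsigned int count;		/* length of the range */
};

#define TOY_INVALID_UID ((unsigned int)-1)

/* Translate a uid in @ns to the corresponding uid in ns->parent. */
static unsigned int toy_map_up(const struct toy_userns *ns, unsigned int uid)
{
	if (!ns->parent || uid < ns->first || uid - ns->first >= ns->count)
		return TOY_INVALID_UID;
	return ns->parent_first + (uid - ns->first);
}

/*
 * Fill @out with @uid (a uid in @owner_ns) as seen from @reader down
 * to @owner_ns, reader's view first. Returns the number of entries,
 * or 0 if @reader is not @owner_ns or one of its ancestors, if the
 * uid is unmapped somewhere on the way, or if @out is too small.
 */
static size_t toy_uid_chain(const struct toy_userns *reader,
			    const struct toy_userns *owner_ns,
			    unsigned int uid, unsigned int *out, size_t max)
{
	size_t n = 0, i;

	/* Walk upward from the owner's namespace, collecting each view. */
	for (;;) {
		if (n == max)
			return 0;
		out[n++] = uid;
		if (owner_ns == reader)
			break;
		uid = toy_map_up(owner_ns, uid);
		if (uid == TOY_INVALID_UID)
			return 0;
		owner_ns = owner_ns->parent;
	}

	/* Reverse in place so the reader's view comes first. */
	for (i = 0; i < n / 2; i++) {
		unsigned int tmp = out[i];

		out[i] = out[n - 1 - i];
		out[n - 1 - i] = tmp;
	}
	return n;
}
```

With userns1 mapping its uid 0 to 100000 in the initial namespace and
userns2 mapping its uid 0 to 1000 in userns1, a reader in userns1
asking about uid 5 in userns2 gets two entries, and
assert(out[0] == 1005 && out[1] == 5); holds.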
> I suspect operating with kuids will be much easier.
Yeah, I keep going back and forth on which makes more sense. But
kuids probably make more sense, even though they aren't what
userspace in container will see. When you restore, the mapping
will give userspace what it expects; and if you're going to
restart in a container with a different mapping, then you'll
have to convert the filesystem as well since its inodes will
store kuids, so may as well also convert the kuids in the
checkpoint image then.
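
A rough userspace sketch of that conversion step, under the
assumption that both the old and the new container mapping are a
single contiguous range and that a kuid can be treated as a plain
unsigned int. struct toy_uid_range and toy_convert_kuid() are
hypothetical names, not anything from Eric's set: the stored kuid is
first translated back to the uid the container saw, then forward
through the new mapping.

```c
#include <assert.h>		/* only for the usage check mentioned below */

/* Toy mapping: container uids [ns_first, ns_first + count) <-> kuids. */
struct toy_uid_range {
	unsigned int ns_first;		/* first uid inside the container */
	unsigned int kuid_first;	/* kuid that ns_first maps to */
	unsigned int count;		/* length of the range */
};

#define TOY_NO_UID ((unsigned int)-1)

/*
 * Convert a kuid saved under @old_map into the kuid that represents
 * the same in-container uid under @new_map. Returns TOY_NO_UID if the
 * value falls outside either range.
 */
static unsigned int toy_convert_kuid(const struct toy_uid_range *old_map,
				     const struct toy_uid_range *new_map,
				     unsigned int kuid)
{
	unsigned int ns_uid;

	/* kuid -> uid as seen inside the container (old mapping). */
	if (kuid < old_map->kuid_first ||
	    kuid - old_map->kuid_first >= old_map->count)
		return TOY_NO_UID;
	ns_uid = old_map->ns_first + (kuid - old_map->kuid_first);

	/* In-container uid -> kuid (new mapping). */
	if (ns_uid < new_map->ns_first ||
	    ns_uid - new_map->ns_first >= new_map->count)
		return TOY_NO_UID;
	return new_map->kuid_first + (ns_uid - new_map->ns_first);
}
```

If the old mapping starts at kuid 100000 and the new one at 200000,
both for container uid 0, a saved kuid of 101000 becomes 201000, so
assert(toy_convert_kuid(&old_map, &new_map, 101000) == 201000); holds.
The same pass would have to be applied to the kuids in the inodes.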