Re: [PATCH v3 1/2] fork: extend clone3() to support CLONE_SET_TID

From: Christian Brauner
Date: Wed Aug 07 2019 - 14:20:39 EST


On Wed, Aug 07, 2019 at 05:48:29PM +0200, Oleg Nesterov wrote:
> On 08/06, Adrian Reber wrote:
> >
> > @@ -2530,12 +2530,14 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
> > struct clone_args __user *uargs,
> > size_t size)
> > {
> > + struct pid_namespace *pid_ns = task_active_pid_ns(current);
> > struct clone_args args;
> >
> > if (unlikely(size > PAGE_SIZE))
> > return -E2BIG;
> >
> > - if (unlikely(size < sizeof(struct clone_args)))
> > + /* The struct needs to be at least the size of the original struct. */
> > + if (size < (sizeof(struct clone_args) - sizeof(__aligned_u64)))
> > return -EINVAL;
>
> slightly off-topic, but with or without this patch I do not understand
> -EINVAL. Can't we replace this check with
>
> if (size < sizeof(struct clone_args))
> memset((void*)&args + size, sizeof(struct clone_args) - size, 0);
>
> ?
>
> this way we can new members at the end of clone_args and this matches
> the "if (size > sizeof(struct clone_args))" block below which promises
> that whatever we add into clone_args a zero value should work.

Hm, I actually think we should define:

#define CLONE3_ARGS_SIZE_V0 64
#define CLONE3_ARGS_SIZE_V1 ...
and then later on for future extensions
#define CLONE3_ARGS_SIZE_V2 ...

then do
if (size < CLONE3_ARGS_SIZE_V0)
return -EINVAL;

then do what you suggested:

if (size < sizeof(struct clone_args))
memset((void*)&args + size, sizeof(struct clone_args) - size, 0);

>
>
> And if we do this
>
> > + if (size == sizeof(struct clone_args)) {
> > + /* Only check permissions if set_tid is actually set. */
> > + if (args.set_tid &&
> > + !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
> > + return -EPERM;
> > + kargs->set_tid = args.set_tid;
> > + }
>
> we can move this check into clone3_args_valid() or even copy_process()
>
> if (kargs->set_tid) {
> if (!ns_capable(...))
> return -EPERM;
> }
>
>
> Either way,
>
> > @@ -2585,6 +2595,10 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
> > if (kargs->flags & ~CLONE_LEGACY_FLAGS)
> > return false;
> >
> > + /* Fail if set_tid is invalid */
> > + if (kargs->set_tid < 0)
> > + return false;
>
> I think it would be more clean to do this along with ns_capable() check,
> or along with the "set_tid >= pid_max" check in alloc_pid().
>
> I won't insist, but I do not really like the fact we check set_tid 3 times
> in copy_clone_args_from_user(), clone3_args_valid(), and alloc_pid().

Agreed on that part.