Re: [PATCH] clone: only use lower 32 flag bits

From: Joe Perches
Date: Wed May 06 2020 - 15:06:19 EST


On Tue, 2020-05-05 at 19:44 +0200, Christian Brauner wrote:
> Jan reported an issue where an interaction between sign-extending clone's
> flag argument on ppc64le and the new CLONE_INTO_CGROUP feature causes
> clone() to consistently fail with EBADF.
[]
> Let's fix this by always capping the upper 32 bits for the legacy clone()
> syscall. This ensures that we can't reach clone3() only features by
> accident via legacy clone as with the sign extension case and also that
> legacy clone() works exactly like before, i.e. ignoring any unknown flags.
> This solution risks no regressions and is also pretty clean.
>
> I've chosen u32 and not unsigned int to visually indicate that we're
> capping this to 32 bits.

Perhaps use the lower_32_bits macro?

> diff --git a/kernel/fork.c b/kernel/fork.c
[]
> @@ -2569,12 +2569,21 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
> unsigned long, tls)
> #endif
> {
> + /*
> + * On 64-bit, userspace can use the unsigned long to
> + * pass flag values only usable with clone3(). So cap
> + * the flag argument to the lower 32 bits. This is fine,
> + * since legacy clone() has traditionally ignored unknown
> + * flag values. So don't break userspace workloads that
> + * (by accident or on purpose) rely on this.
> + */
> + u32 flags = (u32)clone_flags;
> struct kernel_clone_args args = {
> - .flags = (clone_flags & ~CSIGNAL),
> + .flags = (flags & ~CSIGNAL),

so:

.flags = lower_32_bits(clone_flags) & ~CSIGNAL,

> .pidfd = parent_tidptr,
> .child_tid = child_tidptr,
> .parent_tid = parent_tidptr,
> - .exit_signal = (clone_flags & CSIGNAL),
> + .exit_signal = (flags & CSIGNAL),

.exit_signal = lower_32_bits(clone_flags) & CSIGNAL,

> .stack = newsp,
> .tls = tls,
> };
>
> base-commit: 0e698dfa282211e414076f9dc7e83c1c288314fd