Re: [PATCH v2 3/5] mm: Make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long

From: Alexey Izbyshev
Date: Tue May 23 2023 - 10:46:59 EST


On 2023-05-23 17:09, Catalin Marinas wrote:
On Tue, May 23, 2023 at 04:25:45PM +0300, Alexey Izbyshev wrote:
On 2023-05-23 16:07, Catalin Marinas wrote:
> On Tue, May 23, 2023 at 11:12:37AM +0200, David Hildenbrand wrote:
> > Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We
> > need arg2 -> arg5 to be 0. But wouldn't the following also just pass
> > a 0 "int" ?
> >
> > prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0)
> >
> > I'm easily confused by such (va_args) things, so sorry for the dummy
> > questions.
>
> Isn't the prctl() prototype in the user headers defined with the first
> argument as int while the rest as unsigned long? At least from the man
> page:
>
> int prctl(int option, unsigned long arg2, unsigned long arg3,
> unsigned long arg4, unsigned long arg5);
>
> So there are no va_args tricks (which confuse me as well).
>
I have explicitly mentioned the problem with man pages in my response to
David[1]. Quoting myself:

> This stuff *is* confusing, and note that Linux man pages don't even tell
> that prctl() is actually declared as a variadic function (and for
> ptrace() this is mentioned only in the notes, but not in its signature).

Ah, thanks for the clarification (I somehow missed your reply).

The reality:

* glibc: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/sys/prctl.h;h=821aeefc1339b35210e8918ecfe9833ed2792626;hb=glibc-2.37#l42

* musl:
https://git.musl-libc.org/cgit/musl/tree/include/sys/prctl.h?h=v1.2.4#n180

Though there is a test in the kernel that does define its own prototype,
avoiding the issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/sched/cs_prctl_test.c?h=v6.3#n77

At least for glibc, it seems that there is a conversion to unsigned
long:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/prctl.c#l28

unsigned long int arg2 = va_arg (arg, unsigned long int);

(does va_arg expand to an actual cast?)

No, this is not a conversion or a cast in the sense that I think you mean. What happens in the situation discussed in this thread is the following (assuming the argument is passed via a register, which is typical for the initial variadic arguments on 64-bit targets):

* User calls prctl(op, 0) on a 64-bit target.
* The second argument is an int.
* The compiler generates code to pass an int (32 bits) via a 64-bit register. The compiler is NOT required to clear the upper 32 bits of the register, so they might contain arbitrary junk in a general case.
* The prctl() implementation calls va_arg(arg, unsigned long) (as in your quote).
* The compiler extracts the full 64-bit value of the same register (which in our case might contain junk in the upper 32 bits).
* This extracted 64-bit value is then passed to the system call.
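
To make the steps above concrete, here is a rough model of them in plain C. It is only a sketch: the "register" is an ordinary uint64_t, the stale value is made up, and the model_* names are hypothetical helpers, not anything from glibc, musl, or a real ABI. It models the effect rather than invoking it, since actually reading an int argument with va_arg(ap, unsigned long) is undefined behavior and would not give a reproducible result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit argument register; not real ABI code. */

/* Caller side: storing an int is only required to define the low 32 bits,
 * so the upper half keeps whatever the register held before. */
uint64_t model_pass_int(uint64_t stale_reg, int value)
{
    return (stale_reg & 0xffffffff00000000ULL) | (uint32_t)value;
}

/* Callee side: va_arg(ap, unsigned long) reads all 64 bits as they are. */
uint64_t model_va_arg_ulong(uint64_t reg)
{
    return reg;
}

/* Self-check: prctl(op, 0) with junk left over in the register. */
void model_demo(void)
{
    uint64_t reg = model_pass_int(0xdeadbeef12345678ULL, 0);

    /* The full 64-bit read sees the junk in the upper half... */
    assert(model_va_arg_ulong(reg) == 0xdeadbeef00000000ULL);
    /* ...even though the low 32 bits are the 0 the user wrote. */
    assert((uint32_t)model_va_arg_ulong(reg) == 0);
}
```

In real code the caller avoids all this by passing arguments of the right type, e.g. prctl(op, 0UL) or an explicit (unsigned long) cast.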

So...

If the libc passes a 32-bit to a kernel ABI that expects 64-bit, I think
it's a user-space bug and not a kernel ABI issue.

... the problem happens not at the user/kernel boundary, but in the prctl() call/implementation in user space. But yes, it's still a user-space bug and not a kernel ABI issue. David's question, as I understand it, was whether we want to keep such buggy code that happens to pass junk failing with EINVAL in future kernels or not. If we do want to keep it failing, we can never assign any meaning to the upper 32 bits of the second prctl() argument for the PR_SET_MDWE op.
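
For illustration, the trade-off can be sketched with a user-space model of the flag check. This is an assumption-laden sketch, not the kernel code: model_set_mdwe() and MODEL_MDWE_REFUSE_EXEC_GAIN are hypothetical names, and the only thing modeled is "reject any bit that has no assigned meaning".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the single flag bit that is defined today. */
#define MODEL_MDWE_REFUSE_EXEC_GAIN 1ULL

/* Model of the check: any bit without an assigned meaning is an error. */
int model_set_mdwe(uint64_t bits)
{
    if (bits & ~MODEL_MDWE_REFUSE_EXEC_GAIN)
        return -EINVAL;
    return 0;
}

/* Self-check: a clean argument passes, one with junk in the upper half fails. */
void model_mdwe_demo(void)
{
    assert(model_set_mdwe(MODEL_MDWE_REFUSE_EXEC_GAIN) == 0);
    assert(model_set_mdwe(0xdeadbeef00000001ULL) == -EINVAL);
}
```

If a future flag were ever placed in the upper 32 bits, the second call in the self-check could start succeeding with semantics the caller never asked for, which is why keeping such calls failing means leaving those bits unused.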

Thanks,
Alexey