Re: [GIT PULL] thread fixes v5.7-rc5
From: Christian Brauner
Date: Thu May 14 2020 - 14:55:26 EST
On Thu, May 14, 2020 at 11:35:29AM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2020 at 11:22 AM Christian Brauner
> <christian.brauner@xxxxxxxxxx> wrote:
> >
> > Seemed weird to me to change something that's been exposed to userspace for that long.
>
> Well, the internal declarations aren't actually "exposed" to user
> space - it's not like it's the declaration of the system call, that's
> separate.
>
> And we have done that before: we have had a lot of history of using
> "unsigned long" to basically mean "register", and then ended up
> cleaning up types afterwards.
So this has been on my mind for a bit and the clone() bug here brought
this up again. I think it would be good if we could have a consensus
that all new system calls with flag arguments should default to
unsigned int as long as the flag argument is passed in a register; maybe
we could even change most legacy syscalls to unsigned int where that is
safe. Looking at the kernel sources, it's not obvious to userspace why
system calls variously use unsigned long, int, or unsigned int for flags,
and I doubt there's much reason for it apart from history. But maybe I'm
wrong; that's not unusual. Or maybe it's not worth it. But I've been mulling
putting that into the extensible syscall design patch Aleksa and
I wrote and sent out a while ago:
https://lore.kernel.org/linux-doc/20191002151437.5367-1-christian.brauner@xxxxxxxxxx/
right after copy_struct_from_user() landed. Maybe it's worth resending.
>
> In fact, if you look at the macros that do SYSCALL_DEFINE() (hint -
> don't actually do it, you'll go mad) you'll see that magical
> __SC_LONG() thing, which actually declares _all_ arguments as either
> "unsigned long" or "unsigned long long".
>
> That's the "native" representation of the low-level system call (it's
> also marked "asmlinkage" etc).
Right.
>
> We then end up casting them to the internal representation with that
> __SC_CAST() macro.
>
> So the actual types that we get are intentionally "cleaned up"
> versions of the raw registers that were passed in.
Yeah, of course.
>
> But you really don't want to understand the __SYSCALL_DEFINEx() macro.
> It's clever, but it really is the Cthulhu of macros. Just looking at
> it might drive you mad.
That's very Wizard of Oz: "Pay no attention to the macro behind the
syscall declaration"; which means now I really _want_ to look at it. :)
Christian