Re: [PATCH 23/45] sched: add do_sched_yield() helper; remove in-kernel call to sched_yield()

From: Linus Torvalds
Date: Thu Mar 22 2018 - 13:45:01 EST


On Thu, Mar 22, 2018 at 10:29 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> But why !? Either Cc me on more of the series such that the whole makes
> sense, or better yet, write a proper Changelog.

This is a common issue. We should encourage people to always send at
least the cover-page to everybody who gets cc'd, even if they don't
get the whole series.

Anyway, to repeat: the calling convention for x86-64 system call
wrappers will be to just pass in "struct pt_regs", and the system call
wrapper itself will take the arguments from there.
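A user-space sketch of that convention follows. It assumes a cut-down stand-in for struct pt_regs (struct fake_pt_regs, hypothetical, modelling only the six x86-64 system call argument registers rdi, rsi, rdx, r10, r8, r9) and a made-up two-argument call. The wrapper's only parameter is the register block, and it pulls out just the arguments it needs. The real kernel generates its wrappers with macros, and this sketch does not try to reproduce them.

```c
#include <assert.h>

/*
 * Hypothetical, heavily simplified stand-in for the x86-64 kernel's
 * struct pt_regs: only the six system call argument registers.
 */
struct fake_pt_regs {
	unsigned long di, si, dx, r10, r8, r9;
};

/* The real work, with ordinary C arguments (hypothetical call). */
static long do_example_add(long a, long b)
{
	return a + b;
}

/*
 * The system call wrapper: its only parameter is the register block,
 * and it reloads just the two registers this call actually uses.
 * Whatever user space left in dx, r10, r8 and r9 is never passed on.
 */
static long sys_example_add(const struct fake_pt_regs *regs)
{
	assert(regs != 0);
	return do_example_add((long)regs->di, (long)regs->si);
}
```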

That means that we won't have random user space contents in registers
that can leak deep down the call chain. The registers are cleared at
system call entry, and only the actual real arguments are reloaded.

(It also makes do_syscall_64() generate better code, natch).
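The dispatch side can be sketched the same way, again in user space with hypothetical names (struct fake_pt_regs, fake_sys_call_table, fake_do_syscall_64) that are loosely modelled on the kernel, not copied from it. Every table entry has the same single-pointer signature, so the dispatcher passes one pointer instead of loading six argument values, and each handler reads only the registers it actually uses. The register clearing at entry happens in assembly and is not modelled here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical simplified register block: argument registers plus the call number. */
struct fake_pt_regs {
	unsigned long di, si, dx, r10, r8, r9;
	unsigned long orig_ax;	/* system call number (field name borrowed from the kernel) */
};

typedef long (*fake_syscall_fn)(const struct fake_pt_regs *);

/* A hypothetical call with no arguments: it reads no registers at all. */
static long fake_sys_getone(const struct fake_pt_regs *regs)
{
	(void)regs;
	return 1;
}

/* A hypothetical one-argument call: it reads only the first argument register. */
static long fake_sys_double(const struct fake_pt_regs *regs)
{
	return 2 * (long)regs->di;
}

static const fake_syscall_fn fake_sys_call_table[] = {
	fake_sys_getone,
	fake_sys_double,
};

#define FAKE_NR_SYSCALLS (sizeof(fake_sys_call_table) / sizeof(fake_sys_call_table[0]))

/* Dispatch: bounds-check the number, then hand the handler a single pointer. */
static long fake_do_syscall_64(const struct fake_pt_regs *regs)
{
	if (regs->orig_ax >= FAKE_NR_SYSCALLS)
		return -ENOSYS;
	return fake_sys_call_table[regs->orig_ax](regs);
}
```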

Anyway, that means that you *CANNOT* call "sys_xyz()" from kernel code.
Not that you really should have anyway, but there are tons of
historical reasons why we do. But now it fundamentally won't work,
because you'd need to literally do

	{
		struct pt_regs regs;

		regs.di = (unsigned long) firstarg;
		regs.si = (unsigned long) second;
		...
		sys_xyz(&regs);
	}

to do it on x86-64.

Anyway, there's a longer discussion elsewhere about why this is the case
and why we want to do it, but just take it as a given: you
will not be able to call sys_xyz() directly, and that's just a fact.

Making people able to do it would make real system calls (that are a
hell of a lot more important) slower. So it's simply not going to be
allowed.
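The fix in the subject line follows from this: move the body into a helper (there, do_sched_yield()) that other kernel code calls directly, and keep the system call wrapper as a thin shim around it. Below is a minimal user-space sketch of that split. The names do_fake_yield, sys_fake_yield, some_kernel_path and struct fake_pt_regs are hypothetical, and a counter stands in for the real scheduler work.

```c
#include <assert.h>

/* Hypothetical simplified register block (unused by this no-argument call). */
struct fake_pt_regs {
	unsigned long di, si, dx, r10, r8, r9;
};

/* Stand-in for real scheduler state: counts how often the helper ran. */
static int fake_yield_count;

/* The helper: ordinary C signature, callable from any kernel code. */
static void do_fake_yield(void)
{
	fake_yield_count++;
}

/* The system call wrapper: reached only through the system call path. */
static long sys_fake_yield(const struct fake_pt_regs *regs)
{
	(void)regs;	/* no arguments to reload */
	do_fake_yield();
	return 0;
}

/* An in-kernel caller uses the helper instead of faking a pt_regs. */
static void some_kernel_path(void)
{
	do_fake_yield();
}
```

Both paths end up in the same helper, so in-kernel callers no longer depend on the system call calling convention.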

Linus