Re: [PATCH 23/45] sched: add do_sched_yield() helper; remove in-kernel call to sched_yield()

From: Dominik Brodowski
Date: Thu Mar 22 2018 - 13:43:13 EST


On Thu, Mar 22, 2018 at 06:29:59PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 22, 2018 at 10:00:37AM +0100, Dominik Brodowski wrote:
> > Using the sched-internal do_sched_yield() helper allows us to get rid of
> > the sched-internal call to the sys_sched_yield() syscall.
> >
> > This patch is part of a series which tries to remove in-kernel calls to
> > syscalls. On this basis, the syscall entry path can be streamlined.
>
> But why !? Either Cc me on more of the series such that the whole makes
> sense, or better yet, write a proper Changelog.

Well, the summary is right there in the changelog: Kernel code simply should
not pretend to be userspace and call a syscall function. For a more
detailed description, see, for instance, Linus' explanation in
http://lkml.kernel.org/r/CA+55aFwo7yA1gm8AUYMEQA8ZNY-9GGF8Oup09jJFvEa4J7C+jA@xxxxxxxxxxxxxx :

| On x86-64, we'd like to just pass the 'struct pt_regs *' pointer, and
| have the sys_xyz() function itself just pick out the arguments it
| needs from there.
|
| That has a few reasons for it:
|
| - we can clear all registers at system call entry, which helps defeat
| some of the "pass seldom used register with user-controlled value that
| survives deep into the callchain" things that people used to leak
| information
|
| - we can streamline the low-level system call code, which needs to
| pass around 'struct pt_regs *' anyway, and the system call only picks
| up the values it actually needs
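
To make the quoted points a bit more concrete, here is a rough userspace
sketch of the pattern (not actual kernel code). All names in it,
pt_regs_sketch, do_example_add(), sys_example_add() and
kernel_caller_sketch(), are hypothetical and exist only for illustration:
the syscall stub receives just a register-frame pointer and picks out the
arguments it needs, while in-kernel callers use a plain helper and never
touch the stub.

```c
/* Hypothetical, heavily simplified stand-in for x86-64 'struct pt_regs'. */
struct pt_regs_sketch {
	unsigned long di;	/* 1st syscall argument register */
	unsigned long si;	/* 2nd syscall argument register */
	unsigned long dx;	/* 3rd syscall argument register (unused here) */
};

/*
 * Internal helper: an ordinary C function with ordinary arguments.
 * This is what other kernel code would call directly.
 */
static long do_example_add(unsigned long a, unsigned long b)
{
	return (long)(a + b);
}

/*
 * Syscall stub: only ever reached from the entry path. It takes the
 * register frame and extracts just the values it needs; whatever else
 * userspace left in the other registers is never passed down the call
 * chain.
 */
static long sys_example_add(const struct pt_regs_sketch *regs)
{
	return do_example_add(regs->di, regs->si);
}

/*
 * In-kernel user: calls the helper, not the syscall stub, so the stub's
 * calling convention is free to change.
 */
static long kernel_caller_sketch(void)
{
	return do_example_add(40, 2);
}
```

Once nothing in the kernel calls sys_example_add() directly, its signature
can be switched to the register-frame form without touching any other
caller, which is the point of adding helpers such as do_sched_yield().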

I can add such a long description to all these patches, but that seems to
be a bit... long-winded.

Thanks,
Dominik