Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall

From: Russell King - ARM Linux
Date: Tue Jan 20 2015 - 18:05:22 EST


On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote:
> Well, the whole question is this: is restarting a system call like
> usleep() really a separate system call, or is it a kernel implementation
> detail?
>
> If you wanted seccomp to see this, what would be the use case? Why
> would seccomp want to block this syscall? Does it make sense for
> seccomp to block this syscall when it doesn't block something like
> usleep() and then have usleep() fail just because the thread received
> a signal?
>
> I personally regard the whole restart system call thing as a purely
> kernel internal thing which should not be exposed to userland. If
> we decide that it should be exposed to userland, then it becomes part
> of the user ABI, and it /could/ become difficult if we needed to
> change it in future - and I'd rather not get into the "oh shit, we
> can't change it because that would break app X" crap.

Here's a scenario where it could become a problem:

Let's say that we want to use seccomp to secure some code which issues
system calls. We determine that the app uses system calls which don't
result in the restart system call being issued, so we decide to ask
seccomp to block the restart system call. Some of these system calls
that the app was using are restartable system calls.

When these system calls are restarted, what we see via ptrace etc is
that the system call simply gets re-issued as its own system call.

In a future kernel version, we decide that we could really do with one
of those system calls using the restart block feature, so we arrange
for it to set up the restart block, and return -ERESTART_RESTARTBLOCK. That's
fine for most applications, but this app now breaks.

The side effect of that breakage is that we have to revert that kernel
change - because we've broken userland, and that's simply not allowed.

Now look at the alternative: we don't make the restart syscall visible.
This means that we hide that detail, and we actually reflect the
behaviour that we've had for the other system call restart mechanisms,
and we don't have to fear userspace breakage as a result of switching
from one restart mechanism to another.
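
To make the two cases above concrete, here is a rough userspace sketch in
C. It is only a toy model of the argument, not kernel code: the enum
values, the function names and the "restart_visible" flag are all made up
for illustration, and the syscall identifiers are not real syscall numbers.

```c
#include <stdbool.h>

/* Hypothetical identifiers for this toy model; not real syscall numbers. */
enum model_syscall { MODEL_NANOSLEEP, MODEL_RESTART_SYSCALL };

/* How the modelled kernel restarts an interrupted sleep. */
enum restart_mechanism {
    RESTART_REISSUE,   /* the original syscall is simply re-issued */
    RESTART_VIA_BLOCK  /* a restart block is set up and the restart syscall is used */
};

/* Syscall a filter would observe when the sleep is restarted after a signal. */
static enum model_syscall observed_on_restart(enum restart_mechanism mech,
                                              bool restart_visible)
{
    if (mech == RESTART_VIA_BLOCK && restart_visible)
        return MODEL_RESTART_SYSCALL;
    /* Either the call was re-issued, or the restart detail is hidden. */
    return MODEL_NANOSLEEP;
}

/* The app's policy from the scenario: allow the sleep, block the restart syscall. */
static bool filter_allows(enum model_syscall nr)
{
    return nr != MODEL_RESTART_SYSCALL;
}

/* True if an interrupted sleep still works under that policy. */
static bool sleep_survives_signal(enum restart_mechanism mech,
                                  bool restart_visible)
{
    return filter_allows(observed_on_restart(mech, restart_visible));
}
```

With the restart syscall visible, sleep_survives_signal() is true for
RESTART_REISSUE but false for RESTART_VIA_BLOCK, which is the breakage
described above. With it hidden, both mechanisms give the same answer, so
the kernel can switch between them without the filter noticing.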

I am very much of the opinion that we should be trying to limit the
exposure of inappropriate kernel internal details to userland, because
userland has a habit of becoming reliant on them, and when it does,
it makes kernel maintenance unnecessarily harder.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.