Re: [PATCH] Documentation: update about adding syscalls
From: Aleksa Sarai
Date: Thu Oct 03 2019 - 12:53:27 EST
On 2019-10-03, Jonathan Corbet <corbet@xxxxxxx> wrote:
> [Expanding CC a bit; this is the sort of change I'm reluctant to take
> without being sure it reflects what the community thinks.]
>
> On Wed, 2 Oct 2019 17:14:37 +0200
> Christian Brauner <christian.brauner@xxxxxxxxxx> wrote:
>
> > Add additional information on how to ensure that syscalls with structure
> > arguments are extensible and add a section about naming conventions to
> > follow when adding revised versions of already existing syscalls.
> >
> > Co-Developed-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
> > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
> > Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
> > ---
> > Documentation/process/adding-syscalls.rst | 82 +++++++++++++++++++----
> > 1 file changed, 70 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst
> > index 1c3a840d06b9..93e0221fbb9a 100644
> > --- a/Documentation/process/adding-syscalls.rst
> > +++ b/Documentation/process/adding-syscalls.rst
> > @@ -79,7 +79,7 @@ flags, and reject the system call (with ``EINVAL``) if it does::
> > For more sophisticated system calls that involve a larger number of arguments,
> > it's preferred to encapsulate the majority of the arguments into a structure
> > that is passed in by pointer. Such a structure can cope with future extension
> > -by including a size argument in the structure::
> > +by either including a size argument in the structure::
> >
> > struct xyzzy_params {
> > u32 size; /* userspace sets p->size = sizeof(struct xyzzy_params) */
> > @@ -87,20 +87,56 @@ by including a size argument in the structure::
> > u64 param_2;
> > u64 param_3;
> > };
> > + int sys_xyzzy(struct xyzzy_params __user *uarg);
> > + /* in case of -E2BIG, p->size is set to the in-kernel size and thus all
> > + extensions after that offset are unsupported. */
>
> That comment kind of threw me for a loop - this is the first mention of
> E2BIG and readers may not just know what's being talked about. Especially
> since the comment suggests *not* actually returning an error.
I probably could've worded this better -- this comment describes what
userspace sees when they use the API (sched_setattr(2) is an example of
this style of API).
In the case where the kernel doesn't support a requested extension
(usize > ksize, and there are non-zero bytes past ksize) then the kernel
returns -E2BIG *but also* sets p->size to ksize so that userspace knows
what extensions the kernel supports.
Maybe I should've replicated more of the details from the kernel-doc for
copy_struct_from_user().
> > -As long as any subsequently added field, say ``param_4``, is designed so that a
> > -zero value gives the previous behaviour, then this allows both directions of
> > -version mismatch:
> > +or by including a separate argument that specifies the size::
> >
> > - - To cope with a later userspace program calling an older kernel, the kernel
> > - code should check that any memory beyond the size of the structure that it
> > - expects is zero (effectively checking that ``param_4 == 0``).
> > - - To cope with an older userspace program calling a newer kernel, the kernel
> > - code can zero-extend a smaller instance of the structure (effectively
> > - setting ``param_4 = 0``).
> > + struct xyzzy_params {
> > + u32 param_1;
> > + u64 param_2;
> > + u64 param_3;
> > + };
> > + /* userspace sets @usize = sizeof(struct xyzzy_params) */
> > + int sys_xyzzy(struct xyzzy_params __user *uarg, size_t usize);
> > + /* in case of -E2BIG, userspace has to attempt smaller @usize values
> > + to figure out which extensions are unsupported. */
>
> Here too. But what I'm really wondering about now is: you're describing
> different behavior for what are essentially two cases of the same thing.
> Why should the kernel simply accept the smaller size if the size is
> embedded in the structure itself, but return an error and force user space
> to retry if it's a separate argument?
>
> I guess maybe because in the latter case the kernel can't easily return
> the size it's actually using? I think that should be explicit if so.
As above, the -E2BIG only happens if userspace is trying to use an
extension that the kernel doesn't support (usize > ksize, non-zero bytes
after ksize). The main difference between the two API styles is whether
or not userspace gets told what ksize is explicitly in the case of an
-E2BIG.
Maybe it would be less confusing to only mention one of ways of doing
it, but then we have to pick one (and while the newer syscalls [clone3
and openat2] use a separate argument, there are more syscalls which
embed it in the struct).
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Attachment:
signature.asc
Description: PGP signature