Re: [PATCH v2 27/27] Documentation: document ioctl interfaces better
From: Arnd Bergmann
Date: Thu Dec 19 2019 - 15:05:35 EST
On Wed, Dec 18, 2019 at 11:45 PM Ben Hutchings
<ben.hutchings@xxxxxxxxxxxxxxx> wrote:
> On Tue, 2019-12-17 at 23:17 +0100, Arnd Bergmann wrote:
> > --- /dev/null
> > +++ b/Documentation/core-api/ioctl.rst
> > +``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
> > +ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
> > +``_IOW``, and ``_IORW``. These should be used for all new commands,
>
> Typo: "_IORW" should be "_IOWR".
Fixed now
> > +with the correct parameters:
> > +
> > +_IO/_IOR/_IOW/_IOWR
> > + The macro name determines whether the argument is used for passing
> > + data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is
> > + not a pointer (_IO). It is possible but not recommended to pass an
> > + integer value instead of a pointer with _IO.
>
> I feel the explanation of _IO here could be confusing. I think what
> you meant to say was that it is possible, but not recommended, to pass
> integers directly (arg is integer) rather than indirectly (arg is
> pointer to integer). I suggest the alternate wording:
>
> The macro name specifies how the argument will be used. It may be a
> pointer to data to be passed into the kernel (_IOW), out of the kernel
> (_IOR), or both (_IOWR). The argument may also be an integer value
> instead of a pointer (_IO), but this is not recommended.
That's probably better than my version, but I find that misleading as well:
it sounds like _IO() is not recommended, but having no argument with
_IO() is actually fine. This is what I have now:
The macro name specifies how the argument will be used. It may be a
pointer to data to be passed into the kernel (_IOW), out of the kernel
(_IOR), or both (_IOWR). _IO can indicate either commands with no
argument or those passing an integer value instead of a pointer.
It is recommended to only use _IO for commands without arguments,
and use pointers for passing data.
> > +data_type
> > + The name of the data type pointed to by the argument, the command number
> > + encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
> > + leading to a limit of 8191 bytes for the maximum size of the argument.
> > + Note: do not pass sizeof(data_type) type into _IOR/IOW, as that will
> > + lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
>
> You left out _IOWR here. It might also be worth mentioning that _IO
> doesn't have this parameter.
Changed now.
> [...]
> > +Return code
> > +===========
> > +
> > +ioctl commands can return negative error codes as documented in errno(3),
> > +these get turned into errno values in user space.
>
> Use a semi-colon instead of a comma, or change "these" to "which".
done
> > On success, the return
> > +code should be zero. It is also possible but not recommended to return
> > +a positive 'long' value.
> > +
> > +When the ioctl callback is called with an unknown command number, the
> > +handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
> > +-ENOTTY being returned from the system call. Some subsystems return
> > +-ENOSYS or -EINVAL here for historic reasons, but this is wrong.
> > +
> > +Prior to Linux-5.5, compat_ioctl handlers were required to return
>
> Space instead of hyphen.
done
> > +-ENOIOCTLCMD in order to use the fallback conversion into native
> > +commands. As all subsystems are now responsible for handling compat
> > +mode themselves, this is no longer needed, but it may be important to
> > +consider when backporting bug fixes to older kernels.
> > +
> > +Timestamps
> > +==========
> > +
> > +Traditionally, timestamps and timeout values are passed as ``struct
> > +timespec`` or ``struct timeval``, but these are problematic because of
> > +incompatible definitions of these structures in user space after the
> > +move to 64-bit time_t.
> > +
> > +The __kernel_timespec type can be used instead to be embedded in other
>
> It's not a typedef, so ``struct __kernel_timespec``.
done
> [...]
> > +32-bit compat mode
> > +==================
> > +
> > +In order to support 32-bit user space running on a 64-bit machine, each
> > +subsystem or driver that implements an ioctl callback handler must also
> > +implement the corresponding compat_ioctl handler.
> > +
> > +As long as all the rules for data structures are followed, this is as
> > +easy as setting the .compat_ioctl pointer to a helper function such as
> > +compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
> > +
> > +compat_ptr()
> > +------------
> > +
> > +On the s/390 architecture, 31-bit user space has ambiguous representations
>
> IBM never used the name "S/390" for the 64-bit mainframe architecture,
> but they have rebranded it several times. Rather than trying to follow
> what it's called this year, maybe just write "s390" to match what we
> usually call it?
ok, done
> > +
> > + has four bytes of padding between a and b on x86-64, plus another four
> > + bytes of padding at the end, but no padding on i386, and it needs a
> > + compat_ioctl conversion handler to translate between the two formats.
> > +
> > + To avoid this problem, all structures should have their members
> > + naturally aligned, or explicit reserved fields added in place of the
> > + implicit padding.
>
> This should explain how to check that - presumably by running pahole on
> some sensible architecture.
Ok, added "The ``pahole`` tool can be used for checking the alignment.".
> > +* On ARM OABI user space, 16-bit member variables have 32-bit
> > + alignment, making them incompatible with modern EABI kernels.
>
> I thought that OABI required structures as a whole to have alignment of
> 4, not individual members? Which obviously does affect small
> structures as members of other structures.
You are right, I clearly misunderstood that. Changed the paragraph now to
* On ARM OABI user space, structures are padded to multiples of 32-bit,
making some structs incompatible with modern EABI kernels if they
do not end on a 32-bit boundary.
* On the m68k architecture, struct members are not guaranteed to have an
alignment greater than 16-bit, which is a problem when relying on
implicit padding.
> [...]
> > +Information leaks
> > +=================
> > +
> > +Uninitialized data must not be copied back to user space, as this can
> > +cause an information leak, which can be used to defeat kernel address
> > +space layout randomization (KASLR), helping in an attack.
> > +
> > +As explained for the compat mode, it is best to not avoid any implicit
>
> Delete "not".
Done.
> +padding in data structures, but if there is already padding in existing
> > +structures, the kernel driver must be careful to zero out the padding
> > +using memset() or similar before copying it to user space.
>
> This sentence is rather too long. Also it can be read as suggesting
> that one should somehow identify and memset() the padding just before
> copying to user-space. I suggest an alternate wording:
>
> For this reason (and for compat support) it is best to avoid any
> implicit padding in data structures. Where there is implicit padding
> in an existing structure, kernel drivers must be careful to fully
> initialize an instance of the structure before copying it to user
> space. This is usually done by calling memset() before assigning to
> individual members.
Sounds good, I've taken that paragraph now.
> [...]
> > +Alternatives to ioctl
> > +=====================
> [...]
> > +* A custom file system can provide extra flexibility with a simple
> > + user interface but add a lot of complexity to the implementation.
>
> Typo: "add" should be "adds".
Fixed
Thanks for all the good suggestions!
Arnd