Re: [PATCH 24/24] Documentation: document ioctl interfaces better

From: Geert Uytterhoeven
Date: Thu Dec 12 2019 - 03:16:56 EST

Hi Arnd,

On Wed, Dec 11, 2019 at 9:53 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> Documentation/process/botching-up-ioctls.rst was orignally
> written as a blog post for DRM driver writers, so it it misses
> some points while going into a lot of detail on others.
> Try to provide a replacement that addresses typical issues
> across a wider range of subsystems, and follows the style of
> the core-api documentation better.
> Signed-off-by: Arnd Bergmann <arnd@xxxxxxxx>

Thanks for your patch!

> --- /dev/null
> +++ b/Documentation/core-api/ioctl.rst
> @@ -0,0 +1,250 @@
> +======================
> +ioctl based interfaces
> +======================
> +
> +:c:func:`ioctl` is the most common way for applications to interface
> +with device drivers. It is flexible and easily extended by adding new
> +commands and can be passed through character devices, block devices as
> +well as sockets and other special file descriptors.
> +
> +However, it is also very easy to get ioctl command definitions wrong,
> +and hard to fix them later without breaking existing applications,
> +so this documentation tries to help developers get it right.
> +
> +Command number definitions
> +==========================
> +
> +The command number, or request number, is the second argument passed to
> +the ioctl system call. While this can be any 32-bit number that uniquely
> +identifies an action for a particular driver, there are a number of
> +conventions around defining them.

Interesting. I never realized the action is 32-bit in the kernel, but
unsigned long in userspace...

> +
> +``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
> +ioctl commands that follow modern conventions: ``_IOC``, ``_IOR``,
> +``_IOW``, and ``_IORW``. These should be used for all new commands,
> +with the correct parameters:
> +

This says _IO....

> + The macro name determines whether the argument is used for passing
> + data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is

into the kernel
, or is

> + not a pointer (_IOC). It is possible but not recommended to pass an
> + integer value instead of a pointer with _IOC.

...which is not explained here, but _IOC is?

> +
> +type
> + An 8-bit number, often a character literal, specific to a subsystem
> + or driver, and listed in :doc:`../ioctl/ioctl-number`
> +
> +nr
> + An 8-bit number identifying the specific command, unique for a give
> + value of 'type'
> +
> +size
> + The name of the data type pointed to by the argument, the command
> + number encodes the ``sizeof(size)`` value in a 13-bit or 14-bit integer,
> + leading to a limit of 8191 bytes for the maximum size of the argument.
> + Note: do not pass sizeof(type) type into _IOR/IOW, as that will lead
> + to encoding sizeof(sizeof(type)), i.e. sizeof(size_t).

Looks like "size" could be renamed, to make this more obvious?

> +Timestamps
> +==========
> +
> +Traditionally, timestamps and timeout values are passed as ``struct
> +timespec`` or ``struct timeval``, but these are problematic because of
> +incompatible definitions of these structures in user space after the
> +move to 64-bit time_t.
> +
> +The __kernel_timespec type can be used instead to be embedded in other
> +data structures when separate second/nanosecond values are desired,
> +or passed to user space directly. This is still not ideal though,
> +as the structure matches neither the kernel's timespec64 nor the user
> +space timespec exactly. The get_timespec64() and put_timespec64() helper
> +functions canbe used to ensure that the layout remains compatible with

can be

> +user space and the padding is treated correctly.
> +
> +As it is cheap to convert seconds to nanoseconds, but the opposite
> +requires an expensive 64-bit division, a simple __u64 nanosecond value
> +can be simpler and more efficient.
> +
> +Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
> +as returned by ``ktime_get_ns()`` or ``ktime_get_ts64()``. Unlike
> +CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
> +or forwards due to leap second adjustments and clock_settime() calls.
> +
> +``ktime_get_real_ns()`` can be used for CLOCK_REALTIME timestamps that
> +may be required for timestamps that need to be persistent across a reboot

Drop "may be required for timestamps that"?

> +or between multiple machines.

> +Structure layout
> +----------------
> +
> +Compatible data structures have the same layout on all architectures,
> +avoiding all problematic members:
> +
> +* ``long`` and ``unsigned long`` are the size of a register, so
> + they can be either 32 bit or 64 bit wide and cannot be used in portable

32-bit or 64-bit (for consistency with the rest of the document)

> + data structures. Fixed-length replacements are ``__s32``, ``__u32``,
> + ``__s64`` and ``__u64``.
> +
> +* Pointers have the same problem, in addition to requiring the
> + use of ``compat_ptr()``. The best workaround is to use ``__u64``
> + in place of pointers, which requires a cast to ``uintptr_t`` in user
> + space, and the use of ``u64_to_user_ptr()`` in the kernel to convert
> + it back into a user pointer.
> +
> +* On the x86-32 (i386) architecture, the alignment of 64-bit variables
> + is only 32 bit, but they are naturally aligned on most other


> + architectures including x86-64. This means a structure like
> +
> + ::
> +
> + struct foo {
> + __u32 a;
> + __u64 b;
> + __u32 c;
> + };
> +
> + has four bytes of padding between a and b on x86-64, plus another four
> + bytes of padding at the end, but no padding on i386, and it needs a
> + compat_ioctl conversion handler to translate between the two formats.
> +
> + To avoid this problem, all structures should have their members
> + naturally aligned, or explicit reserved fields added in place of the
> + implicit padding.
> +
> +* On ARM OABI user space, 16-bit member variables have 32-bit
> + alignment, making them incompatible with modern EABI kernels.
> + Conversely, on the m68k architecture, all struct members have at most
> + 16-bit alignment. These rarely cause problems as neither ARM-OABI nor

"have at most 16-bit alignment" sounds a bit weird to me, as a member
may have a greater alignment.
"struct members are not guaranteed to have an alignment greater than 16-bit"?

> + m68k are supported by any compat mode, but for consistency, it is best
> + to completely avoid 16-bit member variables.
> +
> +
> +* Bitfields and enums generally work as one would expect them to,
> + but some properties of them are implementation-defined, so it is better
> + to avoid them completely in ioctl interfaces.
> +
> +* ``char`` members can be either signed or unsigned, depending on
> + the architecture, so the __u8 and __s8 types should be used for 8-bit
> + integer values, though char arrays are clearer for fixed-length strings.
> +
> +Information leaks
> +=================
> +
> +Uninitialized data must not be copied back to user space, as this can
> +cause an information leak, which can be used to defeat kernel address
> +space layout randomization (KASLR), helping in an attack.
> +
> +As explained for the compat mode, it is best to not avoid any padding in

best to avoid any implicit padding?

> +data structures, but if there is already padding in existing structures,
> +the kernel driver must be careful to zero out the padding using
> +``memset()`` or similar before copying it to user space.
> +
> +Subsystem abstractions
> +======================
> +
> +While some device drivers implement their own ioctl function, most
> +subsystems implement the same command for multiple drivers. Ideally the
> +subsystem has an .ioctl() handler that copies the arguments from and
> +to user space, passing them into subsystem specific callback functions
> +through normal kernel pointers.
> +
> +This helps in various ways:
> +
> +* Applications written for one driver are more likely to work for
> + another one in the same subsystem if there are no subtle differences
> + in the user space ABI.
> +
> +* The complexity of user space access and data structure layout at done

is done

> + in one place, reducing the potential for implementation bugs.
> +
> +* It is more likely to be reviewed by experienced developers
> + that can spot problems in the interface when the ioctl is shared
> + between multiple drivers than when it is only used in a single driver.
> +
> +Alternatives to ioctl
> +=====================
> +
> +There are many cases in which ioctl is not the best solution for a
> +problem. Alternatives include


> +
> +* System calls are a better choice for a system-wide feature that
> + is not tied to a physical device or constrained by the file system
> + permissions of a character device node
> +
> +* netlink is the preferred way of configuring any network related
> + objects through sockets.
> +
> +* debugfs is used for ad-hoc interfaces for debugging functionality
> + that does not need to be exposed as a stable interface to applications.
> +
> +* sysfs is a good way to expose the state of an in-kernel object
> + that is not tied to a file descriptor.
> +
> +* configfs can be used for more complex configuration than sysfs
> +
> +* A custom file system can provide extra flexibility with a simple
> + user interface but add a lot of complexity in the implementation.

adds ... to



Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds