[PATCH V4 0/5] mlx5 ConnectX control misc driver

From: Saeed Mahameed
Date: Wed Feb 07 2024 - 02:28:30 EST


From: Saeed Mahameed <saeedm@xxxxxxxxxx>

Recap from V3 discussion:
=========================

LWN has published an article on this series aptly summarizing the debate.
LINK: https://lwn.net/Articles/955001/

We continue to think that mlx5ctl is reasonable and aligned with the
greater kernel community values. People have pointed to the HW RAID
miscdevices as a good analog. The MD developers did not get to block HW
RAID configuration on the basis that it undermines their work on the
software RAID stack. Further, while there is a superficial similarity to
the DRM/accel debate, that was grounded in a real concern that DRM values
on open source would be bypassed. That argument does not hold up here as
this does come with open source userspace and the functionality mlx5ctl
enables on lockdown has always been available to ConnectX users through
the non-lockdown PCI sysfs. netdev has been doing just fine despite the
long standing presence of this tooling and we have continued to work with
Jakub on building common APIs when appropriate. mlx5 already implements
a wide range of the netdev common interfaces, many of which were pushed
forward by our staff - the DPLL configuration netlink being a recent
example.

Version history:
================
V3: https://lore.kernel.org/all/20231121070619.9836-1-saeed@xxxxxxxxxx/#r
V2: https://lore.kernel.org/all/20231119092450.164996-1-saeed@xxxxxxxxxx/#r
V1: https://lore.kernel.org/all/20231018081941.475277-1-saeed@xxxxxxxxxx/#r

V3->V4:
- Document locking scheme for device lifecycle
- Document reserved bits will always be checked for 0 by driver
- Use GFP_KERNEL instead of ACCOUNT for short lived buffers
- Create sysfs link to parent device under the misc device's sysfs
- Remove unnecessary device name from info ioctl output
- Remove reserved and future flags fields from ioctls
- Precise size checking for ioctl user input.

V2->V3:
- Fix bad Sign-off line.
- Fix kernel robot warnings, define a user ptr arg for umem_unreg ioctl
instead of plain integer to simplify compat_ioctl usage.

V1->V2:
- Provide legal statement and sign-off for dual license use
- Fix License clause to use: BSD-3-Clause OR GPL-2.0
- Fix kernel robot warnings
- Use dev_dbg directly instead of umem_dbg() local wrapper
- Implement .compat_ioctl for 32bit compatibility
- Fix mlx5ctl_info ABI structure size and alignment
- Local pointer to correct type instead of in-place cast
- Check unused fields and flags are 0 on ioctl path
- Use correct macro to declare scalar arg ioctl command
#define MLX5CTL_IOCTL_UMEM_UNREG \
_IO(MLX5CTL_IOCTL_MAGIC, 0x3)

mlx5 ConnectX control misc driver:
==================================

The ConnectX HW family supported by the mlx5 drivers uses an architecture
where a FW component executes "mailbox RPCs" issued by the driver to make
changes to the device. This results in a complex debugging environment
where the FW component has information and low level configuration that
needs to be accessible to userspace for debugging purposes.

Historically a userspace program was used that accessed the PCI register
and config space directly through /sys/bus/pci/.../XXX and could operate
these debugging interfaces in parallel with the running driver.
This approach is incompatible with secure boot and kernel lockdown, so
this driver provides a secure and restricted interface to the same
debugging functionality.

Patch breakdown:
================

1) The first patch in the series introduces the main driver file with the
implementation of a new mlx5 auxiliary device driver that runs on top of
mlx5_core device instances. On probe it creates a new misc device, and
this patch implements the open and release fops. On open the driver
allocates a special FW UID (user context ID) restricted to debug RPCs
only, and all user debug RPCs are executed under this UID. On release
the UID is freed.

2) The second patch adds an info ioctl that shows the allocated UID, the
available capability masks of the device and of the current UID, and
some other useful device information such as the underlying ConnectX
device.
Example:
$ sudo ./mlx5ctlu mlx5_core.ctl.0
mlx5dev: 0000:00:04.0
UCTX UID: 1
UCTX CAP: 0x3
DEV UCTX CAP: 0x3
USER CAP: 0x1d
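
For illustration only, a userspace caller of the info ioctl could look
roughly like the sketch below. The struct layout, the ioctl magic and the
command number are placeholders made up for this example and are NOT
copied from include/uapi/misc/mlx5ctl.h; the misc device node path is
also left to the caller since it is not spelled out here. Only the four
printed fields are taken from the example output above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* HYPOTHETICAL layout, mirroring only the fields printed above.
 * The real ABI lives in include/uapi/misc/mlx5ctl.h. */
struct example_mlx5ctl_info {
	uint16_t uctx_uid;     /* "UCTX UID" */
	uint16_t reserved;     /* explicit padding, kept zero */
	uint32_t uctx_cap;     /* "UCTX CAP" */
	uint32_t dev_uctx_cap; /* "DEV UCTX CAP" */
	uint32_t ucap;         /* "USER CAP" */
};

/* PLACEHOLDER magic and command number, not the driver's real values. */
#define EXAMPLE_IOCTL_MAGIC 0xff
#define EXAMPLE_IOCTL_INFO \
	_IOR(EXAMPLE_IOCTL_MAGIC, 0x0, struct example_mlx5ctl_info)

/* Open the misc device node, issue the info ioctl, close the node.
 * Returns 0 on success or a negative errno value on failure. */
int example_query_info(const char *path, struct example_mlx5ctl_info *info)
{
	int ret = 0;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -errno;
	if (ioctl(fd, EXAMPLE_IOCTL_INFO, info) < 0)
		ret = -errno; /* save errno before close() can change it */
	close(fd);
	return ret;
}

/* Format the info fields the same way as the example output above. */
int example_format_info(char *buf, size_t len,
			const struct example_mlx5ctl_info *info)
{
	return snprintf(buf, len,
			"UCTX UID: %u\nUCTX CAP: 0x%x\n"
			"DEV UCTX CAP: 0x%x\nUSER CAP: 0x%x\n",
			(unsigned int)info->uctx_uid,
			(unsigned int)info->uctx_cap,
			(unsigned int)info->dev_uctx_cap,
			(unsigned int)info->ucap);
}
```

A real tool would include the uapi header instead of redefining the
struct and command locally, so that sizes and command numbers always
match what the driver checks.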

3) The third patch adds the capability to execute debug RPCs under the
special UID.

In the mlx5 architecture the FW RPC commands take the form of inbox
and outbox buffers. The inbox buffer contains the command RPC layout
as described in the ConnectX Programmers Reference Manual (PRM)
document and as defined in include/linux/mlx5/mlx5_ifc.h.

On success the user outbox buffer will be filled with the device's rpc
response.

For example to query device capabilities:
a user fills out an inbox buffer with the inbox layout:
struct mlx5_ifc_query_hca_cap_in_bits
and expects an outbox buffer with the layout:
struct mlx5_ifc_cmd_hca_cap_bits
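
As a rough userspace sketch of the above, the inbox can be built as a
plain big-endian byte buffer. The byte offsets (opcode at bytes 0-1,
op_mod at bytes 6-7, a 16 byte inbox, status at outbox byte 0 and
syndrome at outbox bytes 4-7) and the opcode value 0x100 are my reading
of mlx5_ifc.h and should be treated as assumptions to verify against the
header and the PRM. All example_* names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: QUERY_HCA_CAP opcode and inbox size as read from
 * mlx5_ifc.h; verify against the header before relying on them. */
#define EXAMPLE_OP_QUERY_HCA_CAP     0x100
#define EXAMPLE_QUERY_HCA_CAP_IN_LEN 16

/* Store a 16-bit value in big-endian byte order. */
static void example_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Fill a zeroed inbox with the opcode and a caller-supplied op_mod.
 * All other (reserved) fields stay zero. */
void example_fill_query_hca_cap(uint8_t in[EXAMPLE_QUERY_HCA_CAP_IN_LEN],
				uint16_t op_mod)
{
	memset(in, 0, EXAMPLE_QUERY_HCA_CAP_IN_LEN);
	example_put_be16(&in[0], EXAMPLE_OP_QUERY_HCA_CAP); /* opcode */
	example_put_be16(&in[6], op_mod);                   /* op_mod */
}

/* ASSUMPTION: the outbox starts with an 8-bit status. */
uint8_t example_out_status(const uint8_t *out)
{
	return out[0];
}

/* ASSUMPTION: a 32-bit big-endian syndrome sits at byte offset 4. */
uint32_t example_out_syndrome(const uint8_t *out)
{
	return ((uint32_t)out[4] << 24) | ((uint32_t)out[5] << 16) |
	       ((uint32_t)out[6] << 8) | (uint32_t)out[7];
}
```

The filled inbox, together with an outbox buffer large enough for the
response, is what a caller would hand to the command RPC ioctl. Checking
status and syndrome first tells whether the rest of the outbox is
meaningful.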

4) The fourth patch adds the ability to register user memory into the
ConnectX device and create a umem object that points to that memory.

Command rpc outbox buffer is limited in size, which can be very
annoying when trying to pull large traces out of the device.
Many rpcs offer the ability to scatter output traces, contexts
and logs directly into user space buffers in a single shot.

The registered memory will be described by a device UMEM object which
has a unique umem_id. This umem_id can later be used in the RPC inbox
to tell the device where to populate the response output,
e.g. HW traces and other debug object queries.

Example use case: a ConnectX device coredump can be as large as 2MB.
Using inline RPCs takes thousands of RPCs to get the full
coredump, which can take multiple seconds.

With UMEM, it can be done in a single rpc, using 2MB of umem user buffer.
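
To make the arithmetic concrete, here is a small userspace sketch. The
512 byte per-RPC payload mentioned in the note below is purely an assumed
figure for illustration (the actual inline payload depends on the
command), and page alignment of the user buffer is a conservative choice
of mine rather than a documented requirement. The example_* helpers are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Number of inline RPCs needed to pull dump_len bytes when each RPC
 * returns at most bytes_per_rpc bytes of payload (rounded up). */
size_t example_rpcs_needed(size_t dump_len, size_t bytes_per_rpc)
{
	if (bytes_per_rpc == 0)
		return 0;
	return (dump_len + bytes_per_rpc - 1) / bytes_per_rpc;
}

/* Allocate a zeroed buffer of at least len bytes, rounded up to whole
 * pages and aligned to page_size, suitable to hand to a umem
 * registration call. page_size must be a power of two, e.g. the value
 * of sysconf(_SC_PAGESIZE). Returns NULL on failure; release the
 * buffer with free(). */
void *example_alloc_umem_buffer(size_t len, size_t page_size)
{
	void *buf = NULL;
	size_t rounded;

	if (page_size == 0)
		return NULL;
	rounded = (len + page_size - 1) / page_size * page_size;
	if (posix_memalign(&buf, page_size, rounded) != 0)
		return NULL;
	memset(buf, 0, rounded);
	return buf;
}
```

With an assumed 512 bytes of payload per inline RPC,
example_rpcs_needed(2 * 1024 * 1024, 512) gives 4096 RPCs for a 2MB
coredump, versus a single RPC when the whole dump lands in one 2MB
registered buffer.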

Other usecases with umem:
- dynamic HW and FW trace monitoring
- high frequency diagnostic counters sampling
- batched objects and resource dumps

See links below for information about user space tools that use this
interface:

[1] https://github.com/saeedtx/mlx5ctl

[2] https://github.com/Mellanox/mstflint
see:

d) mstregdump utility
This utility dumps hardware registers from Mellanox hardware
for later analysis by Mellanox.

g) mstconfig
This tool sets or queries non-volatile configurable options
for Mellanox HCAs.

h) mstfwmanager
Mellanox firmware update and query utility which scans the system
for available Mellanox devices (only mst PCI devices) and performs
the necessary firmware updates.

i) mstreg
The mlxreg utility allows users to obtain information regarding
supported access registers, such as their fields

License: BSD-3-Clause OR GPL-2.0
================================
After a review of this thread [3], and a conversation with the LF,
Mellanox and NVIDIA legal continue to approve the use of a Dual GPL &
Permissive License for mlx5 related driver contributions. This makes it
clear to future contributors that this file may be adapted and reused
under BSD-3-Clause terms on other operating systems. Contributions will
be handled in the normal way and the dual license will apply
automatically. If people wish to contribute significantly and opt out of
a dual license they may separate their GPL only contributions in dedicated
files.

Jason has a signing authority for NVIDIA and has gone through our internal
process to get approval.

[3] https://lore.kernel.org/all/20231018081941.475277-3-saeed@xxxxxxxxxx/#r

Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> # for legal
Signed-off-by: Saeed Mahameed <saeedm@xxxxxxxxxx>
Nacked-by: Jakub Kicinski <kuba@xxxxxxxxxx>

Saeed Mahameed (5):
mlx5: Add aux dev for ctl interface
misc: mlx5ctl: Add mlx5ctl misc driver
misc: mlx5ctl: Add info ioctl
misc: mlx5ctl: Add command rpc ioctl
misc: mlx5ctl: Add umem reg/unreg ioctl

.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 8 +
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/mlx5ctl/Kconfig | 14 +
drivers/misc/mlx5ctl/Makefile | 5 +
drivers/misc/mlx5ctl/main.c | 597 ++++++++++++++++++
drivers/misc/mlx5ctl/umem.c | 322 ++++++++++
drivers/misc/mlx5ctl/umem.h | 17 +
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 8 +
include/uapi/misc/mlx5ctl.h | 50 ++
11 files changed, 1024 insertions(+)
create mode 100644 drivers/misc/mlx5ctl/Kconfig
create mode 100644 drivers/misc/mlx5ctl/Makefile
create mode 100644 drivers/misc/mlx5ctl/main.c
create mode 100644 drivers/misc/mlx5ctl/umem.c
create mode 100644 drivers/misc/mlx5ctl/umem.h
create mode 100644 include/uapi/misc/mlx5ctl.h

--
2.43.0