RE: RFC: Restricting userspace interfaces for CXL fabric management

From: Dan Williams
Date: Fri Apr 05 2024 - 20:04:49 EST


Jonathan Cameron wrote:
> Hi All,
>
> This has come up in a number of discussions, both on list and in private,
> so I wanted to lay out a potential set of rules when deciding whether or not
> to provide a user space interface for a particular feature of CXL Fabric
> Management. The intent is to drive discussion, not to simply tell people
> a set of rules. I've brought this to the public lists as it's a Linux kernel
> policy discussion, not a standards one.
>
> Whilst I'm writing the RFC, this is my attempt to summarize a possible
> position rather than necessarily being my personal view.
>
> It's a straw man - shoot at it!
>
> Not everyone in this discussion is familiar with relevant kernel or CXL concepts
> so I've provided more info than I normally would.

Thanks for writing this up Jonathan!

[..]
> 2) Unfiltered userspace use of mailbox for Fabric Management - BMC kernels
> ==========================================================================
>
> (This would just be a kernel option that we'd advise normal server
> distributions not to turn on. It would be enabled by openBMC etc.)
>
> This is fine - there is some work to do, but the switch-cci PCI driver
> will hopefully be ready for upstream merge soon. There is no filtering of
> accesses. Think of this as similar to all the damage you can do via
> MCTP from a BMC. Similarly it is likely that much of the complexity
> of the actual commands will be left to user space tooling:
> https://gitlab.com/jic23/cxl-fmapi-tests has some test examples.
>
> Whether Kconfig help text is strong enough to ensure this only gets
> enabled for BMC targeted distros is an open question we can address
> alongside an updated patch set.

It is not clear to me that this material makes sense to house in
drivers/ rather than tools/, or even out-of-tree, if only for the
maintenance-burden relief of keeping the two universes separated. What
does the Linux kernel project get out of carrying this in mainline
alongside the inband code?

I do think the mailbox refactoring to support non-CXL use cases is
interesting, but only insofar as that refactoring is consumed by inband
use cases like the RAS API.

> (On to the one that the "debate" is about)
>
> 3) Unfiltered user space use of mailbox for Fabric Management - Distro kernels
> =============================================================================
> (General purpose Linux Server Distro (Redhat, Suse etc))
>
> This is the equivalent of RAW command support on CXL Type 3 memory devices.
> You can enable those in a distro kernel build despite the scary config
> help text, but if you use it the kernel is tainted. The result
> of the taint is to add a flag to bug reports and print a big message to say
> that you've used a feature that might result in you shooting yourself
> in the foot.
>
> The taint is there because software is not at first written to deal with
> everything that can happen smoothly (e.g. surprise removal). It's hard
> to survive some of these events, so they are never on the initial feature
> list for any bus; this flag just indicates we have entered a world
> where almost all bets are off wrt stability. We might not know what
> a command does, so we can't assess the impact (and no one trusts vendor
> commands to report effects correctly in the Command Effects Log - which
> in theory tells you if a command can result in problems).

That is a secondary reason that the taint is there. Yes, it helps
upstream not waste their time on bug reports from proprietary use cases,
but the effect of that is to make "raw" command mode unattractive for
deploying solutions at scale. It makes clear that this interface is a
debug tool that enterprise environments need not worry about.
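
For anyone following along who has not used it, "raw" mode is just the
memdev ioctl with an arbitrary opcode passed straight to the mailbox.
Here is a minimal sketch of that flow from userspace, assuming the uapi
in <linux/cxl_mem.h>; the 0xC000 opcode and /dev/cxl/mem0 path are
purely illustrative, not a recommendation:

/* raw_cmd.c: illustrative only - issue a vendor-defined opcode via the
 * RAW path. Needs CONFIG_CXL_MEM_RAW_COMMANDS=y and taints the kernel
 * when used.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/cxl_mem.h>

int main(void)
{
	unsigned char out[256];
	struct cxl_send_command cmd = {
		.id = CXL_MEM_COMMAND_ID_RAW,
		.raw.opcode = 0xC000,	/* hypothetical vendor opcode */
		.out.size = sizeof(out),
		.out.payload = (__u64)(unsigned long)out,
	};
	int fd = open("/dev/cxl/mem0", O_RDWR);

	if (fd < 0 || ioctl(fd, CXL_MEM_SEND_COMMAND, &cmd) < 0) {
		perror("raw command");
		return 1;
	}
	printf("mbox retval %u, %u bytes returned\n", cmd.retval, cmd.out.size);
	close(fd);
	return 0;
}

Nothing in that flow tells the kernel (or a distro's support team) what
0xC000 actually did to the device, which is exactly the problem.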

The more salient reason for the taint, speaking only for myself as a
Linux kernel community member, not for $employer, is to encourage open
collaboration. Take firmware-update, for example: a standard command
with known side effects that is inaccessible via the ioctl() path.
Instead it is placed behind an ABI that is easier to maintain and reason
about. Everyone has the firmware update tool if they have the 'cat'
command. Distros appreciate the fact that they do not need to ship yet
another vendor device-update tool, vendors get free tooling, and end
users also appreciate one flow for all devices.
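
For comparison, a sketch of the upload flow behind that ABI, per the
generic fw_upload sysfs class (Documentation/ABI/testing/sysfs-class-firmware).
The "mem0" node name and image filename are assumptions for the example,
and the same thing is a three-line shell loop with 'cat':

/* fw_update.c: illustrative sketch - stage a firmware image through the
 * fw_upload sysfs files registered for the memdev. Roughly equivalent
 * to: echo 1 > loading; cat image.bin > data; echo 0 > loading
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define FW_DIR "/sys/class/firmware/mem0/"	/* node name is an assumption */

static int write_str(const char *path, const char *s)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, s, strlen(s)) < 0)
		return -1;
	return close(fd);
}

int main(void)
{
	char buf[4096];
	ssize_t n;
	int in = open("image.bin", O_RDONLY);
	int data;

	if (in < 0 || write_str(FW_DIR "loading", "1"))
		return 1;
	data = open(FW_DIR "data", O_WRONLY);
	if (data < 0)
		return 1;
	while ((n = read(in, buf, sizeof(buf))) > 0)
		if (write(data, buf, n) != n)
			return 1;
	close(data);
	/* writing 0 to "loading" hands the staged image to the driver */
	return write_str(FW_DIR "loading", "0") ? 1 : 0;
}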

As I alluded to here [1], I am not against innovation outside of the
specification, but it needs to be open, and it needs to plausibly become,
if not a de jure standard, at least a de facto standard.

[1]: https://lore.kernel.org/all/CAPcyv4gDShAYih5iWabKg_eTHhuHm54vEAei8ZkcmHnPp3B0cw@xxxxxxxxxxxxxx/

> A concern was raised about GAE/FAST/LDST tables for CXL Fabrics
> (a r3.1 feature) but, as I understand it, these are intended for a
> host to configure and should not have side effects on other hosts?
> My working assumption is that the kernel driver stack will handle
> these (once we catch up with the current feature backlog!). Currently
> we have no visibility of what the OS driver stack for fabrics will
> actually look like - the spec is just the starting point for that.
> (patches welcome ;)
>
> The various CXL upstream developers and maintainers may have
> differing views of course, but my current understanding is we want
> to support 1 and 2, but are very resistant to 3!

1, yes, 2, need to see the patches, and agree on 3.

> General Notes
> =============
>
> One side aspect of why we really don't like unfiltered userspace access to any
> of these devices is that people start building non-standard hacks in and we
> lose the ecosystem advantages. Forcing a considered discussion + patches
> to let a particular command be supported drives standardization.

Like I said above, I think this is not a side aspect. It is fundamental
to the viability of Linux as a project. This project only works because
organizations with competing goals realize they need some common
infrastructure and that there is little to be gained by competing on the
commons.

> https://lore.kernel.org/linux-cxl/CAPcyv4gDShAYih5iWabKg_eTHhuHm54vEAei8ZkcmHnPp3B0cw@xxxxxxxxxxxxxx/
> provides some history on vendor-specific extensions and why in general we
> won't support them upstream.

Oh, you linked my writeup... I will leave the commentary I added here in case
restating it helps.

> To address another question raised in an earlier discussion:
> Putting these Fabric Management interfaces behind guard rails of some type
> (e.g. CONFIG_IM_A_BMC_AND_CAN_MAKE_A_MESS) does not increase the risk
> of non-standard interfaces, because we will be even less likely to accept
> those upstream!
>
> If anyone needs more details on any aspect of this please ask.
> There are a lot of things involved and I've only tried to give a fairly
> minimal illustration to drive the discussion. I may well have missed
> something crucial.

You captured it well, and this is open source, so I may have missed
something crucial as well.