Re: [PATCH 08/14] taint: add taint for direct hardware access

From: Dan Williams
Date: Mon Feb 08 2021 - 20:04:38 EST


On Mon, Feb 8, 2021 at 3:36 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> On Mon, Feb 8, 2021 at 2:09 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >
> > On Mon, Feb 08, 2021 at 02:00:33PM -0800, Dan Williams wrote:
> > > [ add Jon Corbet as I'd expect him to be Cc'd on anything that
> > > generically touches Documentation/ like this, and add Kees as the last
> > > person who added a taint (tag you're it) ]
> > >
> > > Jon, Kees, are either of you willing to ack this concept?
> > >
> > > Top-posting to add more context for the below:
> > >
> > > This taint is proposed because it has implications for
> > > CONFIG_LOCK_DOWN_KERNEL among other things. These CXL devices
> > > implement memory like DDR would, but unlike DDR there are
> > > administrative / configuration commands that demand kernel
> > > coordination before they can be sent. The posture taken with this
> > > taint is "guilty until proven innocent" for commands that have yet to
> > > be explicitly allowed by the driver. This is different than NVME for
> > > example where an errant vendor-defined command could destroy data on
> > > the device, but there is no wider threat to system integrity. The
> > > taint allows a pressure release valve for any and all commands to be
> > > sent, but flagged with WARN_TAINT_ONCE if the driver has not
> > > explicitly enabled it on an allowed list of known-good / kernel
> > > coordinated commands.
> > >
> > > On Fri, Jan 29, 2021 at 4:25 PM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> > > >
> > > > For drivers that moderate access to the underlying hardware it is
> > > > sometimes desirable to allow userspace to bypass restrictions. Once
> > > > userspace has done this, the driver can no longer guarantee the sanctity
> > > > of either the OS or the hardware. When in this state, it is helpful for
> > > > kernel developers to be made aware (via this taint flag) of this fact
> > > > for subsequent bug reports.
> > > >
> > > > Example usage:
> > > > - Hardware xyzzy accepts 2 commands, waldo and fred.
> > > > - The xyzzy driver provides an interface for using waldo, but not fred.
> > > > - quux is convinced they really need the fred command.
> > > > - xyzzy driver allows quux to frob hardware to initiate fred.
> > > > - kernel gets tainted.
> > > > - turns out fred command is borked, and scribbles over memory.
> > > > - developers laugh while closing quux's subsequent bug report.
> >
> > But a taint flag only lasts for the current boot. If this is a drive, it
> > could still be compromised after reboot. It sounds like this taint is
> > really only for ephemeral things? "vendor shenanigans" is a pretty giant
> > scope ...
> >
>
> That is true. This is more about preventing an ecosystem / cottage
> industry of tooling built around bypassing the kernel. So the kernel
> complains loudly and hopefully prevents vendor tooling from
> propagating and instead directs that development effort back to the
> native tooling. However for the rare "I know what I'm doing" cases,
> this tainted kernel bypass lets some experimentation and debug happen,
> but the kernel is transparent that when the capability ships in
> production it needs to be a native implementation.
>
> So it's less, "the system integrity is compromised" and more like
> "you're bypassing the development process that ensures sanity for CXL
> implementations that may take down a system if implemented
> incorrectly". For example, NVME reset is a non-event, CXL reset can
> be like surprise removing a DDR DIMM.
>
> Should this be more tightly scoped to CXL? I had hoped to use this in
> other places in LIBNVDIMM, but I'm ok to lose some generality for the
> specific concerns that make CXL devices different than other PCI
> endpoints.

As I type this out it strikes me that a plain WARN() already sets
TAINT_WARN and meets the spirit of what this patch is trying to achieve.

Appreciate the skeptical eye Kees, we'll drop this one.
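
For anyone skimming the thread, here is a minimal userspace sketch of the
policy discussed above: commands on the driver's allowed list pass
silently, anything else is still sent but warns once and leaves a sticky
taint. This is only a model, not kernel code. The names send_cmd(),
cmd_is_allowed(), allowed_cmds, tainted and warn_count are hypothetical,
and the "waldo" / "fred" commands are borrowed from the example in the
patch description. In the kernel the same effect would come from WARN()
or WARN_ONCE(), which set TAINT_WARN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical allowed list: commands the imaginary xyzzy driver coordinates. */
static const char *const allowed_cmds[] = { "waldo" };

static bool tainted;   /* models the sticky, per-boot taint state */
static int warn_count; /* number of warnings actually printed */

/* Return true if cmd is on the allowed list. */
static bool cmd_is_allowed(const char *cmd)
{
	size_t n = sizeof(allowed_cmds) / sizeof(allowed_cmds[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(cmd, allowed_cmds[i]) == 0)
			return true;
	return false;
}

/*
 * Model of "send anything, but flag what the driver has not vetted".
 * Returns true if the command was flagged as a bypass. The warning is
 * printed only the first time; the taint stays set afterwards.
 */
static bool send_cmd(const char *cmd)
{
	assert(cmd != NULL);

	if (cmd_is_allowed(cmd))
		return false;

	if (!tainted) {
		fprintf(stderr, "WARNING: raw command '%s' bypasses the driver\n", cmd);
		warn_count++;
	}
	tainted = true;
	return true;
}
```

Calling send_cmd("waldo") leaves the taint clear, while the first
send_cmd("fred") prints the warning and sets it. Later bypasses stay
flagged without repeating the message, which is roughly the behaviour a
WARN_ONCE() call site gives in the kernel.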