Re: [PATCH 08/14] taint: add taint for direct hardware access

From: Ben Widawsky
Date: Mon Feb 08 2021 - 22:47:16 EST


On 21-02-08 17:03:25, Dan Williams wrote:
> On Mon, Feb 8, 2021 at 3:36 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >
> > On Mon, Feb 8, 2021 at 2:09 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 08, 2021 at 02:00:33PM -0800, Dan Williams wrote:
> > > > [ add Jon Corbet as I'd expect him to be Cc'd on anything that
> > > > generically touches Documentation/ like this, and add Kees as the last
> > > > person who added a taint (tag you're it) ]
> > > >
> > > > Jon, Kees, are either of you willing to ack this concept?
> > > >
> > > > Top-posting to add more context for the below:
> > > >
> > > > This taint is proposed because it has implications for
> > > > CONFIG_LOCK_DOWN_KERNEL among other things. These CXL devices
> > > > implement memory like DDR would, but unlike DDR there are
> > > > administrative / configuration commands that demand kernel
> > > > coordination before they can be sent. The posture taken with this
> > > > taint is "guilty until proven innocent" for commands that have yet to
> > > > be explicitly allowed by the driver. This is different than NVME for
> > > > example where an errant vendor-defined command could destroy data on
> > > > the device, but there is no wider threat to system integrity. The
> > > > taint allows a pressure release valve for any and all commands to be
> > > > sent, but flagged with WARN_TAINT_ONCE if the driver has not
> > > > explicitly enabled it on an allowed list of known-good / kernel
> > > > coordinated commands.
> > > >
> > > > On Fri, Jan 29, 2021 at 4:25 PM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> > > > >
> > > > > For drivers that moderate access to the underlying hardware it is
> > > > > sometimes desirable to allow userspace to bypass restrictions. Once
> > > > > userspace has done this, the driver can no longer guarantee the sanctity
> > > > > of either the OS or the hardware. When in this state, it is helpful for
> > > > > kernel developers to be made aware (via this taint flag) of this fact
> > > > > for subsequent bug reports.
> > > > >
> > > > > Example usage:
> > > > > - Hardware xyzzy accepts 2 commands, waldo and fred.
> > > > > - The xyzzy driver provides an interface for using waldo, but not fred.
> > > > > - quux is convinced they really need the fred command.
> > > > > - xyzzy driver allows quux to frob hardware to initiate fred.
> > > > > - kernel gets tainted.
> > > > > - turns out fred command is borked, and scribbles over memory.
> > > > > - developers laugh while closing quux's subsequent bug report.
> > >
> > > But a taint flag only lasts for the current boot. If this is a drive, it
> > > could still be compromised after reboot. It sounds like this taint is
> > > really only for ephemeral things? "vendor shenanigans" is a pretty giant
> > > scope ...
> > >
> >
> > That is true. This is more about preventing an ecosystem / cottage
> > industry of tooling built around bypassing the kernel. So the kernel
> > complains loudly and hopefully prevents vendor tooling from
> > propagating and instead directs that development effort back to the
> > native tooling. However for the rare "I know what I'm doing" cases,
> > this tainted kernel bypass lets some experimentation and debug happen,
> > but the kernel is transparent that when the capability ships in
> > production it needs to be a native implementation.
> >
> > So it's less, "the system integrity is compromised" and more like
> > "you're bypassing the development process that ensures sanity for CXL
> > implementations that may take down a system if implemented
> > incorrectly". For example, NVME reset is a non-invent, CXL reset can
> > be like surprise removing DDR DIMM.
> >
> > Should this be more tightly scoped to CXL? I had hoped to use this in
> > other places in LIBNVDIMM, but I'm ok to lose some generality for the
> > specific concerns that make CXL devices different than other PCI
> > endpoints.
>
> As I type this out it strikes me that plain WARN already does
> TAINT_WARN and meets the spirit of what is trying to be achieved.
>
> Appreciate the skeptical eye Kees, we'll drop this one.

So I think this is a good compromise for now. However, the point of this taint
was that it is specifically called out what tainted the kernel. It'd be great to
know when we have a bug report it was this specifically that was the issue.
Rambling further I realize now that taint doesn't tell you which module tainted,
which would be great here. That's actually what I'd like.

End ramble.