Re: [PATCH 04/14] cxl/mem: Implement polled mode mailbox
From: Dan Williams
Date: Tue Feb 02 2021 - 18:55:04 EST
On Tue, Feb 2, 2021 at 2:57 PM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
>
> On 21-02-01 12:00:18, Dan Williams wrote:
> > On Sat, Jan 30, 2021 at 3:52 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
> > >
> > > On Fri, 29 Jan 2021, Ben Widawsky wrote:
> > >
> > > > Provide enough functionality to utilize the mailbox of a memory device.
> > > > The mailbox is used to interact with the firmware running on the memory
> > > > device.
> > > >
> > > > The CXL specification defines separate capabilities for the mailbox and
> > > > the memory device. The mailbox interface has a doorbell to indicate
> > > > ready to accept commands and the memory device has a capability register
> > > > that indicates the mailbox interface is ready. The expectation is that
> > > > the doorbell-ready is always later than the memory-device-indication
> > > > that the mailbox is ready.
> > > >
> > > > Create a function to handle sending a command, optionally with a
> > > > payload, to the memory device, polling on a result, and then optionally
> > > > copying out the payload. The algorithm for doing this comes straight out
> > > > of the CXL 2.0 specification.
> > > >
> > > > Primary mailboxes are capable of generating an interrupt when submitting
> > > > a command in the background. That implementation is saved for a later
> > > > time.
> > > >
> > > > Secondary mailboxes aren't implemented at this time.
> > > >
> > > > The flow is proven with one implemented command, "identify". Because the
> > > > class code has already told the driver this is a memory device and the
> > > > identify command is mandatory.
> > > >
> > > > Signed-off-by: Ben Widawsky <ben.widawsky@xxxxxxxxx>
> > > > ---
> > > > drivers/cxl/Kconfig | 14 ++
> > > > drivers/cxl/cxl.h | 39 +++++
> > > > drivers/cxl/mem.c | 342 +++++++++++++++++++++++++++++++++++++++++++-
> > > > 3 files changed, 394 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > > index 3b66b46af8a0..fe591f74af96 100644
> > > > --- a/drivers/cxl/Kconfig
> > > > +++ b/drivers/cxl/Kconfig
> > > > @@ -32,4 +32,18 @@ config CXL_MEM
> > > > Chapter 2.3 Type 3 CXL Device in the CXL 2.0 specification.
> > > >
> > > > If unsure say 'm'.
> > > > +
> > > > +config CXL_MEM_INSECURE_DEBUG
> > > > + bool "CXL.mem debugging"
> > > > + depends on CXL_MEM
> > > > + help
> > > > + Enable debug of all CXL command payloads.
> > > > +
> > > > + Some CXL devices and controllers support encryption and other
> > > > + security features. The payloads for the commands that enable
> > > > + those features may contain sensitive clear-text security
> > > > + material. Disable debug of those command payloads by default.
> > > > + If you are a kernel developer actively working on CXL
> > > > + security enabling say Y, otherwise say N.
> > >
> > > Not specific to this patch, but the reference to encryption made me
> > > curious about integrity: are all CXL.mem devices compatible with DIMP?
> > > Some? None?
> >
> > The encryption here is "device passphrase" similar to the NVDIMM
> > Security Management described here:
> >
> > https://pmem.io/documents/IntelOptanePMem_DSM_Interface-V2.0.pdf
> >
> > The LIBNVDIMM enabling wrapped this support with the Linux keys
> > interface which among other things enforces wrapping the clear text
> > passphrase with a Linux "trusted/encrypted" key.
> >
> > Additionally, the CXL.io interface optionally supports PCI IDE:
> >
> > https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/pcie-device-security-enhancements.pdf
> >
> > I'm otherwise not familiar with the DIMP acronym?
> >
> > > > +
> > > > endif
> > > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > > > index a3da7f8050c4..df3d97154b63 100644
> > > > --- a/drivers/cxl/cxl.h
> > > > +++ b/drivers/cxl/cxl.h
> > > > @@ -31,9 +31,36 @@
> > > > #define CXLDEV_MB_CAPS_OFFSET 0x00
> > > > #define CXLDEV_MB_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> > > > #define CXLDEV_MB_CTRL_OFFSET 0x04
> > > > +#define CXLDEV_MB_CTRL_DOORBELL BIT(0)
> > > > #define CXLDEV_MB_CMD_OFFSET 0x08
> > > > +#define CXLDEV_MB_CMD_COMMAND_OPCODE_MASK GENMASK(15, 0)
> > > > +#define CXLDEV_MB_CMD_PAYLOAD_LENGTH_MASK GENMASK(36, 16)
> > > > #define CXLDEV_MB_STATUS_OFFSET 0x10
> > > > +#define CXLDEV_MB_STATUS_RET_CODE_MASK GENMASK(47, 32)
> > > > #define CXLDEV_MB_BG_CMD_STATUS_OFFSET 0x18
> > > > +#define CXLDEV_MB_PAYLOAD_OFFSET 0x20
> > > > +
> > > > +/* Memory Device (CXL 2.0 - 8.2.8.5.1.1) */
> > > > +#define CXLMDEV_STATUS_OFFSET 0x0
> > > > +#define CXLMDEV_DEV_FATAL BIT(0)
> > > > +#define CXLMDEV_FW_HALT BIT(1)
> > > > +#define CXLMDEV_STATUS_MEDIA_STATUS_MASK GENMASK(3, 2)
> > > > +#define CXLMDEV_MS_NOT_READY 0
> > > > +#define CXLMDEV_MS_READY 1
> > > > +#define CXLMDEV_MS_ERROR 2
> > > > +#define CXLMDEV_MS_DISABLED 3
> > > > +#define CXLMDEV_READY(status) \
> > > > + (CXL_GET_FIELD(status, CXLMDEV_STATUS_MEDIA_STATUS) == CXLMDEV_MS_READY)
> > > > +#define CXLMDEV_MBOX_IF_READY BIT(4)
> > > > +#define CXLMDEV_RESET_NEEDED_SHIFT 5
> > > > +#define CXLMDEV_RESET_NEEDED_MASK GENMASK(7, 5)
> > > > +#define CXLMDEV_RESET_NEEDED_NOT 0
> > > > +#define CXLMDEV_RESET_NEEDED_COLD 1
> > > > +#define CXLMDEV_RESET_NEEDED_WARM 2
> > > > +#define CXLMDEV_RESET_NEEDED_HOT 3
> > > > +#define CXLMDEV_RESET_NEEDED_CXL 4
> > > > +#define CXLMDEV_RESET_NEEDED(status) \
> > > > + (CXL_GET_FIELD(status, CXLMDEV_RESET_NEEDED) != CXLMDEV_RESET_NEEDED_NOT)
> > > >
> > > > /**
> > > > * struct cxl_mem - A CXL memory device
> > > > @@ -44,6 +71,16 @@ struct cxl_mem {
> > > > struct pci_dev *pdev;
> > > > void __iomem *regs;
> > > >
> > > > + struct {
> > > > + struct range range;
> > > > + } pmem;
> > > > +
> > > > + struct {
> > > > + struct range range;
> > > > + } ram;
> > > > +
> > > > + char firmware_version[0x10];
> > > > +
> > > > /* Cap 0001h - CXL_CAP_CAP_ID_DEVICE_STATUS */
> > > > struct {
> > > > void __iomem *regs;
> > > > @@ -51,6 +88,7 @@ struct cxl_mem {
> > > >
> > > > /* Cap 0002h - CXL_CAP_CAP_ID_PRIMARY_MAILBOX */
> > > > struct {
> > > > + struct mutex mutex; /* Protects device mailbox and firmware */
> > > > void __iomem *regs;
> > > > size_t payload_size;
> > > > } mbox;
> > > > @@ -89,5 +127,6 @@ struct cxl_mem {
> > > >
> > > > cxl_reg(status);
> > > > cxl_reg(mbox);
> > > > +cxl_reg(mem);
> > > >
> > > > #endif /* __CXL_H__ */
> > > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > > index fa14d51243ee..69ed15bfa5d4 100644
> > > > --- a/drivers/cxl/mem.c
> > > > +++ b/drivers/cxl/mem.c
> > > > @@ -6,6 +6,270 @@
> > > > #include "pci.h"
> > > > #include "cxl.h"
> > > >
> > > > +#define cxl_doorbell_busy(cxlm) \
> > > > + (cxl_read_mbox_reg32(cxlm, CXLDEV_MB_CTRL_OFFSET) & \
> > > > + CXLDEV_MB_CTRL_DOORBELL)
> > > > +
> > > > +#define CXL_MAILBOX_TIMEOUT_US 2000
> > >
> > > This should be _MS?
> > >
> > > > +
> > > > +enum opcode {
> > > > + CXL_MBOX_OP_IDENTIFY = 0x4000,
> > > > + CXL_MBOX_OP_MAX = 0x10000
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct mbox_cmd - A command to be submitted to hardware.
> > > > + * @opcode: (input) The command set and command submitted to hardware.
> > > > + * @payload_in: (input) Pointer to the input payload.
> > > > + * @payload_out: (output) Pointer to the output payload. Must be allocated by
> > > > + * the caller.
> > > > + * @size_in: (input) Number of bytes to load from @payload.
> > > > + * @size_out: (output) Number of bytes loaded into @payload.
> > > > + * @return_code: (output) Error code returned from hardware.
> > > > + *
> > > > + * This is the primary mechanism used to send commands to the hardware.
> > > > + * All the fields except @payload_* correspond exactly to the fields described in
> > > > + * Command Register section of the CXL 2.0 spec (8.2.8.4.5). @payload_in and
> > > > + * @payload_out are written to, and read from the Command Payload Registers
> > > > + * defined in (8.2.8.4.8).
> > > > + */
> > > > +struct mbox_cmd {
> > > > + u16 opcode;
> > > > + void *payload_in;
> > > > + void *payload_out;
> > > > + size_t size_in;
> > > > + size_t size_out;
> > > > + u16 return_code;
> > > > +#define CXL_MBOX_SUCCESS 0
> > > > +};
> > > > +
> > > > +static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
> > > > +{
> > > > + const int timeout = msecs_to_jiffies(CXL_MAILBOX_TIMEOUT_US);
> > > > + const unsigned long start = jiffies;
> > > > + unsigned long end = start;
> > > > +
> > > > + while (cxl_doorbell_busy(cxlm)) {
> > > > + end = jiffies;
> > > > +
> > > > + if (time_after(end, start + timeout)) {
> > > > + /* Check again in case preempted before timeout test */
> > > > + if (!cxl_doorbell_busy(cxlm))
> > > > + break;
> > > > + return -ETIMEDOUT;
> > > > + }
> > > > + cpu_relax();
> > > > + }
> > > > +
> > > > + dev_dbg(&cxlm->pdev->dev, "Doorbell wait took %dms",
> > > > + jiffies_to_msecs(end) - jiffies_to_msecs(start));
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,
> > > > + struct mbox_cmd *mbox_cmd)
> > > > +{
> > > > + dev_warn(&cxlm->pdev->dev, "Mailbox command timed out\n");
> > > > + dev_info(&cxlm->pdev->dev,
> > > > + "\topcode: 0x%04x\n"
> > > > + "\tpayload size: %zub\n",
> > > > + mbox_cmd->opcode, mbox_cmd->size_in);
> > > > +
> > > > + if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {
> > > > + print_hex_dump_debug("Payload ", DUMP_PREFIX_OFFSET, 16, 1,
> > > > + mbox_cmd->payload_in, mbox_cmd->size_in,
> > > > + true);
> > > > + }
> > > > +
> > > > + /* Here's a good place to figure out if a device reset is needed */
> > >
> > > What are the implications if we don't do a reset, as this implementation
> > > does not? IOW, does a timeout require a device to be recovered through a
> > > reset before it can receive additional commands, or is it safe to simply
> > > drop the command that timed out on the floor and proceed?
> >
> > Not a satisfying answer, but "it depends". It's also complicated by
> > the fact that a reset may need to be coordinated with other devices in
> > the interleave-set as the HDM decoders may bounce.
> >
> > For comparison, to date there have been no problems with the "drop on
> > the floor" policy of LIBNVDIMM command timeouts. At the same time
> > there simply was not a software visible reset mechanism for those
> > devices so this problem never came out. This mailbox isn't a fast
> > path, so the device is likely completely dead if this timeout is ever
> > violated, and the firmware reporting a timeout might as well assume
> > that the OS gives up on the device.
> >
> > I'll let Ben chime in on the rest...
>
> Reset handling is next on the TODO list for the driver. I had two main reasons
> for not even taking a stab at it.
> 1. I have no good way to test it. We are working on adding some test conditions
> to QEMU for it.
> 2. The main difficulty in my mind with reset is you can't pull the memory out
> from under the OS here. While the driver doesn't yet handle persistent memory
> capacities, it may have volatile capacity configured by the BIOS. So the goal
> was, get the bits of the driver in that would at least allow developers,
> hardware vendors, and folks contributing to the spec a way to have basic
> interaction with a CXL type 3 device.
Honestly I think in most cases if the firmware decides to return a
"reset required" status the Linux response will be "lol, no" because
the firmware has no concept of the violence that would impose on the
rest of the system.