Re: [PATCH] cxl/memdev: Avoid mailbox functionality on device memory CXL devices

From: Ira Weiny
Date: Tue Aug 01 2023 - 17:04:26 EST


Davidlohr Bueso wrote:
> On Fri, 28 Jul 2023, Dan Williams wrote:
>
> >Ira Weiny wrote:
> >> Using the proposed type-2 cxl-test device[1] the following
> >> splat was observed:
> >>
> >> BUG: kernel NULL pointer dereference, address: 0000000000000278
> >> [...]
> >> RIP: 0010:devm_cxl_add_memdev+0x1de/0x2c0 [cxl_core]
> >
> >It would be useful to decode this to a line number; the rest of this
> >call trace is not adding much.
> >
> >> [...]
> >> Call Trace:
> >> <TASK>
> >> ? __die+0x1f/0x70
> >> ? page_fault_oops+0x149/0x420
> >> ? fixup_exception+0x22/0x310
> >> ? kernelmode_fixup_or_oops+0x84/0x110
> >> ? exc_page_fault+0x6d/0x150
> >> ? asm_exc_page_fault+0x22/0x30
> >> ? devm_cxl_add_memdev+0x1de/0x2c0 [cxl_core]
> >> cxl_mock_mem_probe+0x632/0x870 [cxl_mock_mem]
> >> platform_probe+0x40/0x90
> >> really_probe+0x19e/0x3e0
> >> ? __pfx___driver_attach+0x10/0x10
> >> __driver_probe_device+0x78/0x160
> >> driver_probe_device+0x1f/0x90
> >> __driver_attach+0xce/0x1c0
> >> bus_for_each_dev+0x63/0xa0
> >> bus_add_driver+0x112/0x210
> >> driver_register+0x55/0x100
> >> ? __pfx_cxl_mock_mem_driver_init+0x10/0x10 [cxl_mock_mem]
> >> [...]
> >>
> >> Commit f6b8ab32e3ec made the mailbox functionality optional. However,
> >> some mailbox functionality was merged after that patch. Therefore some
> >> mailbox functionality can be accessed on a device which did not set up
> >> the mailbox.
> >
> >cxl_memdev_security_init() definitely needs to move out of
> >devm_cxl_add_memdev() and after that I do not think @mds NULL checks
> >need to be sprinkled everywhere. In other words something is wrong at a
> >higher level if we get into some of these helper functions without the
> >memory device state.
>
> Right, so we can move it directly into cxl_pci_probe(), just as with other
> mbox-based functionality. This leaves me wondering, however, what to do about
> the cxl_memdev_security_shutdown() counterpart. As in the diff below, leaving
> it as is and just adding an mds NULL check might still be considered a layering
> violation in that it would be asymmetrical wrt the init; but this is tightly
> coupled with cxl_memdev_unregister().
>
> Ira, does the below fix the crash?

I had to apply it by hand, but yes, it fixes the immediate crash.

Did you want to submit that as part of other work?

Ira

>
> Thanks,
> Davidlohr
>
> ----8<-------
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 14b547c07f54..4d1bf80c0e54 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -561,7 +561,7 @@ static void cxl_memdev_security_shutdown(struct device *dev)
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
>
> - if (mds->security.poll)
> + if (mds && mds->security.poll)
> cancel_delayed_work_sync(&mds->security.poll_dwork);
> }
>
> @@ -1009,11 +1009,11 @@ static void put_sanitize(void *data)
> sysfs_put(mds->security.sanitize_node);
> }
>
> -static int cxl_memdev_security_init(struct cxl_memdev *cxlmd)
> +int cxl_memdev_security_state_init(struct cxl_memdev_state *mds)
> {
> - struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> - struct device *dev = &cxlmd->dev;
> +
> + struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct device *dev = &cxlds->cxlmd->dev;
> struct kernfs_node *sec;
>
> sec = sysfs_get_dirent(dev->kobj.sd, "security");
> @@ -1029,7 +1029,8 @@ static int cxl_memdev_security_init(struct cxl_memdev *cxlmd)
> }
>
> return devm_add_action_or_reset(cxlds->dev, put_sanitize, mds);
> - }
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_security_state_init, CXL);
>
> struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> {
> @@ -1059,10 +1060,6 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> if (rc)
> goto err;
>
> - rc = cxl_memdev_security_init(cxlmd);
> - if (rc)
> - goto err;
> -
> rc = devm_add_action_or_reset(cxlds->dev, cxl_memdev_unregister, cxlmd);
> if (rc)
> return ERR_PTR(rc);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index f86afef90c91..441270770519 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -884,6 +884,7 @@ static inline void cxl_mem_active_dec(void)
> #endif
>
> int cxl_mem_sanitize(struct cxl_memdev_state *mds, u16 cmd);
> +int cxl_memdev_security_state_init(struct cxl_memdev_state *mds);
>
> struct cxl_hdm {
> struct cxl_component_regs regs;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 1cb1494c28fe..5242dbf0044d 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -887,6 +887,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
>
> + rc = cxl_memdev_security_state_init(mds);
> + if (rc)
> + return rc;
> +
> rc = cxl_memdev_setup_fw_upload(mds);
> if (rc)
> return rc;
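
For anyone following along, here is a rough userspace sketch of why the
"mds &&" guard in cxl_memdev_security_shutdown() matters. It is not the
kernel code: the struct layouts, the enum values, and the names
dev_state, memdev_state, to_memdev_state() and security_shutdown() are
simplified stand-ins, and the sketch assumes the conversion helper
returns NULL for a device that is not a class-memory (mailbox-capable)
device, which is what the discussion above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins only; not the real kernel definitions. */
enum dev_type { DEVTYPE_DEVMEM, DEVTYPE_CLASSMEM };

struct dev_state {
	enum dev_type type;
};

struct memdev_state {
	struct dev_state cxlds;	/* generic state embedded in the mailbox state */
	bool security_poll;	/* stands in for mds->security.poll */
	int poll_cancelled;	/* counts simulated cancel calls */
};

/* Userspace equivalent of the kernel's container_of() */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Assumed behaviour: only class-memory devices carry a memdev_state,
 * so any other device type yields NULL.
 */
static struct memdev_state *to_memdev_state(struct dev_state *cxlds)
{
	assert(cxlds != NULL);
	if (cxlds->type != DEVTYPE_CLASSMEM)
		return NULL;
	return container_of(cxlds, struct memdev_state, cxlds);
}

/*
 * Mirrors the shape of the patched shutdown path: without the "mds &&"
 * test, a device-memory device would dereference NULL here.
 * Returns 1 if the simulated poll work was cancelled, 0 otherwise.
 */
static int security_shutdown(struct dev_state *cxlds)
{
	struct memdev_state *mds = to_memdev_state(cxlds);

	if (mds && mds->security_poll) {
		mds->poll_cancelled++;
		return 1;
	}
	return 0;
}
```

Calling security_shutdown() on a dev_state whose type is DEVTYPE_DEVMEM
returns 0 without touching any mailbox state, while a DEVTYPE_CLASSMEM
device with security_poll set takes the cancel path. Whether the guard
belongs in the helper or at a higher layer is the open question raised
above; the sketch only shows the mechanics.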