Re: [PATCH 0/7] cxl: Consolidate cxlmd->endpoint accessing
From: Dan Williams
Date: Tue Mar 10 2026 - 16:33:56 EST
Li Ming wrote:
> Currently, the CXL subsystem has some functions that may access a CXL
> memdev's endpoint before endpoint initialization has completed, or
> without checking that the CXL memdev endpoint is valid.
> This patchset fixes three such scenarios.
>
> 1. cxl_dpa_to_region() may access an invalid CXL memdev endpoint.
> There are two scenarios that can trigger this issue:
> a. memdev poison injection/clearing debugfs interfaces:
> devm_cxl_add_endpoint() is used to register CXL memdev endpoint
> and update cxlmd->endpoint from -ENXIO to the endpoint structure.
> memdev poison injection/clearing debugfs interfaces are registered
> before devm_cxl_add_endpoint() is invoked in cxl_mem_probe().
> There is a small window where a user can use the debugfs interfaces
> to access an invalid endpoint.
This is the justification I wanted to see in the changelog of the
patches themselves. That is a reasonable theoretical window.
> b. cxl_event_config() at the end of cxl_pci_probe():
> cxl_event_config() invokes cxl_mem_get_event_record() to get the
> remaining event logs from the CXL device during cxl_pci_probe(). If
> CXL memdev probing failed before that, it is also possible to access
> an invalid endpoint.
Makes sense, please put this in the changelog.
> To fix these two cases, cxl_dpa_to_region() requires callers to hold
> the CXL memdev lock, and the CXL memdev driver binding status is
> checked. Holding the CXL memdev lock ensures that CXL memdev probing
> has completed, and if the CXL memdev driver is bound, then
> cxlmd->endpoint is valid. (PATCH #1-#5)
>
> 2. cxl_reset_done() callback in cxl_pci module.
> The cxl_reset_done() callback also accesses cxlmd->endpoint without
> any checking. If CXL memdev probing fails and cxl_reset_done() is then
> called by the PCI subsystem, it will access an invalid endpoint. The
> solution is adding a CXL memdev driver binding status check inside
> cxl_reset_done(). (PATCH #6)
Makes sense. I jumped into the patches first since I was familiar with
the problem space, but happy to see you did this analysis. Analysis
that lives only in the cover letter can easily get lost in the shuffle.
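
For anyone skimming the thread, the rule described in the cover letter
(take the memdev lock, then check the driver binding status before
touching cxlmd->endpoint) can be modeled in plain userspace C. This is
only a rough sketch, not the patch code: struct memdev, struct endpoint,
memdev_init(), memdev_probe() and memdev_endpoint_id() are hypothetical
stand-ins, a pthread mutex stands in for the memdev lock, and NULL stands
in for the -ENXIO placeholder mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct endpoint {
	int id;
};

struct memdev {
	pthread_mutex_t lock;      /* models the memdev lock */
	bool bound;                /* models "memdev driver is bound" */
	struct endpoint *endpoint; /* only valid while bound is true */
};

static void memdev_init(struct memdev *md)
{
	pthread_mutex_init(&md->lock, NULL);
	md->bound = false;
	md->endpoint = NULL; /* assumption: NULL models the "not yet valid" state */
}

/* Models probe: runs under the lock and publishes the endpoint only on success. */
static int memdev_probe(struct memdev *md, struct endpoint *ep, bool fail)
{
	int rc = 0;

	assert(ep != NULL);
	pthread_mutex_lock(&md->lock);
	if (fail) {
		rc = -ENXIO; /* probe failed: endpoint stays unpublished */
	} else {
		md->endpoint = ep;
		md->bound = true;
	}
	pthread_mutex_unlock(&md->lock);
	return rc;
}

/* Models a consumer such as a debugfs handler or a reset callback. */
static int memdev_endpoint_id(struct memdev *md, int *id)
{
	int rc = -ENXIO;

	/* Taking the lock means any in-flight probe has finished. */
	pthread_mutex_lock(&md->lock);
	if (md->bound) {
		/* Bound implies the endpoint pointer was published by probe. */
		*id = md->endpoint->id;
		rc = 0;
	}
	pthread_mutex_unlock(&md->lock);
	return rc;
}
```

In this model a consumer that runs before probe, or after a failed
probe, gets -ENXIO back and never dereferences the endpoint pointer,
which is the behavior both the debugfs case and the reset case need.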