RE: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

From: Dan Williams
Date: Sat Feb 17 2024 - 15:07:24 EST


Ira Weiny wrote:
> Smatch caught that cxl_cper_post_event() is called with a spinlock held
> or preemption disabled.[1] The callback takes the device lock to
> perform address translation and therefore might sleep. The record data
> is released back to BIOS in ghes_clear_estatus() which requires it to be
> copied for use in the workqueue.
>
> Copy the record to a lockless list and schedule a work item to process
> the record outside of atomic context.
>
> [1] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
>
> Reported-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
> Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> ---
> Changes in v2:
> - djbw: device_lock() sleeps so we need to call the callback in process context
> - iweiny: create work queue to handle processing the callback
> - Link to v1: https://lore.kernel.org/r/20240202-cxl-cper-smatch-v1-1-7a4103c7f5a0@xxxxxxxxx
> ---
> drivers/acpi/apei/ghes.c | 44 +++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 41 insertions(+), 3 deletions(-)
>
[..]
> +static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
> +
> static void cxl_cper_post_event(enum cxl_event_type event_type,
> struct cxl_cper_event_rec *rec)
> {
> + struct cxl_cper_work_item *wi;
> +
> if (rec->hdr.length <= sizeof(rec->hdr) ||
> rec->hdr.length > sizeof(*rec)) {
> pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> @@ -721,9 +752,16 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
> return;
> }
>
> - guard(rwsem_read)(&cxl_cper_rw_sem);
> - if (cper_callback)
> - cper_callback(event_type, rec);

Given that a work function can be set atomically, there is no need to
create or manage a registration lock. Point a 'struct work_struct'
instance at a CXL-provided routine on cxl_pci module load, and on
cxl_pci module exit restore it to a nop function and call
cancel_work_sync().
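
As a rough userspace illustration of the idea (not the kernel patch
itself), a C11 atomic function pointer can stand in for the work
function, so registration needs no lock. The names event_fn,
nop_handler(), counting_handler(), handler_register(),
handler_unregister() and post_event() are hypothetical, and the
cancel_work_sync() step that waits for an in-flight handler on module
exit is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical handler type; stands in for a kernel work function. */
typedef void (*event_fn)(int event);

/* Counts events seen by the example handler below. */
int handled_count;

/* Default handler: does nothing, so posting is always safe. */
void nop_handler(int event)
{
	(void)event;
}

/* Example handler a "module" might register. */
void counting_handler(int event)
{
	(void)event;
	handled_count++;
}

/* The pointer is never NULL; it starts out pointing at the nop. */
static _Atomic(event_fn) active_handler = nop_handler;

/* "Module load": publish the real handler with one atomic store. */
void handler_register(event_fn fn)
{
	atomic_store(&active_handler, fn);
}

/* "Module exit": restore the nop. Waiting for a handler that is
 * still running is deliberately left out of this sketch. */
void handler_unregister(void)
{
	atomic_store(&active_handler, nop_handler);
}

/* Reporting path: one atomic load, no registration lock taken. */
void post_event(int event)
{
	event_fn fn = atomic_load(&active_handler);

	assert(fn != NULL);
	fn(event);
}
```

Because the pointer always refers to some valid function, the posting
side never has to test for "no callback registered"; it only ever pays
for an atomic load.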

> + wi = kmalloc(sizeof(*wi), GFP_ATOMIC);

The system is already under distress trying to report an error; it
should not dip into emergency memory reserves to do so. Use a kfifo,
similar to how memory_failure_queue() avoids memory allocation in the
error reporting path.
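
To make the no-allocation point concrete, here is a minimal userspace
sketch of a preallocated, fixed-size FIFO. It is an analogue of the
idea behind kfifo, not the kernel's kfifo API; struct event_rec,
struct rec_fifo, REC_FIFO_SIZE, rec_fifo_put() and rec_fifo_get() are
all hypothetical names, and the sketch assumes a single producer and a
single consumer with no concurrency handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue depth; must be a power of two for the mask. */
#define REC_FIFO_SIZE 8u

static_assert((REC_FIFO_SIZE & (REC_FIFO_SIZE - 1)) == 0,
	      "REC_FIFO_SIZE must be a power of two");

/* Hypothetical record; stands in for the copied error record. */
struct event_rec {
	int type;
	unsigned int payload;
};

/* All storage is reserved up front, so enqueue never allocates. */
struct rec_fifo {
	struct event_rec slots[REC_FIFO_SIZE];
	unsigned int in;	/* total records ever enqueued */
	unsigned int out;	/* total records ever dequeued */
};

/* Copy a record in. Returns false (record dropped) when full. */
bool rec_fifo_put(struct rec_fifo *f, const struct event_rec *rec)
{
	if (f->in - f->out == REC_FIFO_SIZE)
		return false;
	f->slots[f->in & (REC_FIFO_SIZE - 1)] = *rec;
	f->in++;
	return true;
}

/* Copy the oldest record out. Returns false when empty. */
bool rec_fifo_get(struct rec_fifo *f, struct event_rec *rec)
{
	if (f->in == f->out)
		return false;
	*rec = f->slots[f->out & (REC_FIFO_SIZE - 1)];
	f->out++;
	return true;
}
```

The trade-off is that a full queue drops the new record instead of
growing, which is usually the right failure mode in an error path: the
reporting side stays bounded and never competes for memory.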