Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()
From: Corey Minyard
Date: Wed Apr 15 2026 - 08:17:05 EST
On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote:
> From: Matt Fleming <mfleming@xxxxxxxxxxxxxx>
>
> When the BMC does not respond to a "Get Device ID" command, the
> wait_event() in __get_device_id() blocks forever in TASK_UNINTERRUPTIBLE
> while holding bmc->dyn_mutex. Every subsequent sysfs reader then piles
> up in D state. Replace with wait_event_timeout() to return -EIO after 1
> second.
This is the second report I've had of something like this, so something
is up. I'm adding Tony, who reported something similar involving
the watchdog.
The lower-level driver should never fail to return an answer; it is
supposed to guarantee that it returns an error if the BMC doesn't respond.
So the bug is not here; it is elsewhere. My guess is that there is some
new failure mode where a BMC is not working but responds well enough
that it sort of works and fools the driver. But that's only a guess.
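A minimal sketch of that contract, in user-space C rather than the real driver: every name below is hypothetical, and the retry count is only illustrative, not the driver's actual policy. What it shows is the invariant that the message handler's unconditional wait_event() depends on: each request is completed exactly once, either with the response or with the IPMI "timeout while processing command" completion code (C3h).

```c
#include <assert.h>
#include <stdbool.h>

/* IPMI completion codes: 00h = normal, C3h = timeout while processing. */
#define SKETCH_CC_OK      0x00
#define SKETCH_CC_TIMEOUT 0xC3

/* Hypothetical request; the upper layer learns the outcome only here. */
struct sketch_msg {
	unsigned char cc;  /* completion code handed to the upper layer */
	int completions;   /* number of times the request was completed */
};

/* Hypothetical transport hook: returns true if the BMC answered this attempt. */
typedef bool (*sketch_try_fn)(void *ctx);

static void sketch_complete(struct sketch_msg *m, unsigned char cc)
{
	m->cc = cc;
	m->completions++;
	/* The contract: one request, one completion, never zero or two. */
	assert(m->completions == 1);
}

/*
 * Try up to max_retries + 1 times. If the BMC never answers, still
 * complete the request, this time with a timeout error, so whoever
 * is waiting on it always wakes up.
 */
static void sketch_send(struct sketch_msg *m, sketch_try_fn try_once,
			void *ctx, int max_retries)
{
	for (int attempt = 0; attempt <= max_retries; attempt++) {
		if (try_once(ctx)) {
			sketch_complete(m, SKETCH_CC_OK);
			return;
		}
	}
	sketch_complete(m, SKETCH_CC_TIMEOUT);
}

/* Example transports: a BMC that never answers, and one that needs 3 tries. */
static bool sketch_dead_bmc(void *ctx)
{
	(void)ctx;
	return false;
}

static bool sketch_slow_bmc(void *ctx)
{
	int *calls = ctx;

	return ++*calls >= 3;
}
```

With max_retries = 2, a BMC that never answers still produces exactly one completion carrying C3h, and one that answers on the third attempt produces one completion carrying 00h. A broken BMC that keeps the real state machine from ever reaching its error path would break this invariant, which is the kind of thing that could produce the hang described in the patch.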
I've seen this before in several scenarios, including a system that put
IPMI in the ACPI tree where it sort of worked, but there was no BMC
present. I had to disable that particular device.
What hardware is involved here?
Can you give a more detailed description of what's happening in the
low-level hardware? If it's KCS, there's a debug flag in
drivers/char/ipmi/ipmi_kcs_sm.c that should help.
Thanks,
-corey
>
> Signed-off-by: Matt Fleming <matt@xxxxxxxxxxxxxxxx>
> ---
> drivers/char/ipmi/ipmi_msghandler.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index c41f51c82edd..efa9588e8210 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -2599,7 +2599,13 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
>  	if (rv)
>  		goto out_reset_handler;
>
> -	wait_event(intf->waitq, bmc->dyn_id_set != 2);
> +	if (!wait_event_timeout(intf->waitq, bmc->dyn_id_set != 2,
> +				msecs_to_jiffies(1000))) {
> +		dev_warn(intf->si_dev,
> +			 "Timed out waiting for get bmc device id response\n");
> +		rv = -EIO;
> +		goto out_reset_handler;
> +	}
>
>  	if (!bmc->dyn_id_set) {
>  		if (bmc->cc != IPMI_CC_NO_ERROR &&
> --
> 2.43.0
>
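For reference, wait_event_timeout() returns 0 only when the timeout has elapsed and the condition is still false; otherwise it returns the remaining jiffies (at least 1). Below is a user-space analogue of the patched wait, a sketch using POSIX condition variables rather than kernel wait queues. The struct and function names are hypothetical; it waits for a dyn_id_set field to leave the in-flight value 2 that the hunk above waits on, and reports failure only on a genuine timeout. It illustrates the bounded wait only, not the D-state pile-up on bmc->dyn_mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for intf->waitq plus bmc->dyn_id_set. */
struct sketch_waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int dyn_id_set;        /* 2 means "Get Device ID still in flight" */
};

/*
 * Wait until dyn_id_set != 2 or timeout_ms elapses. Returns false only
 * if the deadline passed with the request still in flight, like
 * wait_event_timeout() returning 0.
 */
static bool sketch_wait_timeout(struct sketch_waitq *wq, long timeout_ms)
{
	struct timespec deadline;
	bool done;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&wq->lock);
	/* Re-check the condition after every wakeup; spurious wakeups happen. */
	while (wq->dyn_id_set == 2) {
		if (pthread_cond_timedwait(&wq->cond, &wq->lock, &deadline) == ETIMEDOUT)
			break;
	}
	/* Decide on the condition itself, not on whether the timer fired. */
	done = (wq->dyn_id_set != 2);
	pthread_mutex_unlock(&wq->lock);
	return done;
}

/* Plays the role of the response handler: record the result, wake waiters. */
static void *sketch_responder(void *arg)
{
	struct sketch_waitq *wq = arg;

	pthread_mutex_lock(&wq->lock);
	wq->dyn_id_set = 1;
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}
```

The final re-check matters: if the response lands just as the deadline expires, the caller still sees success, which is also how wait_event_timeout() behaves (it returns 1 in that case).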