Re: [PATCH] ipmi: Add timeout to unconditional wait in __get_device_id()

From: Matt Fleming

Date: Sun Apr 19 2026 - 16:50:50 EST


On Fri, Apr 17, 2026 at 06:53:55PM -0500, Corey Minyard wrote:
>
> The EVENT_MSG_BUFFER_FULL flag only gets cleared when an unsuccessful
> READ_EVENT_MSG_BUFFER command completes. Getting data from the
> BMC has higher priority than sending data to the BMC.
>
> If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then
> that would certainly wedge the driver. But it would have to continually
> report success for that command, which would be strange as it's supposed
> to error out when the queue is empty.

That does indeed appear to be what's happening.

OpenBMC's intel-ipmi-oem implementation of the READ_EVENT_MSG_BUFFER
handler does not fail when there is nothing to read:

https://github.com/openbmc/intel-ipmi-oem/blob/master/src/bridgingcommands.cpp#L704
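
To make the failure mode concrete, here is a minimal userspace sketch of
the interaction. It is not driver code: struct fake_bmc,
read_event_msg_buffer(), drain_events() and the max_reads limit are all
hypothetical names used only for illustration, and CC_NO_DATA is just a
stand-in for whatever error a well-behaved BMC returns on an empty
queue. The loop mirrors the rule quoted above (the "buffer full" state
only clears on an unsuccessful read) and shows how a read limit turns
an endless loop into an explicit -ETIMEDOUT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative completion codes: 0x00 means success; CC_NO_DATA is a
 * stand-in for the error returned when the event queue is empty. */
#define CC_SUCCESS 0x00
#define CC_NO_DATA 0x80

/* Hypothetical model of a BMC, for illustration only. */
struct fake_bmc {
	int queued_events;	/* events actually waiting in the buffer */
	bool always_success;	/* models a handler that never fails */
};

/* Model of one READ_EVENT_MSG_BUFFER round trip. */
int read_event_msg_buffer(struct fake_bmc *bmc)
{
	if (bmc->queued_events > 0) {
		bmc->queued_events--;
		return CC_SUCCESS;
	}
	/* Empty queue: a well-behaved BMC errors out here. */
	return bmc->always_success ? CC_SUCCESS : CC_NO_DATA;
}

/*
 * Keep reading while the "buffer full" state is set. The state is only
 * cleared by an unsuccessful read, so a BMC that always reports success
 * would loop forever without the max_reads limit.
 *
 * Returns the number of reads issued, or -ETIMEDOUT if the limit was
 * reached with the state still set.
 */
int drain_events(struct fake_bmc *bmc, int max_reads)
{
	bool buffer_full = true;
	int reads = 0;

	assert(max_reads > 0);

	while (buffer_full && reads < max_reads) {
		reads++;
		if (read_event_msg_buffer(bmc) != CC_SUCCESS)
			buffer_full = false;
	}

	return buffer_full ? -ETIMEDOUT : reads;
}
```

With three queued events and a limit of 16, the well-behaved model
finishes after four reads (three events plus the failing read that
clears the state), while the always-success model hits the limit and
returns -ETIMEDOUT.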

> If it's really something like that, I could also look at adding limits
> for those operations.

That would be great. Fred and I would be happy to test any patch.

I still think the original patch I sent is a worthwhile defense.

Our periodic monitoring scripts cause TASK_UNINTERRUPTIBLE tasks to
block behind one another when we hit these kinds of issues in the IPMI
code. Untangling that across thousands of machines can be
time-consuming, and a more explicit EIO or ETIMEDOUT would help with
triage.