Re: [PATCH] ipmi:ssif: Exit early when there is a SMBus error

From: Ivan T. Ivanov
Date: Sun Aug 18 2024 - 05:27:39 EST


Hi,

On 2024-08-16 21:04, Corey Minyard wrote:
On Fri, Aug 16, 2024 at 09:54:58AM +0300, Ivan T. Ivanov wrote:
It is pointless to continue module probing when communication
with device is failing. This just fill logs with misleading
messages like this:

So the BMC (or whatever is there) responds to a GET_DEVICE_ID command,

Well, not really. In cases where ssif_detect() returns -ENODEV, i2c core
i2c_detect_address() threat it as "We catch it here as this isn't an error”.
See i2c_detect_address().

but then doesn't properly handle any other valid and mandatory commands?
And this device was added via ACPI or SMBIOS or device tree, almost
certainly.

My comments are:

1) This fix is wrong, because it may mask important things that need to
be reported.

2) Fix the source of the problem. You can't expect software to
compensate for all bad hardware and firmware. I'd guess the firmware
tables are pointing to something that's not a BMC.

I am not saying that hardware or firmware do not have its issues in this
case. But just as it is written now ssif_probe() is fragile. It continue
asking device for features ignoring that there is no valid SMBus communication.


3) It appears the response to the GET_DEVICE_ID command, though a
response is returned, is not valid. The right way to handle this would
be to do more validation in the ssif_detect() function. It doesn't do
any validation of the response data, and that's really what needs to be
done.


do_cmd() in ssif_detect() already do validation. Perhaps,
ssif_probe() should just not return ENODEV in case of error.

Thank you for your comments!

Regards,
Ivan