Re: [PATCH] hwmon: (pmbus) Clear pmbus fault/warning bits before read

From: Vikash Chandola
Date: Tue Feb 22 2022 - 08:21:15 EST

Next message: Andrew Melnichenko: "Re: [RFC PATCH 2/5] driver/net/tun: Added features for USO."
Previous message: kernel test robot: "[ammarfaizi2-block:google/android/kernel/common/android12-5.4 4044/9999] drivers/dma-buf/heaps/page_pool.c:236:17: sparse: sparse: symbol 'pool_shrinker' was not declared. Should it be static?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2/11/2022 1:27 AM, Guenter Roeck wrote:

On 2/10/22 11:55, Guenter Roeck wrote:

On 2/10/22 10:29, Vikash Chandola wrote:

On 2/10/2022 10:55 PM, Guenter Roeck wrote:

On 2/10/22 07:57, Vikash Chandola wrote:

On 2/10/2022 8:14 PM, Guenter Roeck wrote:

On 2/10/22 04:41, Vikash Chandola wrote:

pmbus fault and warning bits are not cleared by itself once fault/warning
condition is not valid anymore. As per pmbus datasheet faults must be
cleared by user.
Modify hwmon behavior to clear latched status bytes if any bit in status
register is high prior to returning fresh data to userspace. If
fault/warning conditions are still applicable fault/warning bits will be
set and we will get updated data in second read.

Hwmon behavior is changed here. Now sysfs reads will reflect latest
values from pmbus slave, not latched values.
In case a transient warning/fault has happened in the past, it will no
longer be reported to userspace.

NACK.

Reporting that information is exactly the point of the current code.
We _do_ want to report at least once that a problem occurred in the past,
and only clear the warning flag(s) afterwards.

Guenter

But as per current implementation we will continue to report fault even after fault condition is cleared. I could not find sysfs entry or any other means by which user/kernel can clear the faults/warnings after it is reported. Please point if I am missing anything.

Normally a chip should clear the status after the fault condition cleared
and the register was read. At least that was my experience so far.
What chip (or chips) don't do that ?

Guenter

I see that pmbus spec says that once fault/warning bits are set
they won't be cleared by itself. Section 10.2.3 of
"PMBus Specification Rev. 1.3.1 Part II 2015-03-13" from
https://pmbus.org/specification-archives/  ; says

"
Almost all of the warning or fault bits set in the status registers remain set, even if the fault or warning condition is removed
or corrected, until one of the following occur:
* The bit is individually cleared (Section 10.2.4),
* The device receives a CLEAR_FAULTS command (Section 15.1),
* A RESET signal (if one exists) is asserted,
* The output is commanded through the CONTROL pin, the OPERATION command, or the combined action of the CONTROL pin and OPERATION
command, to turn off and then to turn back on, or
* Bias power is removed from the PMBus device...
"

From this I concluded that slave(PSU) following pmbus spec will not
clear the fault/warning in status registers.
I don't have exact chip details handy but I can get it by tomorrow.
However this looks to be issue not related to specific chip ?

Good point. Interesting that I have not seen this before.

If this is a problem, how should I approach this ? Shall I create a new
sysfs entry that user space application can invoke and clear all faults
or provide sysfs entry to clear individual fault/warning bits or
something else ?

No. What we need to do is to add code to pmbus_get_boolean() and selectively
write the reported status bit back into the status register. Something like
     ...
     regval = status & mask;
     if (regval) {
         ret = pmbus_write_status(client, page, reg, regval);
                       ^^^^^^^^^^^^^^^^^^ new function
         if (ret)
             goto unlock;
     }

and that will need to be tested on a variety of real hardware.

Hi Guenter,
I see that there is already a method pmbus_write_byte_data that does
same thing. I will post patch using that method to update one bit.

Plus, of course, we need to confirm that there is at least one chip which
does follow the specification and does not auto-clear alarms.

We observed above behavior SOLUM G36234-015 PSU.
Apologies for delay. It took some time to get information from our PSU
experts that PSU firmware itself is not broken here.

Guenter

--
Vikash

Next message: Andrew Melnichenko: "Re: [RFC PATCH 2/5] driver/net/tun: Added features for USO."
Previous message: kernel test robot: "[ammarfaizi2-block:google/android/kernel/common/android12-5.4 4044/9999] drivers/dma-buf/heaps/page_pool.c:236:17: sparse: sparse: symbol 'pool_shrinker' was not declared. Should it be static?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]