Re: [PATCH] hwmon: pmbus: Make reg check and clear faults functions return errors

From: Guenter Roeck
Date: Thu Sep 07 2017 - 21:26:33 EST


On 09/07/2017 06:02 PM, Andrew Jeffery wrote:
On Thu, 2017-09-07 at 17:27 -0700, Guenter Roeck wrote:
On 09/07/2017 08:22 AM, Andrew Jeffery wrote:
On Thu, 2017-09-07 at 06:40 -0700, Guenter Roeck wrote:
On 09/06/2017 04:32 PM, Andrew Jeffery wrote:

Guess I need to dig up my eval board and see if I can reproduce the problem.
Seems you are saying that the problem is always seen when issuing a sequence
of "clear faults" commands on multiple pages ?

Yeah. We're also seeing bad behaviour under other command sequences, which
led to this hack of a work-around patch[1].

I'd be very interested in the results of testing against the eval board. I
don't have access to one and it seems Maxim have discontinued them.


Do you have a somewhat reliable means to reproduce the problem ?

Without that hacky oneshot retry patch applied, we seem to hit a bunch of
problems just by continually binding/unbinding the driver. We can hit
problems (in our design?) with something like:

# cd /sys/bus/i2c/drivers/max31785; \
echo $addr > unbind; \
while echo $addr > bind; \
do echo $addr > unbind; echo -n .; done;

It should hit the issues covered by this patch, as probe performs
operations that rely on the register checks.


Hmm ... I didn't use your driver but my prototype driver which also supports
temperature and voltage attributes, so if anything it should create more
stress on the chip.

I did add the temp and voltage attributes...

Any chance you can give mine a try? I don't know what I would have done
to invoke this kind of behaviour, so it would be useful to know whether
or not it happens with one driver but not the other.


Will do.

No error so far, after running the script for a couple
of minutes. How long does it take for errors to appear, and how do I see
that there is an error ?

I'm seeing failures after anywhere from a handful to hundreds of
bind/unbind cycles. It seems to vary.

Does the driver fail to instantiate ?

Typically probe fails so the loop exits. It usually gets -EIO and the
shell spits out "No such device".
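
For anyone trying to reproduce this, a bounded variant of the bind/unbind
loop quoted earlier may make the failure easier to spot, since it reports how
many cycles succeeded before the write to "bind" failed. This is only a
sketch: the function name stress_bind, the loop limit, and the example
device address are hypothetical and not part of the original test.

```shell
#!/bin/sh
# Sketch of a bounded bind/unbind stress loop (hypothetical helper, not from
# the original thread). It assumes a failed probe shows up as a failed write
# to the driver's sysfs "bind" file, as described above.

# stress_bind DRIVER_DIR ADDR MAX_LOOPS
# Prints the number of successful bind/unbind cycles on stdout.
# Returns 0 if all MAX_LOOPS cycles succeeded, 1 on the first failed bind.
stress_bind() {
    drv=$1
    addr=$2
    max=$3
    i=0
    while [ "$i" -lt "$max" ]; do
        # Try to bind; stop at the first failure and report the count.
        if ! echo "$addr" > "$drv/bind" 2>/dev/null; then
            echo "bind failed after $i successful cycles" >&2
            echo "$i"
            return 1
        fi
        # Unbind again so the next iteration re-runs probe.
        echo "$addr" > "$drv/unbind" 2>/dev/null || true
        i=$((i + 1))
    done
    echo "$i"
    return 0
}
```

It could be invoked as, for example,
stress_bind /sys/bus/i2c/drivers/max31785 "$addr" 2500
with $addr set to the device's bus address, as in the loop quoted earlier.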

Thanks for testing, it's a useful data point for us hunting down the
source of our problems.

I aborted the test after ~2,500 loops without error.

Guenter