Re: [REGRESSION] hwmon: (applesmc) avoid overlong udelay()

From: Guenter Roeck
Date: Fri Oct 02 2020 - 00:07:56 EST


On 10/1/20 3:22 PM, Andreas Kemnade wrote:
> On Wed, 30 Sep 2020 22:00:09 +0200
> Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
>> On Wed, Sep 30, 2020 at 6:44 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>>>
>>> On Wed, Sep 30, 2020 at 10:54:42AM +0200, Andreas Kemnade wrote:
>>>> Hi,
>>>>
>>>> after the $subject patch I get lots of errors like this:
>>>
>>> For reference, this refers to commit fff2d0f701e6 ("hwmon: (applesmc)
>>> avoid overlong udelay()").
>>>
>>>> [ 120.378614] applesmc: send_byte(0x00, 0x0300) fail: 0x40
>>>> [ 120.378621] applesmc: LKSB: write data fail
>>>> [ 120.512782] applesmc: send_byte(0x00, 0x0300) fail: 0x40
>>>> [ 120.512787] applesmc: LKSB: write data fail
>>>>
>>>> CPU sticks at low speed and no fan is turning on.
>>>> Reverting this patch on top of 5.9-rc6 solves this problem.
>>>>
>>>> Some information from dmidecode:
>>>>
>>>> Base Board Information
>>>> Manufacturer: Apple Inc.
>>>> Product Name: Mac-7DF21CB3ED6977E5
>>>> Version: MacBookAir6,2
>>>>
>>>> Handle 0x0020, DMI type 11, 5 bytes
>>>> OEM Strings
>>>> String 1: Apple ROM Version. Model: MBA61. EFI Version: 122.0.0
>>>> String 2: .0.0. Built by: root@saumon. Date: Wed Jun 10 18:
>>>> String 3: 10:36 PDT 2020. Revision: 122 (B&I). ROM Version: F000_B
>>>> String 4: 00. Build Type: Official Build, Release. Compiler: Appl
>>>> String 5: e clang version 3.0 (tags/Apple/clang-211.10.1) (based on LLVM
>>>> String 6: 3.0svn).
>>>>
>>>> Writing to things in /sys/devices/platform/applesmc.768 gives also the
>>>> said errors.
>>>> But writing 1 to fan1_manual and 5000 to fan1_output turns the fan on
>>>> despite error messages.
>>>>
>>> Not really sure what to do here. I could revert the patch, but then we'd gain
>>> clang compile failures. Arnd, any idea ?
>>
>> It seems that either I made a mistake in the conversion and it sleeps for
>> less time than before, or my assumption was wrong that converting a delay to
>> a sleep is safe here.
>>
>> The error message indicates that the write fails, not the read, so that
>> is what I'd look at first. Right away I can see that the maximum time to
>> retry is only half of what it used to be, as we used to wait for
>> 0x10, 0x20, 0x40, 0x80, ..., 0x20000 microseconds for a total of
>> 0x3fff0 microseconds (262ms), while my patch went with the 131ms
>> total delay based on the comment saying "/* wait up to 128 ms for a
>> status change. */".
>>
> Yes, that is also what I read from the code. I just thought there must
> be something simple, which just needs a short look from another pair of
> eyes.
>
>> Since there is sleeping wait, I see no reason the timeout couldn't
>> be extended a lot, e.g. to a second, as in
>>
>> #define APPLESMC_MAX_WAIT 0x100000
>>
>> If that doesn't work, I'd try using mdelay() in place of
>> usleep_range(), such as
>>
>> mdelay(DIV_ROUND_UP(us, USEC_PER_MSEC));
>>
>> This adds back a really nasty latency, but it should avoid the
>> compile-time problem.
>>
>> Andreas, can you try those two things? (one at a time,
>> not both)
>
> Ok, I tried. None of them works. I rechecked my work and created real
> git commits out of them and CONFIG_LOCALVERSION_AUTO is also set so
> the usual stupid things are ruled out.
> In detail:
> On top of 5.9-rc6 + *reverted* patch:
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index fd99c9df8a00..2a9bd7f2b71b 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -45,7 +45,7 @@
> /* wait up to 128 ms for a status change. */
> #define APPLESMC_MIN_WAIT 0x0010
> #define APPLESMC_RETRY_WAIT 0x0100
> -#define APPLESMC_MAX_WAIT 0x20000
> +#define APPLESMC_MAX_WAIT 0x8000
>
> #define APPLESMC_READ_CMD 0x10
> #define APPLESMC_WRITE_CMD 0x11
>

Oh man, that code is so badly broken.

send_byte() repeats sending the data if it was not immediately successful.
That is done for both data and commands. Effectively that happens if
the command is not immediately accepted. However, send_argument()
clearly assumes that each data byte is sent exactly once. Sending
it more than once will mess up the key that is supposed to be sent.
The Apple SMC emulation code in qemu confirms that data bytes can not
be written more than once.

Of course, theoretically it may be that the first data byte was not
accepted (after all, the ACK bit is not set), but the ACK bit is
not checked again after udelay(APPLESMC_RETRY_WAIT), so it may
well have been set in the 256 uS between its check and re-writing
the data.

In other words, this entire code only works accidentally to start with.
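
To illustrate the re-send problem described above, here is a minimal userspace
sketch. It is a toy model, not the real SMC or the driver code: struct toy_smc,
toy_write(), toy_ack(), the poll counts, and the "ACK shows up a few polls
later" behavior are all assumptions made for illustration. The key "LKSB" is
taken from the quoted log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy device model (hypothetical, not the real SMC): every write to the
 * data port is consumed, and ACK becomes visible ack_delay polls later. */
struct toy_smc {
	char key[32];    /* bytes the device has consumed, NUL-terminated */
	size_t len;
	int ack_delay;   /* polls that report "no ACK" after a write */
	int polls_left;
};

static void toy_write(struct toy_smc *d, char byte)
{
	if (d->len + 1 < sizeof(d->key)) {
		d->key[d->len++] = byte;
		d->key[d->len] = '\0';
	}
	if (d->polls_left == 0)          /* a re-write does not restart the timer */
		d->polls_left = d->ack_delay;
}

static int toy_ack(struct toy_smc *d)
{
	if (d->polls_left > 0) {
		d->polls_left--;
		return 0;
	}
	return 1;
}

/* Pattern described above: poll a little, then write the same byte again. */
static int send_byte_resend(struct toy_smc *d, char byte)
{
	for (int attempt = 0; attempt < 4; attempt++) {
		toy_write(d, byte);
		for (int poll = 0; poll < 2; poll++)
			if (toy_ack(d))
				return 0;
	}
	return -1;
}

/* Alternative: write exactly once, then only poll for the ACK. */
static int send_byte_once(struct toy_smc *d, char byte)
{
	toy_write(d, byte);
	for (int poll = 0; poll < 8; poll++)
		if (toy_ack(d))
			return 0;
	return -1;
}

static int send_key(struct toy_smc *d, const char *key, int resend)
{
	for (; *key; key++)
		if (resend ? send_byte_resend(d, *key) : send_byte_once(d, *key))
			return -1;
	return 0;
}

/* Usage: the same four-byte key arrives intact or doubled. */
static void demo_resend(void)
{
	struct toy_smc once = { .ack_delay = 3 }, twice = { .ack_delay = 3 };

	assert(send_key(&once, "LKSB", 0) == 0);
	assert(strcmp(once.key, "LKSB") == 0);
	assert(send_key(&twice, "LKSB", 1) == 0);
	assert(strcmp(twice.key, "LLKKSSBB") == 0);
}
```

In this model both paths report success, yet the re-send path delivers a
different key. With ack_delay set to 0 the two paths behave the same, which
matches the point that the existing code only works when the byte is accepted
immediately.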

If you like, you could play around with the code and find out if and
when exactly bit 1 (busy) is set, if and when bit 2 (ack) is set, and
if and when any other bit is set. We could also try to read port 0x31e
(the error port). Maybe then we can figure out what the error actually
is. But then I don't really know what we could do with that information.
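
For anyone who wants to experiment along those lines, a small helper that
names the status bits makes the logs easier to read. This is only a sketch:
the two macro names and smc_status_str() are made up for illustration, and
only the meanings of bit 1 (busy) and bit 2 (ack) come from the text above.
All other bits are printed by number because their meaning is not known here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SMC_STATUS_BUSY (1u << 1)   /* bit 1, "busy" per the text above */
#define SMC_STATUS_ACK  (1u << 2)   /* bit 2, "ack" per the text above */

/* Describe the set bits of a status byte, e.g. "busy|ack" or "bit6". */
static const char *smc_status_str(uint8_t status, char *buf, size_t len)
{
	size_t off = 0;

	if (len == 0)
		return "";
	buf[0] = '\0';
	if (status == 0) {
		snprintf(buf, len, "none");
		return buf;
	}
	for (int bit = 0; bit < 8; bit++) {
		unsigned int mask = 1u << bit;
		int n;

		if (!(status & mask))
			continue;
		if (mask == SMC_STATUS_BUSY)
			n = snprintf(buf + off, len - off, "%sbusy", off ? "|" : "");
		else if (mask == SMC_STATUS_ACK)
			n = snprintf(buf + off, len - off, "%sack", off ? "|" : "");
		else
			n = snprintf(buf + off, len - off, "%sbit%d", off ? "|" : "", bit);
		if (n < 0 || (size_t)n >= len - off)
			break;               /* output truncated, stop here */
		off += (size_t)n;
	}
	return buf;
}

/* Usage: decode a few sample status values. */
static void demo_status(void)
{
	char buf[64];

	assert(strcmp(smc_status_str(0x06, buf, sizeof(buf)), "busy|ack") == 0);
	assert(strcmp(smc_status_str(0x40, buf, sizeof(buf)), "bit6") == 0);
	assert(strcmp(smc_status_str(0x00, buf, sizeof(buf)), "none") == 0);
}
```

If the last value in the quoted "send_byte(...) fail: 0x40" lines is the
status byte, it would decode to "bit6" here, that is, neither busy nor ack.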

Other than that, the only useful idea I have is something crazy like
	if (us < 10000)
		udelay(us);
	else
		mdelay(DIV_ROUND_CLOSEST(us, 1000));
in the hope that clang doesn't convert that back into a
compile-time constant and udelay().
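
A userspace sketch of that split, to check where the threshold falls and what
the rounding does. The udelay() and mdelay() below are recording stubs, not
the kernel functions, the macro is a simplified unsigned-only version of the
kernel's DIV_ROUND_CLOSEST, and smc_delay() is a made-up name. It says nothing
about whether clang folds the call back into a constant udelay().

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel's udelay()/mdelay();
 * they only record the request instead of busy-waiting. */
static unsigned long last_udelay_us, last_mdelay_ms;
static void udelay(unsigned long us) { last_udelay_us = us; }
static void mdelay(unsigned long ms) { last_mdelay_ms = ms; }

/* Simplified unsigned-only version of the kernel macro of the same name. */
#define DIV_ROUND_CLOSEST(x, d) (((x) + (d) / 2) / (d))

/* Short waits stay in microseconds, long ones are rounded to milliseconds. */
static void smc_delay(unsigned long us)
{
	if (us < 10000)
		udelay(us);
	else
		mdelay(DIV_ROUND_CLOSEST(us, 1000));
}

/* Usage: 0x0100 us stays a udelay(), 0x20000 us becomes mdelay(131). */
static void demo_delay(void)
{
	smc_delay(0x0100);
	assert(last_udelay_us == 0x0100 && last_mdelay_ms == 0);
	smc_delay(0x20000);
	assert(last_mdelay_ms == 131);
}
```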

Overall it seems like the apple protocol may expect to receive data
bytes faster than 1ms apart, because that is the only real difference
between the original code and the new code using mdelay().

Guenter