Re: [PATCH v3] applesmc: Re-work SMC comms

From: Brad Campbell
Date: Tue Nov 10 2020 - 00:42:03 EST


On 10/11/20 3:55 pm, Guenter Roeck wrote:
> On Tue, Nov 10, 2020 at 01:04:04PM +1100, Brad Campbell wrote:
>> On 9/11/20 3:06 am, Guenter Roeck wrote:
>>> On 11/8/20 2:14 AM, Henrik Rydberg wrote:
>>>> On Sun, Nov 08, 2020 at 09:35:28AM +0100, Henrik Rydberg wrote:
>>>>> Hi Brad,
>>>>>
>>>>> On 2020-11-08 02:00, Brad Campbell wrote:
>>>>>> G'day Henrik,
>>>>>>
>>>>>> I noticed you'd also loosened up the requirement for SMC_STATUS_BUSY in read_smc(). I assume
>>>>>> that causes problems on the early Macbook. This is revised on the one sent earlier.
>>>>>> If you could test this on your Air1,1 it'd be appreciated.
>>>>>
>>>>> No, I managed to screw up the patch; you can see that I carefully added the
>>>>> same treatment for the read argument, being unsure if the BUSY state would
>>>>> remain during the AVAILABLE data phase. I can check that again, but
>>>>> unfortunately the patch in this email shows the same problem.
>>>>>
>>>>> I think it may be worthwhile to rethink the behavior of wait_status() here.
>>>>> If one machine shows no change after a certain status bit change, then
>>>>> perhaps the others share that behavior, and we are waiting in vain. Just
>>>>> imagine how many years of cpu that is, combined. ;-)
>>>>
>>>> Here is a modification along that line.
>>>>
>>>
>>> Please resend this patch as stand-alone v4 patch. If sent like it was sent here,
>>> it doesn't make it into patchwork, and is thus not only difficult to apply but
>>> may get lost, and it is all but impossible to find and apply all tags.
>>> Also, prior to Henrik's Signed=off-by: there should be a one-line explanation
>>> of the changes made.
>>>
>>> Thanks,
>>> Guenter
>>>
>>>> Compared to your latest version, this one has wait_status() return the
>>>> actual status on success. Instead of waiting for BUSY, it waits for
>>>> the other status bits, and checks BUSY afterwards. So as not to wait
>>>> unneccesarily, the udelay() is placed together with the single
>>>> outb(). The return value of send_byte_data() is augmented with
>>>> -EAGAIN, which is then used in send_command() to create the resend
>>>> loop.
>>>>
>>>> I reach 41 reads per second on the MBA1,1 with this version, which is
>>>> getting close to the performance prior to the problems.
>>>>
>>
>> Can I get an opinion on this wait statement please?
>>
>> The apple driver uses a loop with a million (1,000,000) retries spaced with a 10uS delay.
>>
>> In my testing on 2 machines, we don't busy wait more than about 2 loops.
>> Replacing a small udelay with the usleep_range kills performance.
>> With the following (do 10 fast checks before we start sleeping) I nearly triple the performance
>> of the driver on my laptop, and double it on my iMac. This is on an otherwise unmodified version of
>> Henriks v4 submission.
>>
>> Yes, given the timeouts I know it's a ridiculous loop condition.
>>
>> static int wait_status(u8 val, u8 mask)
>> {
>> unsigned long end = jiffies + (APPLESMC_MAX_WAIT * HZ) / USEC_PER_SEC;
>> u8 status;
>> int i;
>>
>> for (i=1; i < 1000000; i++) {
>
> The minimum wait time is 10 us, or 16uS after the first 10
> attempts. 1000000 * 10 = 10 seconds. I mean, it would make
> some sense to limit the loop to APPLESMC_MAX_WAIT /
> APPLESMC_MIN_WAIT iterations, but why 1,000,000 ?

I had to pick a big number and it was easy to punch in 6 zeros as that is what is in
the OSX driver. In this instance it's more a proof of concept rather than sane example
because it'll either abort on timeout or return the correct status anyway.
Could also have been a while (true) {} but then I'd need to manually increment i.

>> status = inb(APPLESMC_CMD_PORT);
>> if ((status & mask) == val)
>> return status;
>> /* timeout: give up */
>> if (time_after(jiffies, end))
>> break;
>> if (i < 10)
>> udelay(10);
>> else
>> usleep_range(APPLESMC_MIN_WAIT, APPLESMC_MIN_WAIT * 16);
>
> The original code had the exponential wait time increase.
> I don't really see the point of changing that. I'd suggest
> to keep the exponential increase but change the code to
> something like
> if (us < APPLESMC_MIN_WAIT * 4)
> udelay(us)
> else
> usleep_range(us, us * 16);

The main reason I dropped the exponential was best case on this laptop the modified code with exponential
wait as described above increase increases the performance from ~40 -> 62 reads/sec, whereas the version
I posted with the first 10 loops at 10uS goes from ~40 -> 100 reads/sec.

About 95% of waits never get past a few of iterations of that loop (so ~30-60uS). With a
modified exponential starting at 8uS a 30uS requirement ends up at 56uS. If it needed 60us with
the original we end up waiting 120uS.

If it needs longer than 120uS it's rare enough that a bigger sleep isn't going to cause much
of a performance hit.

Getting completely pathological and busy waiting with a fixed 10uS delay like the OSX driver
does give about 125 reads/sec but I was trying to find a balance and 10 loops seemed to hit that mark.

> Effectively that means the first wait would be 16 uS,
> followed by 32 uS, followed by increasingly larger sleep
> times. I don't know the relevance of APPLESMC_MIN_WAIT
> being set to 16, but if you'd want to start with smaller
> wait times you could reduce it to 8. If you are concerned
> about excessively large sleep times you could reduce
> the span from us..us*16 to, say, us..us*4 or us..us*2.

I was tossing up here between the overhead required to manage a tighter usleep_range
vs some extra udelays.

It's not exactly a performance sensitive driver, but I thought there might be a balance to be
struck between performance and simplicity. The exponential delay always struck me as odd.

If the preference is to leave it as is or modify as suggested I'm ok with that too.
Appreciate the input.

Regards,
Brad