Re: PWM regression causing failures with the pwm-atmel driver

From: Peter Rosin
Date: Mon May 22 2023 - 20:39:17 EST


Hi!

2023-05-22 at 22:43, Uwe Kleine-König wrote:
> Hello Peter,
>
> On Mon, May 22, 2023 at 09:28:39PM +0200, Peter Rosin wrote:
>> 2023-05-22 at 19:23, Uwe Kleine-König wrote:
>>> On Mon, May 22, 2023 at 05:19:43PM +0200, Peter Rosin wrote:
>>>> I have a device with a "sound card" that has an amplifier that needs
>>>> an extra boost when high amplification is requested. This extra
>>>> boost is controlled with a pwm-regulator.
>>>>
>>>> As of commit c73a3107624d ("pwm: Handle .get_state() failures") this
>>>> device no longer works. I have tracked the problem to an unfortunate
>>>> interaction between the underlying PWM driver and the PWM core.
>>>>
>>>> The driver is drivers/pwm/pwm-atmel.c which has difficulties getting
>>>> the period and/or duty_cycle from the HW when the PWM is not enabled.
>>>> Because of this, I think, the driver does not fill in .period and
>>>> .duty_cycle at all in atmel_pwm_get_state() unless the PWM is enabled.
>>>>
>>>> However, the PWM core is not expecting these fields to be left as-is,
>>>> at least not in pwm_adjust_config(), and its local state variable on
>>>> the stack ends up with whatever crap was on the stack on entry for
>>>> these fields. That fails spectacularly when the function continues to
>>>> do math on these uninitialized values.
>
> After looking again, I don't understand that part. Note that
> pwm_get_state() doesn't call .get_state() at all. Also pwmchip_add()
> zero initializes .state, then pwm_get() calls .get_state() (via
> pwm_device_request() which is called in .xlate()) which (if the HW is
> disabled) doesn't touch .period, so it should continue to be zero?!
>
> So I wonder why your approach 2 works at all. Do you see what I'm
> missing?

I tried various things on top of v6.3 and you are right. My hunks do
not make a (significant) difference there.

So, I took a step back and can only conclude that there must be some
another regression to find, and I was confused by that other regression.
In short, I was on 6.1.<foo> and everything was fine, and then I bumped
to 6.3 and a process crashed. I went to 6.2 and that same process also
crashed. I then totally focused on v6.1..v6.2 to figure out the problem.
I simply assumed v6.3 had the same problem because the symptom from
30.000ft was the same (that process died). I failed to go back to v6.3
to confirm that it was indeed the same problem as I had found in the
v6.1..v6.2 range.

My bad, it seems I have another day of bisections lined up.

>>>> In particular, I find this in the kernel log when a bad kernel runs:
>>>> pwm-regulator: probe of reg-ana failed with error -22
>>>>
>>>> Before commit c73a3107624d this was a silent failure, and the situation
>>>> "repaired itself" when the PWM was later reprogrammed, at least for my
>>>> case. After that commit, the failure is fatal and the "sound card"
>>>> fails to come up at all.
>>>>
>>>>
>>>> I see a couple of adjustments that could be made.
>>>>
>>>> 1. Zero out some fields in the driver:
>>>>
>>>> @@ -390,4 +390,6 @@ static int atmel_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
>>>> state->enabled = true;
>>>> } else {
>>>> + state->period = 0;
>>>> + state->duty_cycle = 0;
>>>> state->enabled = false;
>>>> }
>>>
>>> I don't particularily like that. While state->period is an invalid
>>> value, IMHO enabled = false is enough information from the driver's POV.
>>
>> This was the preferred approach of Thierry, and given the number of
>> call sites for pwm_get_state with a local variable, I can sympathize
>> with that view.
>
> I looked a bit more into the issue and think that pwm_get_state() isn't
> problematic. pwm_get_state() fully assigns *state.

Right, the only way pwm_get_state() can be problematic is if bogus values
have made their way into pwm->state earlier. Which is what happened for
v6.2.

>> At the same time, fixing drivers one by one is not
>> a fun game, so I can certainly see that approach 3 also has an appeal.
>
> What I don't like about it is that the output of a disabled PWM doesn't
> have a period. There might be one configured in hardware(, and the
> .get_state() callback might or might not return that), but the emitted
> signal has no well-defined period.
>
>>>> 2. Don't try to "adjust" a disabled PWM:
>>>>
>>>> @@ -656,7 +656,7 @@ int pwm_adjust_config(struct pwm_device *pwm)
>>>> * In either case, we setup the new period and polarity, and assign a
>>>> * duty cycle of 0.
>>>> */
>>>> - if (!state.period) {
>>>> + if (!state.enabled || !state.period) {
>>>> state.duty_cycle = 0;
>>>> state.period = pargs.period;
>>>> state.polarity = pargs.polarity;
>>>
>>> In my book code that calls pwm_get_state() should consider .period
>>> (and .duty_cycle) undefined if .enabled is false. So this hunk is an
>>
>> I agree fully. This feels like a good hunk.
>>
>>> improvement. Having said that, I think pwm_adjust_config() has a strange
>>> semantic. I don't understand when you would want to call it, and it only
>>> has one caller (i.e. pwm-regulator).
>>
>> I believe it's for a case where some bootstrap/bootloader has configured
>> a PWM in order to set up a regulator for some critical component, and the
>> kernel needs to cake over the responsibility seamlessly, or everything
>> will take a dive. I deduced that from the description of pwm_adjust_config:
>
> If it wants to take over seamlessly, it shouldn't call pwm_apply_state()
> at all?!
>
>> * This function will adjust the PWM config to the PWM arguments provided
>> * by the DT or PWM lookup table. This is particularly useful to adapt
>> * the bootloader config to the Linux one.
>>
>> But what do I know?
>
>>> So another option would be
>>>
>>> diff --git a/drivers/regulator/pwm-regulator.c b/drivers/regulator/pwm-regulator.c
>>> index b64d99695b84..418aff0ddbed 100644
>>> --- a/drivers/regulator/pwm-regulator.c
>>> +++ b/drivers/regulator/pwm-regulator.c
>>> @@ -368,10 +368,6 @@ static int pwm_regulator_probe(struct platform_device *pdev)
>>> return ret;
>>> }
>>>
>>> - ret = pwm_adjust_config(drvdata->pwm);
>>> - if (ret)
>>> - return ret;
>>> -
>>> regulator = devm_regulator_register(&pdev->dev,
>>> &drvdata->desc, &config);
>>> if (IS_ERR(regulator)) {
>>>
>>> and drop pwm_adjust_config() as a followup?!
>>
>> As described above, I think that might be too drastic?
>>
>>>> 3. Zero out the state before calling pwm_get_state:
>>>>
>>>> @@ -115,7 +115,7 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
>>>> }
>>>>
>>>> if (pwm->chip->ops->get_state) {
>>>> - struct pwm_state state;
>>>> + struct pwm_state state = { .enabled = true };
>>>
>>> I wonder why you picked .enabled = true here but = false on all other
>>> code locations.
>>
>> Silly typo, soory.
>>
>>> Also you don't seem to have 1271a7b98e7989ba6bb978e14403fc84efe16e13
>>> which has the same effect and is included in v6.3 and v6.2.13.
>>
>> Right, approach 3 felt so ugly and intrusive that I didn't bother to take
>> it beyond the problematic commit (and hinted at that further down: "It also
>> needs a rebase from the culprit commit"). Having said that, 1271a7b98e79 is
>> not enough, or else I would never have noticed the problem in the first
>> place. I don't think the hunk in pwm_dbg_show() makes any difference for my
>> case since I heven't enabled debugging, and a double-check confirms it:
>> After 1271a7b98e79, zeroing "state" in pwm_adjust_config() is enough to kill
>> this regression.
>
> That really puzzles me because pwm_get_state() fully overwrites state.

I only verified that I had no problem /with/ my change. See above.

Sorry for wasting time.

Cheers,
Peter

>
> Best regards
> Uwe
>