Re: PWM regression causing failures with the pwm-atmel driver

From: Uwe Kleine-König
Date: Mon May 22 2023 - 16:43:56 EST


Hello Peter,

On Mon, May 22, 2023 at 09:28:39PM +0200, Peter Rosin wrote:
> 2023-05-22 at 19:23, Uwe Kleine-König wrote:
> > On Mon, May 22, 2023 at 05:19:43PM +0200, Peter Rosin wrote:
> >> I have a device with a "sound card" that has an amplifier that needs
> >> an extra boost when high amplification is requested. This extra
> >> boost is controlled with a pwm-regulator.
> >>
> >> As of commit c73a3107624d ("pwm: Handle .get_state() failures") this
> >> device no longer works. I have tracked the problem to an unfortunate
> >> interaction between the underlying PWM driver and the PWM core.
> >>
> >> The driver is drivers/pwm/pwm-atmel.c which has difficulties getting
> >> the period and/or duty_cycle from the HW when the PWM is not enabled.
> >> Because of this, I think, the driver does not fill in .period and
> >> .duty_cycle at all in atmel_pwm_get_state() unless the PWM is enabled.
> >>
> >> However, the PWM core is not expecting these fields to be left as-is,
> >> at least not in pwm_adjust_config(), and its local state variable on
> >> the stack ends up with whatever crap was on the stack on entry for
> >> these fields. That fails spectacularly when the function continues to
> >> do math on these uninitialized values.

After looking again, I don't understand that part. Note that
pwm_get_state() doesn't call .get_state() at all. Also pwmchip_add()
zero initializes .state, then pwm_get() calls .get_state() (via
pwm_device_request() which is called in .xlate()) which (if the HW is
disabled) doesn't touch .period, so it should continue to be zero?!

So I wonder why your approach 2 works at all. Do you see what I'm
missing?

> >> In particular, I find this in the kernel log when a bad kernel runs:
> >> pwm-regulator: probe of reg-ana failed with error -22
> >>
> >> Before commit c73a3107624d this was a silent failure, and the situation
> >> "repaired itself" when the PWM was later reprogrammed, at least for my
> >> case. After that commit, the failure is fatal and the "sound card"
> >> fails to come up at all.
> >>
> >>
> >> I see a couple of adjustments that could be made.
> >>
> >> 1. Zero out some fields in the driver:
> >>
> >> @@ -390,4 +390,6 @@ static int atmel_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
> >> state->enabled = true;
> >> } else {
> >> + state->period = 0;
> >> + state->duty_cycle = 0;
> >> state->enabled = false;
> >> }
> >
> > I don't particularily like that. While state->period is an invalid
> > value, IMHO enabled = false is enough information from the driver's POV.
>
> This was the preferred approach of Thierry, and given the number of
> call sites for pwm_get_state with a local variable, I can sympathize
> with that view.

I looked a bit more into the issue and think that pwm_get_state() isn't
problematic. pwm_get_state() fully assigns *state.

> At the same time, fixing drivers one by one is not
> a fun game, so I can certainly see that approach 3 also has an appeal.

What I don't like about it is that the output of a disabled PWM doesn't
have a period. There might be one configured in hardware(, and the
.get_state() callback might or might not return that), but the emitted
signal has no well-defined period.

> >> 2. Don't try to "adjust" a disabled PWM:
> >>
> >> @@ -656,7 +656,7 @@ int pwm_adjust_config(struct pwm_device *pwm)
> >> * In either case, we setup the new period and polarity, and assign a
> >> * duty cycle of 0.
> >> */
> >> - if (!state.period) {
> >> + if (!state.enabled || !state.period) {
> >> state.duty_cycle = 0;
> >> state.period = pargs.period;
> >> state.polarity = pargs.polarity;
> >
> > In my book code that calls pwm_get_state() should consider .period
> > (and .duty_cycle) undefined if .enabled is false. So this hunk is an
>
> I agree fully. This feels like a good hunk.
>
> > improvement. Having said that, I think pwm_adjust_config() has a strange
> > semantic. I don't understand when you would want to call it, and it only
> > has one caller (i.e. pwm-regulator).
>
> I believe it's for a case where some bootstrap/bootloader has configured
> a PWM in order to set up a regulator for some critical component, and the
> kernel needs to cake over the responsibility seamlessly, or everything
> will take a dive. I deduced that from the description of pwm_adjust_config:

If it wants to take over seamlessly, it shouldn't call pwm_apply_state()
at all?!

> * This function will adjust the PWM config to the PWM arguments provided
> * by the DT or PWM lookup table. This is particularly useful to adapt
> * the bootloader config to the Linux one.
>
> But what do I know?

> > So another option would be
> >
> > diff --git a/drivers/regulator/pwm-regulator.c b/drivers/regulator/pwm-regulator.c
> > index b64d99695b84..418aff0ddbed 100644
> > --- a/drivers/regulator/pwm-regulator.c
> > +++ b/drivers/regulator/pwm-regulator.c
> > @@ -368,10 +368,6 @@ static int pwm_regulator_probe(struct platform_device *pdev)
> > return ret;
> > }
> >
> > - ret = pwm_adjust_config(drvdata->pwm);
> > - if (ret)
> > - return ret;
> > -
> > regulator = devm_regulator_register(&pdev->dev,
> > &drvdata->desc, &config);
> > if (IS_ERR(regulator)) {
> >
> > and drop pwm_adjust_config() as a followup?!
>
> As described above, I think that might be too drastic?
>
> >> 3. Zero out the state before calling pwm_get_state:
> >>
> >> @@ -115,7 +115,7 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
> >> }
> >>
> >> if (pwm->chip->ops->get_state) {
> >> - struct pwm_state state;
> >> + struct pwm_state state = { .enabled = true };
> >
> > I wonder why you picked .enabled = true here but = false on all other
> > code locations.
>
> Silly typo, soory.
>
> > Also you don't seem to have 1271a7b98e7989ba6bb978e14403fc84efe16e13
> > which has the same effect and is included in v6.3 and v6.2.13.
>
> Right, approach 3 felt so ugly and intrusive that I didn't bother to take
> it beyond the problematic commit (and hinted at that further down: "It also
> needs a rebase from the culprit commit"). Having said that, 1271a7b98e79 is
> not enough, or else I would never have noticed the problem in the first
> place. I don't think the hunk in pwm_dbg_show() makes any difference for my
> case since I heven't enabled debugging, and a double-check confirms it:
> After 1271a7b98e79, zeroing "state" in pwm_adjust_config() is enough to kill
> this regression.

That really puzzles me because pwm_get_state() fully overwrites state.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | https://www.pengutronix.de/ |

Attachment: signature.asc
Description: PGP signature