Re: [PATCH v14 1/2] pwm: add microchip soft ip corePWM driver

From: Conor Dooley
Date: Sat Mar 25 2023 - 07:52:53 EST


On Fri, Mar 24, 2023 at 06:43:03PM +0000, Conor Dooley wrote:
> On Thu, Mar 23, 2023 at 10:14:09AM +0100, Uwe Kleine-König wrote:
> > On Wed, Mar 22, 2023 at 10:49:40PM +0000, Conor Dooley wrote:
> > > On Wed, Mar 22, 2023 at 11:55:36AM +0100, Uwe Kleine-König wrote:
> > > > On Mon, Mar 06, 2023 at 09:48:58AM +0000, Conor Dooley wrote:
> > > > > Add a driver that supports the Microchip FPGA "soft" PWM IP core.
>
> > > > > +static void mchp_core_pwm_calc_period(const struct pwm_state *state, unsigned long clk_rate,
> > > > > + u16 *prescale, u16 *period_steps)
> > > > > +{
> > > > > + u64 tmp;
> > > > > +
> > > > > + /*
> > > > > + * Calculate the period cycles and prescale values.
> > > > > + * The registers are each 8 bits wide & multiplied to compute the period
> > > > > + * using the formula:
> > > > > + * (prescale + 1) * (period_steps + 1)
> > > > > + * period = -------------------------------------
> > > > > + * clk_rate
> > > > > + * so the maximum period that can be generated is 0x10000 times the
> > > > > + * period of the input clock.
> > > > > + * However, due to the design of the "hardware", it is not possible to
> > > > > + * attain a 100% duty cycle if the full range of period_steps is used.
> > > > > + * Therefore period_steps is restricted to 0xFE and the maximum multiple
> > > > > + * of the clock period attainable is 0xFF00.
> > > > > + */
> > > > > + tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC);
> > > > > +
> > > > > + /*
> > > > > + * The hardware adds one to the register value, so decrement by one to
> > > > > + * account for the offset
> > > > > + */
> > > > > + if (tmp >= MCHPCOREPWM_PERIOD_MAX) {
> > > > > + *prescale = MCHPCOREPWM_PRESCALE_MAX - 1;
> > > > > + *period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX - 1;
> > > > > +
> > > > > + return;
> > > > > + }
> > > > > +
> > > > > + /*
> > > > > + * The optimal value for prescale can be calculated using the maximum
> > > > > + * permitted value of period_steps, 0xff.
> > > >
> > > > I had to think about that one for a while. The maximal value for
> > > > (period_steps + 1) is 0xff with the reasoning above?! That's also what
> > > > the code uses.
> > >
> > > I've become really disenfranchised with these register/variable names.
> > > I feel like just changing them to disconnect the variables used for
> > > calculation from the register names a little, so that the "is there a +1
> > > needed here or not" stuff becomes a little clearer.
> >
> > Full ack, I considered asking for that, but after some time I was in the
> > "I reviewed the patch today"-mode (which is quite similar to the mode
> > you described :-) and forgot. (Even in that mode the PREG_TO_VAL macro
> > annoyed me a bit.)
> >
> > > It always makes sense to be when I am in an "I respun the patch today"
> > > mode, but by the time we're in the review stage I get muddled.
> > > God forbid I have to look at this in 10 years time.
> > >
> > > That said, there is a bit of a mistake here. The comment two above says
> > > "Therefore period_steps is restricted to 0xFE" when I'm capping things
> > > off. Some inaccuracies have probably snuck in during the various
> > > respins, and I think the comment above is "correct" but misleading, as
> > > it muddies the waters about variable versus register names.
> >
> > I think it's sensible to only talk about either the register values or
> > the factors. I tend to think that talking about the register values is
> > easier at the end and recommend not to hide the +1 (or -1) in a macro.
>
> Yeah, I think the macro had value about 14 versions ago, but the number
> of users has dropped over the revisions.
> I think what I am going to to do for the next version is drop that
> macro, and only ever hold the registers values in variables.
> That had the advantage of making the maths in get_state() better match
> the comments that are now quite prevalent in the driver, as the +1s done
> there are more obvious.
> I'm loathe to link a git tree but, without changes to the period
> calculation logic, this is what it looks like w/ a consistent approach
> to what period_steps and prescale mean:
> https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/tree/drivers/pwm/pwm-microchip-core.c?h=pwm-dev-v15
> [I blindly pushed that before leaving work & without even building it, so
> there's probably some silly mistake in it, but that's besides the point
> of displaying variable/comment changes]
>
> From the chopped out bits of the previous email:
> > Consider
> >
> > clk_rate = 1000000
> > period = 64009000
> >
> > then your code gives:
> >
> > period * clk_rate
> > tmp = ----------------- = 64009
> > NSEC_PER_SEC
> >
> > and so *prescale = 251 and *period_steps = 253.
> >
> > Wasn't the intention to pick *prescale = 250 and then
> > *period_steps = 255?
> >
> > Depending on your semantics of "optimal", either (252, 252) or (250,
> > 255) is better than (251, 253). I think that means you shouldn't ignore
> > the -1?
>
> So, putting this one aside because it merits more thinking about.
>
> I went through and checked some arbitrary values of tmp, rather than
> dealing with "real" ones computed from frequencies I know are easily
> available for me to use in the FPGA bitstream I use to test this stuff.
>
> I think my "integer maths" truncation approach is not actually valid.
> Consider tmp = 255... *prescale gets computed as 255/(254 + 1) = 1, per my
> algorithm. Then, *period_steps is 255/(1 + 1) - 1 = 127.
> The period is (1 + 1)(127 + 1), which is not 255.
>
> Or take tmp = 510. prescale is 510/(254 + 1) = 2. period_steps is then
> 510/(2 + 1) - 1 = 169. period is (2 + 1)(169 + 1), which is 510. The
> calculated period is right, but that is not the "optimal" value!
>
> I think you're right in that I do actually need to consider the -1, and
> do a ceiling operation, when calculating prescale, IOW:
> *prescale = ceil(div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX + 1)) - 1;
> *period_steps = div_u64(tmp, *prescale + 1) - 1;
> [I know I can't actually call ceil()]
>
> That'd do the same thing as the truncation where
> div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX + 1) is not a round number,
> but it improves the round number case, eg tmp = 510:
> prescale = 510/(254 + 1) - 1 = 1, period_steps = 510/(1 + 1) - 1 = 254
> period = (1 + 1)(254 + 1) = 510
>
> It does mean a zero period would need to be special cased, but I don't
> mind that.
>
> > One thing I think is strange is that with clk_rate = 1000001 and your
> > algorithm we get:
> >
> > requested period = 1274998 ns -> real period = 1269998.73000127 (prescale = 4, period_steps = 253)
> > requested period = 1274999 ns -> real period = 1271998.728001272 (prescale = 5, period_steps = 211)
>
> This second one here, where tmp = 1275, is a good example of the above.
> With the ceil() change, this would be prescale = 4, period_steps = 254
> which I think makes more sense.
> >
> > while 1271998.728001272 would be a better match for a request with
> > period = 1274998 than 1269998.73000127.
> >
> > I spend too much time to think about that now. I'm unsure if this is
> > because the -1 is missing, or if there is a bug in the idea to pick a
> > small prescale to allow a big period_steps value (in combination with
> > the request to pick the biggest possible period).
>
> With the inconsistency fixed, I think getting the slightly less accurate
> period is a byproduct of prioritising the finegrainedness of the duty
> cycle.
>
> > Hmm, maybe you understand that better than me? I'll have to think about
> > it.
>
> [end of snip from the last mail]
>
> >
> > Having said that here are my results of thinking a bit about how to
> > choose register values:
> >
> > Let r(p) denote the period that is actually configured if p is
> > requested.
> >
> > The nice properties we want (i.e. those a consumer might expect?) are:
> >
> > a) ∀p: r(p) ≤ p
> > i.e. never configure a bigger period than requested
> > (This is a bit arbitrary, but nice to get a consistent behaviour for
> > all drivers and easier to handle than requesting the nearest
> > possible period.)
> >
> > b) ∀p, q: p ≤ q ⟹ r(p) ≤ r(q)
> > i.e. monotonicity
> >
> > c) ∀p: r(roundup(r(p))) = r(p)
> > i.e. idempotency
> >
> > d) ∀p, q: r(p) ≤ q ⟹ r(p) ≤ r(q)
> > i.e. pick the biggest possible period
> >
> > (Sidenote: d) and a) imply b) and c))
> >
> > If you always set period_steps to 0xfe
> > (in Python syntax:
> >
> > tmp = p * clkrate // NSPS
> > period_steps = 0xfe
> > prescale = tmp // (period_steps + 1) - 1
> >
> > ) you get this consistent behaviour.
> >
> > This has the upside of being easy to implement and cheap to run.
> > Downside is that it's coarse and fails to implement periods that would
> > require e.g prescale = 0 and period_steps < 0xfe. As period_steps is
> > big, the domain to chose the duty cycle from is good.
>
> I want to maintain support for prescale = 0, so I'm not really
> interested in a computation that forsakes that.
>
> > If you pick period_steps and prescale such that
> > (period_steps + 1) * (prescale + 1)
> > is the maximal value that makes r(p) ≤ p you have an increased runtime
> > because you have to test multiple values for period_steps. I don't think
> > there is an algorithm without a loop to determine an optimal pair?!
> > Usually this gives a better match that the easy algorithm above for the
> > period, but the domain for duty_cycle is (usually) smaller.
> > I think we have a) and d) here, so that's good.
> >
> > I think for most consumers a big domain for duty_cycle is more important
> > that a good match for the requested period. So I tend to recommend the
> > easy algorithm, but I'm not religious about that and open for other
> > opinion and reasoning.
>
> I'll be honest and say that I am getting a bit fatigued with the way
> that issues w/ the calculations keep cropping up. I'll put a bit of time
> into trying to figure out how to fix the tmp = 6400900 case that you
> mentioned above, but if it comes out in the wash that that is a facet of
> the way this stuff is computed, then I might be inclined to forge ahead
> without resolving it... I'd certainly want to explain in a comment
> somewhere (limitations section?) the point at which it starts getting
> less accurate though. I'll write a script to iterate through the values
> of tmp & see if the reason is obvious.

I threw together a python script to go through the various values of tmp
& check which ones ended up with these "non-optimal" prescale values.
For tmp < 1000, I saw:
tmp: 511 | 255 - period_steps: 86
tmp: 766 | 255 - period_steps: 65
tmp: 767 | 255 - period_steps: 65
That then grows as tmp moves upwards, so:
tmp: 1021 | 255 - period_steps: 52
tmp: 1022 | 255 - period_steps: 52
tmp: 1023 | 255 - period_steps: 52

That looks oddly like (tmp % (254 + 1)) < prescale and seems to be an
"artifact" of using 254 to calculate the prescaler rather than 255.
I think that that can be worked around while keeping the current simple
approach to calculating prescale/period_steps. Looks like 32385 out of
the 65280 valid values for tmp can benefit! That's in comparison to the
127 that would produce the invalid period_steps value of 255.

I'll go submit something fixing this next week then I suppose.

> I would like to keep (something resembling) the current simple
> implementation of the period calculation, as simplicity is a significant
> advantage! I do also like the strategy of trying to maximise the number
> of available duty cycle steps, I think it makes perfect sense to make
> that the goal.
>
> Thanks again for spending time on this Uwe,
> Conor.
>


Attachment: signature.asc
Description: PGP signature