Re: [PATCH 2/2] tick: SHUTDOWN event-dev if no events are required for KTIME_MAX
From: Viresh Kumar
Date: Mon May 12 2014 - 01:35:28 EST
Thanks for the blunt feedback, it will be very helpful going forward :)
On 10 May 2014 01:39, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Fri, 9 May 2014, Viresh Kumar wrote:
>> diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
>>  int tick_program_event(ktime_t expires, int force)
>>  {
>>          struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
>> +        int ret = 0;
>>
>> -        return clockevents_program_event(dev, expires, force);
>> +        /* Shut down event device if it is not required for long */
>> +        if (unlikely(expires.tv64 == KTIME_MAX)) {
>> +                dev->last_mode = dev->mode;
>> +                clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
>
> No, we are not doing a state change behind the scene and a magic
> restore. And I know at least one way to make this fall flat on its
> nose, because you are blindly doing dev->last_mode = dev->mode on
> every invocation. So if that gets called twice without a restore in
> between, the device is going to be in shutdown mode forever.
During my tests I had this as well:
if (unlikely(expires.tv64 == KTIME_MAX)) {
+ WARN_ON(dev->mode == CLOCK_EVT_MODE_SHUTDOWN);
But it never triggered, and I assumed it could never happen, so I removed it.
But yes, there should be some check for that here.
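To make the failure mode concrete, here is a minimal sketch in C. The types are
simplified stand-ins, not the real kernel structures, and stop_unguarded(),
stop_guarded() and restore() are hypothetical helpers used only for
illustration:

```c
/* Simplified stand-ins for the kernel definitions, for illustration only. */
enum clock_event_mode {
	CLOCK_EVT_MODE_UNUSED,
	CLOCK_EVT_MODE_SHUTDOWN,
	CLOCK_EVT_MODE_PERIODIC,
	CLOCK_EVT_MODE_ONESHOT,
};

struct clock_event_device {
	enum clock_event_mode mode;
	enum clock_event_mode last_mode;	/* field added by the patch under review */
};

/* Unguarded save, as in the posted patch (hypothetical helper name). */
static void stop_unguarded(struct clock_event_device *dev)
{
	dev->last_mode = dev->mode;	/* a second call overwrites ONESHOT with SHUTDOWN */
	dev->mode = CLOCK_EVT_MODE_SHUTDOWN;
}

/* Guarded variant: save the mode only if the device is not already stopped. */
static void stop_guarded(struct clock_event_device *dev)
{
	if (dev->mode == CLOCK_EVT_MODE_SHUTDOWN)
		return;
	dev->last_mode = dev->mode;
	dev->mode = CLOCK_EVT_MODE_SHUTDOWN;
}

/* Restore whatever mode was saved (hypothetical helper name). */
static void restore(struct clock_event_device *dev)
{
	dev->mode = dev->last_mode;
}
```

Calling stop_unguarded() twice and then restore() leaves the device in
SHUTDOWN, because the saved mode was itself overwritten with SHUTDOWN. The
guarded variant restores ONESHOT in the same sequence.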
> It's moronic anyway as the clock event device has the state
> CLOCK_EVT_MODE_ONESHOT if it's active, otherwise we would not be in
> that code path.
Yeah, I missed that earlier.
> But what's even worse: you just define that it's the best way for all
> implementations of clockevents to handle this.
>
> It's definitely NOT. Some startup/shutdown implementations are rather
> complex, so that would burden them with rather big latencies and some
> of them will even outright break.
>
> There is a world outside of YOUR favourite subarch.
:)
> We do not hijack stuff just because we can and it works on some
> machines. We think about it proper.
Agreed..
> If we hijack some existing facility then we audit ALL implementation
> sites and document that we did so and why we are sure that it won't
> break stuff. It still might break some oddball case, but that's not a
> big issue.
Because SHUTDOWN was an existing old API, I thought it would work
without breaking stuff. Yes, I should have done some auditing, or at
least made this an RFC series to get the discussion going..
> In the clockevents case we do not even need a new interface, but this
> must be made OPT-in and not a flagday change for all users.
>
> And no we are not going to abuse a feature flag for this. It's not a
> feature.
Okay.
> I'd rather have a new state for this, simply because it is NOT
> shutdown. It is in ONESHOT_STOPPED state. Whether a specific
> implementation will use the SHUTDOWN code for it or not does not
> matter.
Correct.
> That requires a full tree update of all implementations because most
> of them have a switch case for the mode. And adding a state will cause
> all of them which do not have a default clause to emit warnings
> because the mode is an enum for this very reason.
>
> And even if all of them would have a default clause, you'd need a way
> to OPT-In, because some of the defaults have a BUG() in there. Again,
> no feature flag exclusion. See above.
Okay..
> So the right thing to do this is:
>
> 1A) Change the prototype of the set_mode callback to return int and
> fixup all users. Either add the missing default clause or remove
> the existing BUG()/ pr_err()/whatever handling in the existing
> default clause and return a UNIQUE error code.
>
> I know I should have done that from the very beginning, but in
> hindsight one could have done everything better.
>
> coccinelle is your friend (if you need help ask me or Julia
> Lawall). But it's going to be quite some manual work on top.
Sure.
> 1B) Audit the changes and look at the implementations. If the patch is
> just adding the default clause or replacing some BUG/printk error
> handling goto #1C
>
> If it looks like it needs some preparatory care or if you find
> bugs in a particular implementation, roll back the changes and do
> the bug fixes and preparatory changes first as separate patches.
>
> Go back to #1A until the coccinelle patches are just squeaky
> clean.
>
> 1C) Add proper error handling for the various modes to the set_mode
> callback call sites, only two AFAIK.
>
> 2A) Add a new mode ONESHOT_STOPPED. That's safe now as all error
> handling will be done in the core code.
>
> 2B) Implement the ONESHOT_STOPPED logic and make sure all of the core
> code is aware of it.
Okay..
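To check my understanding of steps 1A, 1C and 2A, here is a rough sketch in C.
It uses simplified stand-in types rather than the real kernel structures.
example_timer_set_mode() and set_mode_checked() are hypothetical names, and
-ENOSYS is only my assumption for the "unique error code":

```c
#include <errno.h>

/* Simplified stand-ins for the kernel definitions, for illustration only. */
enum clock_event_mode {
	CLOCK_EVT_MODE_UNUSED,
	CLOCK_EVT_MODE_SHUTDOWN,
	CLOCK_EVT_MODE_PERIODIC,
	CLOCK_EVT_MODE_ONESHOT,
	CLOCK_EVT_MODE_RESUME,
	CLOCK_EVT_MODE_ONESHOT_STOPPED,	/* step 2A: the proposed new mode */
};

struct clock_event_device {
	enum clock_event_mode mode;
	/* step 1A: the callback returns int instead of void */
	int (*set_mode)(enum clock_event_mode mode,
			struct clock_event_device *dev);
};

/* Hypothetical driver that does not know about ONESHOT_STOPPED. */
static int example_timer_set_mode(enum clock_event_mode mode,
				  struct clock_event_device *dev)
{
	(void)dev;	/* a real driver would program its hardware here */

	switch (mode) {
	case CLOCK_EVT_MODE_UNUSED:
	case CLOCK_EVT_MODE_SHUTDOWN:
	case CLOCK_EVT_MODE_PERIODIC:
	case CLOCK_EVT_MODE_ONESHOT:
	case CLOCK_EVT_MODE_RESUME:
		return 0;
	default:
		/* No BUG() or pr_err(): report it; -ENOSYS is an assumed choice. */
		return -ENOSYS;
	}
}

/* step 1C: hypothetical core-side call site with error handling. */
static int set_mode_checked(struct clock_event_device *dev,
			    enum clock_event_mode mode)
{
	int ret;

	if (dev->mode == mode)
		return 0;

	ret = dev->set_mode(mode, dev);
	if (ret)
		return ret;	/* dev->mode is left untouched on failure */

	dev->mode = mode;
	return 0;
}
```

With this shape, a driver that has not opted in rejects ONESHOT_STOPPED and the
core keeps the previous mode, so adding the new enum value cannot silently
change behaviour for existing implementations.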
> And don't tell me it can't be done.
No way :)
> I've done it I don't know how many
> times with interrupts, timers, locking and some more. It's hard work,
> but it's valuable and way better than the brainless "make it work for
> me" hackery.
I didn't mean that, actually. I just pinpointed how badly things can go,
using ARM platforms as an example. I never meant that it must go in
because it "works for me" :) .. But yes, you got that impression, and I need
to make sure it doesn't happen again.
> You asked me yesterday about your other hrtimer patches. You know why
> I do not come around to review them? Because I have found way too much
> half-baked stuff in your patches I reviewed so far.
Hmm, that's bad. I thought most of them wouldn't make any difference
functionally, and so wouldn't break anything. Sorry about that.
> That forces me to
> go through all of them with a fine comb and I simply do not scale.
>
> Reviewing this patch alone took me a couple of hours, because I
> had to think through the implications and stare into the code. And you
> know why? Because, first of all I do not trust your patches and
I will try my best to overcome that :)
> secondly your changelogs (especially the one of the 1/2 patch) told me
> clearly, that this is "works for me" hackery.
I really didn't mean that :(
> So YOU forced me to spend time on looking at the consequences all over
> the place instead of YOU had looked in the first place and figured it
> out yourself.
>
> Did you look at ALL implementations of clock events when you made that
> change? Definitely NOT.
No, I didn't ... Yeah, I should have handled it in a better way, with
more study and work..
> I did. And found quite some of them which are going to be hurt. I also
> found some of them which are buggy.
>
> Just get it. This is CORE code and it affects ALL of its users. You
> can play that "hack it into submission game" with a random driver, i.e
> at the end of the callchain, but core code is very very different.
>
> There is always the risk to break something when you work on core code
> and nobody will rip your head off, if you break something because you
> did not notice the random oddity of some use site.
>
> But breaking stuff wholesale by just not thinking about it carefully
> won't earn you any brownie points.
Agreed.
> Vs. your other pending patches, I have no idea whether I have the time
> and the stomach to go through them before I vanish to Japan next
> weekend.
>
> If there are urgent bugfixes, which are obvious or proper thought
> through and explained, please resend them.
Only one, as far as I remember, and I already got a go-ahead from you
on that. I will resend it.
Let me get your next mail in here as well:
> There is even a better way to do that:
>
> 1) Create a new callback set_state() which has an
> int return value.
>
> 2) Make the callsites do
>
>         if (dev->set_state) {
>                 ret = dev->set_state();
>                 handle_return_value();
>         } else
>                 dev->set_mode();
>
> 3) Convert implementations one by one to use the new callback
>
> 4) Remove the set_mode callback
>
> 5) Implement new features.
Yeah, this is obviously going to be far easier, as there is less risk
of breaking things here :)
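A rough sketch in C of how I read the call site in step 2, again with
simplified stand-in types. The set_state() signature, switch_mode(), the two
example drivers, and the refusal to hand ONESHOT_STOPPED to legacy drivers are
all my assumptions, not something taken from your mail:

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel definitions, for illustration only. */
enum clock_event_mode {
	CLOCK_EVT_MODE_UNUSED,
	CLOCK_EVT_MODE_SHUTDOWN,
	CLOCK_EVT_MODE_PERIODIC,
	CLOCK_EVT_MODE_ONESHOT,
	CLOCK_EVT_MODE_RESUME,
	CLOCK_EVT_MODE_ONESHOT_STOPPED,	/* hypothetical new state */
};

struct clock_event_device {
	enum clock_event_mode mode;
	/* legacy callback, no return value */
	void (*set_mode)(enum clock_event_mode mode,
			 struct clock_event_device *dev);
	/* new callback; this signature is an assumption */
	int (*set_state)(enum clock_event_mode mode,
			 struct clock_event_device *dev);
};

static int legacy_calls;	/* counts legacy callback invocations */

/* Hypothetical unconverted driver. */
static void legacy_set_mode(enum clock_event_mode mode,
			    struct clock_event_device *dev)
{
	(void)mode;
	(void)dev;
	legacy_calls++;
}

/* Hypothetical converted driver that also handles ONESHOT_STOPPED. */
static int converted_set_state(enum clock_event_mode mode,
			       struct clock_event_device *dev)
{
	(void)dev;
	return mode <= CLOCK_EVT_MODE_ONESHOT_STOPPED ? 0 : -ENOSYS;
}

/* Hypothetical core call site: prefer set_state(), fall back to set_mode(). */
static int switch_mode(struct clock_event_device *dev,
		       enum clock_event_mode mode)
{
	if (dev->set_state) {
		int ret = dev->set_state(mode, dev);

		if (ret)
			return ret;	/* keep the old mode on failure */
	} else {
		/* Assumption: legacy drivers never see a mode they predate. */
		if (mode == CLOCK_EVT_MODE_ONESHOT_STOPPED)
			return -ENOSYS;
		dev->set_mode(mode, dev);
	}

	dev->mode = mode;
	return 0;
}
```

Unconverted drivers keep their old behaviour through set_mode(), and only
drivers that provide set_state() can ever be asked for the new state, which is
what makes the conversion possible one implementation at a time.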
Again, sorry for the noise (at least the issue was real and important). I
wanted to do it in a better way, but thought the existing API would work
smoothly..
I will do my best to earn your trust :)
Thanks..
Viresh