Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
From: Frederic Weisbecker
Date: Tue May 20 2014 - 12:24:46 EST
On Tue, May 20, 2014 at 08:53:24AM -0700, Paul E. McKenney wrote:
> On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:
> > I'm not sure that I really understand what you want here.
> >
> > The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks
> > is actually off by default. This is only overridden by the "nohz_full=" boot parameter.
>
> If I understand correctly, if there is no nohz_full= boot parameter,
> then the context-tracking code takes the early exit via the
> context_tracking_is_enabled() check in context_tracking_user_enter().
Exactly. It's even jump-labeled, so on architectures with good jump label support it
should boil down to a single unconditional jump when it's off.
> I would not expect this to cause much in the way of syscall performance
> degradation.
Now, the jump label covers all cases except syscalls (i.e. exceptions and IRQs). Syscalls
are optimized even better for the off case, through the TIF_NOHZ flag: it is folded into the
single all-in-one slow-path condition. At least on x86.
> However, it looks like having even one CPU in nohz_full
> mode causes all CPUs to enable context tracking.
True, unfortunately. We need to track syscall and exception entry/exit
on all CPUs.
For example, if CPU 1 is full nohz and a task enters userspace on CPU 0 and then migrates
to CPU 1, CPU 1 must know that the task is resuming in userspace in order to stop the tick
safely. So CPU 0 must do context tracking as well.
Of course, one could argue that we can find out from the scheduler entry on CPU 0 that the
task is resuming in userspace, without prior context tracking, but I couldn't find a safe
solution for that. This is because user/kernel boundaries can only be probed in software,
through explicit function calls. So there is an inevitable gap between the soft and hard
boundaries, between what we probe and what we can only guess.
>
> My guess is that Mike wants to have (say) half of his CPUs running
> nohz_full, and the other half having fast system calls. So my guess
> also is that he would like some way of having the non-nohz_full CPUs
> to opt out of the context-tracking overhead, including the memory
> barriers and atomic ops in rcu_user_enter() and rcu_user_exit(). ;-)
I see. So we could possibly restrict context tracking to a subset of CPUs, but only if
the tasks running there can't run on non-tracking CPUs.
Ah, one possible approach is to rely on the NOHZ flag for that and check which
tasks need to be tracked.
> > Now if what you need is to enable or disable it at runtime instead of boot time,
> > I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched
> > and RCU).
>
> What Frederic said! Making RCU deal with this is possible, but a bit on
> the complicated side. Given that I haven't heard too many people complaining
> that RCU is too simple, I would like to opt out of runtime changes to the
> nohz_full mask.
Agreed.
>
> > I've already been eyed by vulturous frozen sharks flying in circles above me lately
> > after a few overengineering visions.
>
> Nothing like the icy glare of a frozen shark, is there? ;-)
I think they were even three-eyed!!!
>
> > And given that the full nohz code is still in its infancy, it's probably not the right
> > time to expand it that way. I haven't even heard yet of users who have gone beyond the
> > testing stage of full nohz.
> >
> > We'll probably extend it that way in the future, but likely not in the near future.
>
> My guess is that Mike would be OK with still choosing the nohz_full CPUs
> at boot time, but that he would like the CPUs that are not to be
> in nohz_full state be able to opt out of the context-tracking overhead.
OK, that might be possible, although it would still require a bit of complication.
Let's wait for Mike's input.
Thanks.