Re: [BELATED CORE TOPIC] context tracking / nohz / RCU state

From: Paul E. McKenney
Date: Tue Aug 11 2015 - 17:47:40 EST

On Tue, Aug 11, 2015 at 12:07:54PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> >> This is a bit late, but here goes anyway.
> >>
> >> Having played with the x86 context tracking hooks for a while, I think
> >> it would be nice if core code that needs to be aware of CPU context
> >> (kernel, user, idle, guest, etc.) could come up with a single,
> >> comprehensible, easily validated set of hooks that arch code is
> >> supposed to call.
> >>
> >> Currently we have:
> >>
> >> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> >
> > Something about people yelling at me for waking up idle CPUs, thus
> > degrading their battery lifetimes. ;-)
> >
> >> - Context tracking hooks. Only used by some arches. Calling these
> >> calls the RCU hooks for you in most cases. They have weird
> >> interactions with interrupts and they're slow.
> >
> > Combining these would be good, but there are subtleties. For example,
> > some arches don't have context tracking, but RCU still needs to correctly
> > identify idle CPUs without in any way interrupting or awakening that CPU.
> > It would be good to make this faster, but it does have to work.
> Could we maybe have one set of old RCU-only (no context tracking)
> callbacks and a completely separate set of callbacks for arches that
> support full context tracking? The implementation of the latter would
> presumably call into RCU.

It should be possible for RCU to use context tracking if it is available
and to have RCU maintain its own state otherwise, if that is what you
are getting at. Assuming that the decision is global and made at either
build or boot time, anyway. Having some CPUs tracking context and others
not sounds like an invitation for subtle bugs.

> >> may_i_turn_off_ticks_right_now()
> >
> > This is RCU if CONFIG_RCU_FAST_NO_HZ=n.
> >
> >> or, better yet:
> >> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> >
> > This is RCU if CONFIG_RCU_FAST_NO_HZ=y. It would not be difficult to
> > make RCU do this if CONFIG_RCU_FAST_NO_HZ=n as well, but doing so would
> > increase to/from idle overhead.
> If things actually end up using hrtimers, we might also want
> get_off_my_lawn() aka "isolate this cpu now and try to do all the
> deferred stuff right now and kill off those hrtimers".

If too many different subsystems use hrtimers, then we might well
find ourselves worse off than if we used scheduler-clock interrupts.
I suppose we could have some way of multiplexing a single hrtimer,
which could be thought of as an on-demand scheduling-clock interrupt.

> Rik is (was?) trying to make some housekeeper CPU probe other CPUs'
> state to eliminate the need for exact vtime accounting and thus speed
> up transitions to/from user or idle. It would be really neat if we
> could simultaneously have quick idle/user transitions *and* avoid
> deferred per-cpu work interrupting idle/user mode.

Careful here! Rik's vtime accounting is allowed to be approximate.
Using approximate accounting for RCU is an excellent way to sharply
increase your kernel's life-insurance premiums.

> Chris Metcalf seems quite excited about the kernel staying far away
> from his CPU once he's ready :)

Completely understandable. But I suspect that if push were to come to
shove, he would be even more excited about his kernel not crashing.

> >> Some arches may need:
> >>
> >> i_am_lame_and_forgot_my_previous_context()
> >>
> >> x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
> >> no longer need that.
> >>
> >> Paul says that some arches need something that goes straight from IRQ
> >> to user mode (?) -- sigh.
> >
> > Straight from IRQ to process-level kernel mode. I ran into this in
> > late 2011, and clearly should have documented exactly what code was
> > doing this. Something about invoking system calls from within the
> > kernel on some architectures.
> >
> > Hey, if no architectures do this anymore, I could simplify RCU a bit! ;-)
> I wonder if whatever arches do this could do it in two steps: exit IRQ
> and then enter normal kernel mode.

That certainly would make RCU's life easier! No idea on feasibility
otherwise, though.

Thanx, Paul
