Re: [BELATED CORE TOPIC] context tracking / nohz / RCU state

From: Andy Lutomirski
Date: Tue Aug 11 2015 - 15:08:18 EST

On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
>> This is a bit late, but here goes anyway.
>> Having played with the x86 context tracking hooks for a while, I think
>> it would be nice if core code that needs to be aware of CPU context
>> (kernel, user, idle, guest, etc.) could come up with a single,
>> comprehensible, easily validated set of hooks that arch code is
>> supposed to call.
>> Currently we have:
>> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> Something about people yelling at me for waking up idle CPUs, thus
> degrading their battery lifetimes. ;-)
>> - Context tracking hooks. Only used by some arches. Calling these
>> calls the RCU hooks for you in most cases. They have weird
>> interactions with interrupts and they're slow.
> Combining these would be good, but there are subtleties. For example,
> some arches don't have context tracking, but RCU still needs to correctly
> identify idle CPUs without in any way interrupting or awakening that CPU.
> It would be good to make this faster, but it does have to work.

Could we maybe have one set of old RCU-only (no context tracking)
callbacks and a completely separate set of callbacks for arches that
support full context tracking? The implementation of the latter would
presumably call into RCU.
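
To make the layering concrete, here is a rough userspace sketch of what I
have in mind. Every name in it is made up for illustration (none of these
are existing kernel functions), and it assumes RCU only cares whether the
CPU is in kernel context or in an extended quiescent state (user, idle, or
guest):

```c
#include <assert.h>

/* All names below are hypothetical; this is not an existing kernel API. */
enum cpu_ctx { CTX_KERNEL, CTX_USER, CTX_IDLE, CTX_GUEST };

struct rcu_sketch {
	int watching;              /* 1 while RCU must pay attention to this CPU */
	unsigned long transitions; /* number of enter/exit calls seen so far */
};

struct ctx_sketch {
	enum cpu_ctx ctx;          /* context the CPU is currently in */
	struct rcu_sketch rcu;
};

/* Set 1: RCU-only hooks, for arches without context tracking. */
static void rcu_sketch_eqs_enter(struct rcu_sketch *r)
{
	assert(r->watching);       /* must not already be quiescent */
	r->watching = 0;
	r->transitions++;
}

static void rcu_sketch_eqs_exit(struct rcu_sketch *r)
{
	assert(!r->watching);      /* must currently be quiescent */
	r->watching = 1;
	r->transitions++;
}

/* Set 2: full context tracking. Arch code calls only this; it calls RCU. */
static void ctx_sketch_enter(struct ctx_sketch *c, enum cpu_ctx next)
{
	int was_kernel = (c->ctx == CTX_KERNEL);
	int will_be_kernel = (next == CTX_KERNEL);

	if (was_kernel && !will_be_kernel)
		rcu_sketch_eqs_enter(&c->rcu);
	else if (!was_kernel && will_be_kernel)
		rcu_sketch_eqs_exit(&c->rcu);
	/* user <-> idle <-> guest: RCU state is unchanged */

	c->ctx = next;
}
```

An arch would pick exactly one of the two sets. The asserts stand in for
the "easily validated" part: calling a hook from the wrong state trips
immediately instead of silently confusing RCU.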

>> may_i_turn_off_ticks_right_now()
>> or, better yet:
>> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> This is RCU if CONFIG_RCU_FAST_NO_HZ=y. It would not be difficult to
> make RCU do this if CONFIG_RCU_FAST_NO_HZ=n as well, but doing so would
> increase to/from idle overhead.

If things actually end up using hrtimers, we might also want
get_off_my_lawn() aka "isolate this cpu now and try to do all the
deferred stuff right now and kill off those hrtimers".

Rik is (was?) trying to make some housekeeper CPU probe other CPUs'
state to eliminate the need for exact vtime accounting and thus speed
up transitions to/from user or idle. It would be really neat if we
could simultaneously have quick idle/user transitions *and* avoid
deferred per-cpu work interrupting idle/user mode.
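
As a toy illustration of the remote-probing idea (my own guess at the
shape, not Rik's actual patches, and with invented names), each CPU could
publish its context with a single store and the housekeeper could sample
it periodically:

```c
#include <stdatomic.h>

/* Hypothetical names; a userspace model using C11 atomics. */
enum remote_ctx { RCTX_KERNEL, RCTX_USER, RCTX_IDLE, RCTX_NR };

struct cpu_slot {
	_Atomic int ctx;              /* written only by the owning CPU */
	unsigned long ticks[RCTX_NR]; /* written only by the housekeeper */
};

/* Fast path on the isolated CPU: one store, no timestamps, no accounting. */
static void cpu_set_ctx(struct cpu_slot *c, enum remote_ctx next)
{
	atomic_store_explicit(&c->ctx, (int)next, memory_order_release);
}

/* Housekeeper: charge one tick to whatever context each CPU is in now. */
static void housekeeper_sample(struct cpu_slot *cpus, int nr_cpus)
{
	for (int i = 0; i < nr_cpus; i++) {
		int ctx = atomic_load_explicit(&cpus[i].ctx,
					       memory_order_acquire);
		cpus[i].ticks[ctx]++;
	}
}
```

The tradeoff is that the accounting becomes sampled rather than exact, in
exchange for a transition path that never has to read a clock.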

Chris Metcalf seems quite excited about the kernel staying far away
from his CPU once he's ready :)

>> Some arches may need:
>> i_am_lame_and_forgot_my_previous_context()
>> x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
>> no longer need that.
>> Paul says that some arches need something that goes straight from IRQ
>> to user mode (?) -- sigh.
> Straight from IRQ to process-level kernel mode. I ran into this in
> late 2011, and clearly should have documented exactly what code was
> doing this. Something about invoking system calls from within the
> kernel on some architectures.
> Hey, if no architectures do this anymore, I could simplify RCU a bit! ;-)

I wonder if whatever arches do this could do it in two steps: exit IRQ
and then enter normal kernel mode.
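
Roughly like this, as a userspace model with invented names. It assumes
IRQs don't nest and that the IRQ remembers which context it interrupted;
I don't know whether either holds on the arches in question:

```c
#include <assert.h>

/* Hypothetical model; not real kernel code. */
enum ctx { CTX_USER, CTX_KERNEL, CTX_IRQ };

struct cpu_state {
	enum ctx cur;         /* current context */
	enum ctx interrupted; /* context the IRQ arrived in */
	unsigned int steps;   /* number of hook calls, for validation */
};

static void irq_enter_sketch(struct cpu_state *s)
{
	assert(s->cur != CTX_IRQ); /* nesting is not modeled */
	s->interrupted = s->cur;
	s->cur = CTX_IRQ;
	s->steps++;
}

static void irq_exit_sketch(struct cpu_state *s)
{
	assert(s->cur == CTX_IRQ);
	s->cur = s->interrupted;   /* back to whatever was interrupted */
	s->steps++;
}

static void kernel_enter_sketch(struct cpu_state *s)
{
	assert(s->cur != CTX_IRQ); /* only legal from a non-IRQ context */
	s->cur = CTX_KERNEL;
	s->steps++;
}

/* IRQ -> process-level kernel mode, expressed as two ordinary hooks. */
static void irq_to_kernel_two_step(struct cpu_state *s)
{
	irq_exit_sketch(s);
	kernel_enter_sketch(s);
}
```

With that, core code would never see a direct IRQ-to-kernel edge, only the
two transitions it already has to handle.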
