Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry

From: Andy Lutomirski
Date: Thu May 07 2015 - 08:53:26 EST


On Thu, May 7, 2015 at 5:44 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
>> > > We cannot take the lock_trace(task) from irq context, and we
>> > > probably do not need to anyway, since we do not care about a
>> > > precise stack trace for the task.
>> >
>> > So one worry with this and similar approaches of statistically
>> > detecting user mode would be the fact that on the way out to
>> > user-space we don't really destroy the previous call trace - we
>> > just pop off the stack (non-destructively), restore RIPs and are
>> > gone.
>> >
>> > We'll need that percpu flag I suspect.
>> >
>> > And once we have the flag, we can get rid of the per syscall RCU
>> > callback as well, relatively easily: with CMPXCHG (in
>> > synchronize_rcu()!) we can reliably sample whether a CPU is in
>> > user mode right now, while the syscall entry/exit path does not
>> > use any atomics, we can just use a simple MOV.
>> >
>> > Once we observe 'user mode', then we have observed quiescent state
>> > and synchronize_rcu() can continue. If we've observed kernel mode
>> > we can frob the remote task's TIF_ flags to make it go into a
>> > quiescent state publishing routine on syscall-return.
>> >
>>
>> How does that work?
>>
>> If the exit code writes the per-cpu flag and then checks
>> TIF_whatever, we need a barrier to avoid a race where we end up in
>> user mode without seeing the flag.
>
> No, the TIF_RCU_SYNC flag would be independent of the user-mode flag.
>
> Also, the user-mode flag would be changed from the 'kernel-mode' value
> to the 'user-mode' value after we do the regular TIF checks - just
> before we hit user-space.
>
> The TIF_RCU_QS thing is just a fancy way for synchronize_rcu() (being
> executed on some other CPU not doing RT work) to intelligently wait
> for the remote (RT work doing) CPU to finish executing kernel code,
> without polling or so.
>
> And yes, the TIF_RCU_QS handler on the remote CPU would have to
> execute an SMP barrier, before it allows an RCU quiescent state to
> pass. Note that the regular RCU code for quiescent state publishing
> already has that barrier, typically something like:
>
> this_cpu_inc(rcu_qs_ctr);
>
> That in itself is enough for synchronize_rcu(), it could do a small
> timing loop with short sleeps, until it waits for rcu_qs_ctr to
> increase.

I think one or both of us is missing something or we're just talking
about different things.

If the nohz/RT cpu is about to enter user mode and stay there for a
long time, it does:

this_cpu_inc(rcu_qs_ctr);

or whatever. Maybe we add:

this_cpu_set(rcu_state) = IN_USER;

or however it's spelled.

The remote CPU wants to check our state. If this happens just before
the IN_USER write or rcu_qs_ctr increment becomes visible, then it'll
think we're in kernel mode. Now it either has to poll (which is fine)
or try to get us to tell the RCU core when we become quiescent by
setting TIF_RCU_THINGY.

The problem is that I don't see how TIF_RCU_THINGY can work reliably.
If the remote CPU sets it, it'll be too late and we'll still enter
user mode without seeing it. If it's just an optimization, though,
then it should be fine.

I still feel like this is all overengineered. Shouldn't rcu_qs_ctr be
enough for all of the above?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/