Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
From: Ingo Molnar
Date: Thu May 07 2015 - 08:44:50 EST
* Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > > We cannot take the lock_trace(task) from irq context, and we
> > > probably do not need to anyway, since we do not care about a
> > > precise stack trace for the task.
> >
> > So one worry with this and similar approaches of statistically
> > detecting user mode would be the fact that on the way out to
> > user-space we don't really destroy the previous call trace - we
> > just pop off the stack (non-destructively), restore RIPs and are
> > gone.
> >
> > We'll need that percpu flag I suspect.
> >
> > And once we have the flag, we can get rid of the per syscall RCU
> > callback as well, relatively easily: with CMPXCHG (in
> > synchronize_rcu()!) we can reliably sample whether a CPU is in
> > user mode right now, while the syscall entry/exit path does not
> > use any atomics, we can just use a simple MOV.
> >
> > Once we observe 'user mode', then we have observed quiescent state
> > and synchronize_rcu() can continue. If we've observed kernel mode
> > we can frob the remote task's TIF_ flags to make it go into a
> > quiescent state publishing routine on syscall-return.
> >
>
> How does that work?
>
> If the exit code writes the per-cpu flag and then checks
> TIF_whatever, we need a barrier to avoid a race where we end up in
> user mode without seeing the flag.
No, the TIF_RCU_SYNC flag would be independent of the user-mode flag.
Also, the user-mode flag would be changed from the 'kernel-mode' value
to the 'user-mode' value after we do the regular TIF checks - just
before we hit user-space.
The TIF_RCU_QS thing is just a fancy way for synchronize_rcu() (being
executed on some other CPU not doing RT work) to intelligently wait
for the remote (RT work doing) CPU to finish executing kernel code,
without polling or so.
And yes, the TIF_RCU_QS handler on the remote CPU would have to
execute an SMP barrier, before it allows an RCU quiescent state to
pass. Note that the regular RCU code for quiescent state publishing
already has that barrier, typically something like:
this_cpu_inc(rcu_qs_ctr);
That in itself is enough for synchronize_rcu(), it could do a small
timing loop with short sleeps, until it waits for rcu_qs_ctr to
increase.
> I think the right solution is to accept that race and just have the
> RCU code send an IPI (or check again) if it sees too long of a
> period elapse in kernel mode.
I don't think there's any need for an IPI.
> I think the flag should be a counter, though. That way a workload
> that makes lots of syscalls will be quickly detected as going
> through quiescent states even if it's never actually observed in
> user mode.
Flag write is easier on the CPU than an INC/DEC: only a store to a
percpu location, no load needed, and no value dependencies. The store
will be program ordered, so it's a spin_unlock() work-alike.
> > The only hard requirement of this scheme from the RCU
> > synchronization POV is that all kernel contexts that may touch RCU
> > state need to flip this flag reliably to 'kernel mode': i.e. all
> > irq handlers, traps, NMIs and all syscall variants need to do
> > this.
> >
> > But once it's there, it's really neat.
>
> We already have to do this with the current code. I went through
> and checked all of the IST entries a couple versions ago.
Yeah - I just mean if the primary flag is _not_ TIF driven (which is
my suggestion, and which I thought you suggested as well) then all
code paths need to be covered.
Things like the irq-tracking callbacks are debugging and
instrumentation code - the user/kernel mode flag would be a hard
requirement on RCU correctness: missing a flag update might cause the
kernel to crash in non-obvious ways.
Thanks,
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/