Re: [RFC][PATCH v2 11/11] context_tracking,x86: Fix text_poke_sync() vs NOHZ_FULL
From: Marcelo Tosatti
Date: Tue Oct 26 2021 - 14:19:31 EST
On Thu, Oct 21, 2021 at 10:18:59PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 21, 2021 at 04:57:09PM -0300, Marcelo Tosatti wrote:
> > > Pretty much everything in noinstr is magical, we just have to think
> > > harder there (and possibly start writing more comments there).
> >
> > mds_user_clear_cpu_buffers happens after sync_core, in your patchset,
> > if i am not mistaken.
>
> Of course it does, mds_user_clear_cpu_buffers() is on exit, the
> sync_core() is on entry.
static_key enable/disable
__exit_to_user_mode -> context_tracking_set_cpu_work(cpu, work)
user_enter_irqoff -> preempt_disable();
__context_tracking_enter(CONTEXT_USER); seq = atomic_read(&ct->seq);
ct_seq_user_enter(raw_cpu_ptr(&context_tracking)); if (__context_tracking_seq_in_user(seq)) {
{ /* ctrl-dep */
arch_atomic_set(&ct->work, 0); atomic_or(work, &ct->work);
return arch_atomic_add_return(CT_SEQ_USER, &ct->seq); ret = atomic_try_cmpxchg(&ct->seq, &seq, seq|CT_SEQ_WORK);
}
} preempt_enable();
arch_exit_to_user_mode()
mds_user_clear_cpu_buffers(); <--- sync_core work queued,
but not executed.
i-cache potentially stale?
ct_seq_user_enter should happen _after_ all possible static_key users?
(or recheck that there is no pending work after any possible
rewritable code/static_key user).
>
> > > > > + /* NMI happens here and must still do/finish CT_WORK_n */
> > > > > + sync_core();
> > > >
> > > > But after the discussion with you, it seems doing the TLB checking
> > > > and (also sync_core) checking very late/very early on exit/entry
> > > > makes things easier to review.
> > >
> > > I don't know about late, it must happen *very* early in entry. The
> > > sync_core() must happen before any self-modifying code gets called
> > > (static_branch, static_call, etc..) with possible exception of the
> > > context_tracking static_branch.
> > >
> > > The TLBi must also happen super early, possibly while still on the
> > > entry stack (since the task stack is vmap'ed).
> >
> > But will it be ever be freed/remapped from other CPUs while the task
> > is running?
>
> Probably not, still something we need to be really careful with.
> >
> > > We currently don't run C
> > > code on the entry stack, that needs quite a bit of careful work to make
> > > happen.
> >
> > Was thinking of coding in ASM after (as early as possible) the write to
> > switch to kernel CR3:
>
> No, we're not going to add new feature to ASM. You'll just have to wait
> until all that gets lifted to C.
>
> > Kernel entry:
> > -------------
> >
> > cpu = smp_processor_id();
> >
> > if (isolation_enabled(cpu)) {
> > reqs = atomic_xchg(&percpudata->user_kernel_state, IN_KERNEL_MODE);
> > if (reqs & CPU_REQ_FLUSH_TLB)
> > flush_tlb_all();
> > if (reqs & CPU_REQ_SYNC_CORE)
> > sync_core();
> > }
> >
> > Exit to userspace (as close to write to CR3 with user pagetable
> > pointer):
> > -----------------
> >
> > cpu = smp_processor_id();
> >
> > if (isolation_enabled(cpu)) {
> > atomic_or(IN_USER_MODE, &percpudata->user_kernel_state);
> > }
> >
> > You think that is a bad idea (in ASM, not C) ?
>
> Those atomics are a bad idea and not goig to happen.
>
> > > We're not going to add an atomic to context tracking. There is one, we
> > > just got to extract/share it with RCU.
> >
> > Again, to avoid kernel TLB flushes you'd have to ensure:
>
> I know how it works, but we're not going to add a second atomic to
> entry/exit. RCU has one in there, that's going to be it. Again, we just
> got to extract/share.