Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v5)

From: Peter Zijlstra
Date: Wed Jan 20 2010 - 03:46:44 EST


On Tue, 2010-01-19 at 22:13 -0500, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@xxxxxxxxxxxxx) wrote:
> > On Tue, 2010-01-19 at 19:37 +0100, Peter Zijlstra wrote:
> > > On Thu, 2010-01-14 at 14:33 -0500, Mathieu Desnoyers wrote:
> > > > It's a case where CPU 1 switches from our mm to another mm:
> > > >
> > > > CPU 0 (membarrier)                  CPU 1 (another mm -our mm)
> > > > <user-space>                        <user-space>
> > > >                                     <buffered access C.S. data>
> > > >                                     urcu read unlock()
> > > >                                       barrier()
> > > >                                       store local gp
> > > >                                     <kernel-space>
> > >
> > > OK, so the question is how we end up here, if it's through interrupt
> > > preemption I think the interrupt delivery will imply an mb,
> >
> > I keep thinking that, but I think we actually refuted that in an earlier
> > discussion on this patch.
>
> Intel Architecture Software Developer's Manual Vol. 3: System
> Programming
> 7.4 Serializing Instructions
>
> "MOV to control reg, MOV to debug reg, WRMSR, INVD, INVLPG, WBINVD, LGDT,
> LLDT, LIDT, LTR, CPUID, IRET, RSM"
>
> So, this list does _not_ include: INT, SYSENTER, SYSEXIT.
>
> Only IRET is included. So I don't think it is safe to assume that x86
> has serializing instructions when entering/leaving the kernel.

I got confused by 7.1.2.1, automatic locking on interrupt acknowledge.

But I already retracted that statement.

> >
> > > if it's a
> > > blocking syscall, the set_task_state() mb [*] should be there.
> > >
> > > Then we also do:
> > >
> > > clear_tsk_need_resched()
> > >
> > > which is an atomic bitop (although it does not imply a full barrier
> > > per se).
> > >
> > > > rq->curr = next (1)
> >
> > We could possibly look at placing that assignment in context_switch()
> > between switch_mm() and switch_to(), which should provide a mb before
> > and after I think, Ingo?
>
> That's an interesting idea. It would indeed fix the problem of the
> missing barrier before the assignment, but would lack the appropriate
> barrier after the assignment. If the rq->curr = next; assignment is made
> after load_cr3, then we lack a memory barrier between the assignment and
> execution of following user-space code after returning with SYSEXIT (and
> we lack the appropriate barrier for other architectures too).

Well, 7.1.2.1 says that writing a segment register implies a LOCK, but
on second reading there are a number of qualifiers there, and I'm not
sure we satisfy them.

Peter, does our switch_to() imply a mb?
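
To make the proposed placement concrete, here is a rough user-space
sketch in C11, not kernel code. struct task, struct runqueue and the
sketch_*() functions are hypothetical stand-ins. The two fences only
model the barriers we hope switch_mm() and switch_to() provide; whether
switch_to() really implies one is exactly the open question above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's real task/runqueue types. */
struct task { int pid; };
struct runqueue { _Atomic(struct task *) curr; };

static void sketch_switch_mm(void)
{
	/* Assumption: models a full barrier implied by switch_mm(). */
	atomic_thread_fence(memory_order_seq_cst);
}

static void sketch_switch_to(void)
{
	/* Assumption: models a full barrier implied by switch_to(). */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Proposed ordering: barrier, rq->curr = next, barrier. */
static void sketch_context_switch(struct runqueue *rq, struct task *next)
{
	assert(rq != NULL && next != NULL);

	sketch_switch_mm();
	/* The store itself is plain; ordering comes from the fences around it. */
	atomic_store_explicit(&rq->curr, next, memory_order_relaxed);
	sketch_switch_to();
}
```

If either fence turns out not to exist on some architecture, the store
to rq->curr is no longer ordered on that side, which is the gap Mathieu
points out.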
