Re: Question about x86/mm/gup.c's use of disabled interrupts

From: Nick Piggin
Date: Wed Mar 18 2009 - 21:32:53 EST


Hi Jeremy,

I think most of your questions have already been hashed out, but
I can offer a suggestion...

On Thursday 19 March 2009 06:17:03 Jeremy Fitzhardinge wrote:
> Hi Nick,
>
> The comment in arch/x86/mm/gup.c:gup_get_pte() says:
>
> [...] What
> * we do have is the guarantee that a pte will only either go from not
> * present to present, or present to not present or both -- it will not
> * switch to a completely different present page without a TLB flush in
> * between; something that we are blocking by holding interrupts off.
>
>
> Disabling the interrupt will prevent the tlb flush IPI from coming in
> and flushing this cpu's tlb, but I don't see how it will prevent some
> other cpu from actually updating the pte in the pagetable, which is what
> we're concerned about here.

Yes, I don't believe it is possible to have a *new* pte installed until
the flush is done.


> Is this the only reason to disable
> interrupts? Would we need to do it for the !PAE cases?

No, disabling interrupts also has to pin the page tables, and pin the
pages as well, so it is needed in the !PAE case too.


> Also, assuming that disabling the interrupt is enough to get the
> guarantees we need here, there's a Xen problem because we don't use IPIs
> for cross-cpu tlb flushes (well, it happens within Xen). I'll have to
> think a bit about how to deal with that, but I'm thinking that we could
> add a per-cpu "tlb flushes blocked" flag, and maintain some kind of
> per-cpu deferred tlb flush count so we can get around to doing the flush
> eventually.
>
> But I want to make sure I understand the exact algorithm here.

FWIW, powerpc can actually flush TLBs without IPIs, and it also has
a gup_fast. powerpc frees its page _tables_ via RCU so that we can walk
them, and then I use speculative page references to take a reference
on the page without having it pinned.
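
To illustrate the speculative reference idea, here is a minimal userspace
sketch in C11, not the kernel code. The names fake_page, pages, pte_page,
get_ref_unless_zero, put_ref and pin_speculative are hypothetical, and the
"pte" is just an index into a small table. The point is the order of the
steps: sample the pte, take a reference only if the count is non-zero, then
re-check the pte and back out if it changed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: only a reference count. */
struct fake_page {
    atomic_int refcount;    /* 0 means the page is free or being freed */
};

/* Hypothetical "physical memory": pte value n (n > 0) maps pages[n - 1]. */
static struct fake_page pages[4];

/* Translate a pte value to a page; 0 or out of range means not present. */
static struct fake_page *pte_page(uint64_t pte)
{
    if (pte == 0 || pte > 4)
        return NULL;
    return &pages[pte - 1];
}

/* Take a reference only if the count is not already zero. */
static bool get_ref_unless_zero(struct fake_page *page)
{
    int old = atomic_load(&page->refcount);

    while (old != 0) {
        /* On failure, old is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference taken by get_ref_unless_zero(). */
static void put_ref(struct fake_page *page)
{
    atomic_fetch_sub(&page->refcount, 1);
}

/*
 * Speculatively pin the page that *ptep mapped when it was sampled.
 * Returns NULL if the pte was not present, the page was already being
 * freed, or the pte changed while the reference was being taken.
 */
static struct fake_page *pin_speculative(_Atomic uint64_t *ptep)
{
    uint64_t pte = atomic_load(ptep);
    struct fake_page *page = pte_page(pte);

    if (page == NULL || !get_ref_unless_zero(page))
        return NULL;

    /* The pte may have changed under us: back out if so. */
    if (atomic_load(ptep) != pte) {
        put_ref(page);
        return NULL;
    }
    return page;
}
```

The re-check after taking the reference is what makes the reference safe to
take without holding off the flush: if the pte is unchanged, the reference
was taken on the page that is still mapped there.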

Turning gup_get_pte into a pvop would be a bit nasty because on !PAE
it is just a single load, and even on PAE it is pretty cheap.
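
For reference, the two read paths look roughly like this. This is a
userspace C11 model rather than the kernel source: struct pae_pte,
read_pte_nonpae and read_pte_pae are hypothetical names, and acquire loads
stand in for the read barriers between the loads. On !PAE the pte is one
word; on PAE the low word (which holds the present bit) is read, then the
high word, then the low word again, retrying if it changed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a PAE pte: two 32-bit halves, updated separately. */
struct pae_pte {
    _Atomic uint32_t low;   /* holds the present bit */
    _Atomic uint32_t high;
};

/* !PAE: the whole pte fits in one word, so one load is enough. */
static uint32_t read_pte_nonpae(_Atomic uint32_t *ptep)
{
    return atomic_load_explicit(ptep, memory_order_relaxed);
}

/*
 * PAE: read low, then high, then check that low has not changed.
 * The acquire loads keep the three loads in program order.
 */
static uint64_t read_pte_pae(struct pae_pte *ptep)
{
    uint32_t low, high;

    do {
        low = atomic_load_explicit(&ptep->low, memory_order_acquire);
        high = atomic_load_explicit(&ptep->high, memory_order_acquire);
    } while (low != atomic_load_explicit(&ptep->low, memory_order_relaxed));

    return ((uint64_t)high << 32) | low;
}
```

The retry only guards against a torn read of the two halves. It relies on
the guarantee quoted above, that the pte cannot switch to a different
present page without a TLB flush in between.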
