Re: One (possible) x86 get_user_pages bug

From: Jan Beulich
Date: Thu Jan 27 2011 - 09:49:53 EST


>>> On 27.01.11 at 14:05, Xiaowei Yang <xiaowei.yang@xxxxxxxxxx> wrote:
> We created a scenario to reproduce the bug:
> ----------------------------------------------------------------
> // proc1/proc1.2 are 2 threads sharing one page table.
> // proc1 is the parent of proc2.
>
> proc1               proc2          proc1.2
> ...                 ...            // in gup_pte_range()
> ...                 ...            pte = gup_get_pte()
> ...                 ...            page1 = pte_page(pte)  // (1)
> do_wp_page(page1)   ...            ...
> ...                 exit_map()     ...
> ...                 ...            get_page(page1)        // (2)
> -----------------------------------------------------------------
>
> do_wp_page() and exit_map() cause page1 to be released into free list
> before get_page() in proc1.2 is called. The longer the delay between
> (1)&(2), the easier the BUG_ON shows.

The scenario indeed seems to apply independently of virtualization,
but the window between (1) and (2) can obviously become unbounded
unless running native.
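
To make the quoted interleaving concrete, here is a minimal user-space
sketch that replays it on a single thread. It is a toy model, not kernel
code: struct toy_page, struct toy_pte and every toy_* function are
hypothetical names, and the reference counts (one per mapping process)
are an assumption about the setup. The conditional grab is loosely
modeled on the idea behind the kernel's get_page_unless_zero(); it is
shown only to make the freed-page condition observable, not as the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct page and a PTE; not the kernel's definitions. */
struct toy_page { int refcount; };
struct toy_pte  { struct toy_page *page; };

/* Drop one reference; the page counts as "freed" once refcount hits 0. */
static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/* Take a reference only if the page is still live (refcount > 0). */
static bool toy_get_page_unless_zero(struct toy_page *p)
{
    if (p->refcount == 0)
        return false;
    p->refcount++;
    return true;
}

/*
 * Replays the reported interleaving. Returns true if step (2) finds a
 * freed page, i.e. the point where an unconditional get_page() would
 * trip its BUG_ON.
 */
static bool toy_replay_race(void)
{
    struct toy_page page1 = { .refcount = 2 };  /* mapped by proc1 and proc2 */
    struct toy_page copy  = { .refcount = 1 };  /* COW copy for proc1 */
    struct toy_pte  pte   = { .page = &page1 };

    /* proc1.2, step (1): read the PTE and resolve the page. */
    struct toy_page *seen = pte.page;

    /* proc1: the write fault repoints the PTE and drops its reference. */
    pte.page = &copy;
    toy_put_page(&page1);

    /* proc2: exiting drops the last reference; page1 is now free. */
    toy_put_page(&page1);

    /* proc1.2, step (2): the conditional grab detects the freed page. */
    return !toy_get_page_unless_zero(seen);
}
```

Nothing in the model bounds the time between step (1) and step (2),
which is why a vCPU that stops running in between makes the problem
easier to hit.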

However, going through all the comments in gup.c again, I wonder
whether pv Xen guests violate the major assumption: the comments
say that running with interrupts off prevents (or sufficiently
defers) TLB flushes initiated by remote CPUs. In pv Xen guests,
however, non-local TLB flushes are not done by sending IPIs; the
hypercall interface is used instead. If that assumption is indeed
violated, I would have expected quite a few bug reports, but I'm
unaware of any. Nick, am I overlooking something here?
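
The concern can be sketched as a small model of the two flush
mechanisms. Everything here is hypothetical (struct toy_cpu and the
toy_* functions are illustrative names, not Linux or Xen interfaces),
and the hypercall behavior encoded below is exactly the assumption
being questioned, not a confirmed property of Xen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; only the interrupt flag matters here. */
struct toy_cpu { bool irqs_off; };

/*
 * IPI-style flush: the initiator has to wait until every other CPU
 * takes the interrupt. Returns true if the flush can complete now.
 */
static bool toy_flush_by_ipi(const struct toy_cpu *cpus, size_t n, size_t self)
{
    for (size_t i = 0; i < n; i++)
        if (i != self && cpus[i].irqs_off)
            return false;   /* target cannot acknowledge yet */
    return true;
}

/*
 * Hypercall-style flush, under the assumption raised in this mail:
 * completion does not depend on the guest-visible interrupt state of
 * the other CPUs.
 */
static bool toy_flush_by_hypercall(const struct toy_cpu *cpus, size_t n,
                                   size_t self)
{
    (void)cpus;
    (void)n;
    (void)self;
    return true;
}
```

In this model a walker with interrupts off holds back an IPI-based
flush, and therefore the freeing that follows it, but gives no such
protection if the flush is done by hypercall.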

Jan
