Re: One (possible) x86 get_user_pages bug

From: Jan Beulich
Date: Thu Jan 27 2011 - 11:07:39 EST

>>> On 27.01.11 at 14:05, Xiaowei Yang <xiaowei.yang@xxxxxxxxxx> wrote:
> We created a scenario to reproduce the bug:
> ----------------------------------------------------------------
> // proc1/proc1.2 are 2 threads sharing one page table.
> // proc1 is the parent of proc2.
> proc1                 proc2           proc1.2
> ...                   ...             // in gup_pte_range()
> ...                   ...             pte = gup_get_pte()
> ...                   ...             page1 = pte_page(pte) // (1)
> do_wp_page(page1)     ...             ...
> ...                   exit_map()      ...
> ...                   ...             get_page(page1) // (2)
> -----------------------------------------------------------------
> do_wp_page() and exit_map() cause page1 to be released into free list
> before get_page() in proc1.2 is called. The longer the delay between
> (1)&(2), the easier the BUG_ON shows.

Contrary to my initial response, I don't think this can happen outside
of Xen: do_wp_page() won't reach page_cache_release() when
gup_pte_range() is running for the same mm on another CPU,
since it can't get past ptep_clear_flush() (waiting for the CPU
in get_user_pages_fast() to re-enable interrupts).
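To make the ordering argument concrete, here is a rough single-threaded
model of the interleaving from the report. It is only a sketch: struct
page, struct cpu, put_page_model() and replay() are made-up stand-ins,
not kernel code, and the two releases in the report are collapsed into
a single put. The flush_waits_for_irqs == false branch is an assumption
about a flush that does not depend on the walking CPU re-enabling
interrupts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct page { int refcount; bool freed; };
struct cpu  { bool irqs_off; };

/* Drop one reference; the page goes back to the free list at zero. */
static void put_page_model(struct page *p)
{
    if (--p->refcount == 0)
        p->freed = true;
}

/*
 * Replay the reported interleaving step by step.
 * flush_waits_for_irqs == true : the flush cannot finish while the
 *   walking CPU has interrupts off (the native case argued above).
 * flush_waits_for_irqs == false: ASSUMPTION, a flush that completes
 *   regardless of the walker's interrupt state.
 * Returns true if the get_page() at step (2) hit a freed page.
 */
static bool replay(bool flush_waits_for_irqs)
{
    struct page page1 = { .refcount = 1, .freed = false };
    struct page *pte = &page1;
    struct cpu walker = { .irqs_off = false };
    bool flush_pending = false;
    bool hit_freed_page = false;

    /* proc1.2: the fast path disables interrupts, then does step (1). */
    walker.irqs_off = true;
    struct page *seen = pte;

    /* proc1: do_wp_page() clears the PTE and starts the flush. */
    pte = NULL;
    if (flush_waits_for_irqs && walker.irqs_off)
        flush_pending = true;       /* stuck in ptep_clear_flush() */
    else
        put_page_model(&page1);     /* flush done, old page released */

    /* proc1.2: step (2), take the reference, re-enable interrupts. */
    if (seen->freed)
        hit_freed_page = true;
    seen->refcount++;
    walker.irqs_off = false;

    /* proc1: the stalled flush can now finish and drop its reference. */
    if (flush_pending)
        put_page_model(&page1);

    return hit_freed_page;
}
```

In this model replay(true) returns false, because the release is held
back until after step (2), while replay(false) returns true, which
matches the reported BUG_ON.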

> An experimental patch is made to prevent the PTE being modified in the
> middle of gup_pte_range(). The BUG_ON disappears afterward.
> However, from the comments embedded in gup.c, it seems deliberate to
> avoid the lock in the fast path. The question is: if so, how to avoid
> the above scenario?

Nick, since you wrote the initial implementation, would you be
able to estimate whether disabling get_user_pages_fast()
altogether for Xen would perform measurably worse than
adding the locks (while still avoiding mm->mmap_sem)
as suggested by Xiaowei? That of course only matters if the latter
is correct at all, of which I haven't fully convinced myself yet.

