Re: [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core
From: Peter Zijlstra
Date: Fri Jul 15 2011 - 04:44:31 EST
On Fri, 2011-07-15 at 16:38 +0800, MailingLists wrote:
> On 07/15/2011 04:20 PM, Peter Zijlstra wrote:
> > On Fri, 2011-07-15 at 16:07 +0800, Shan Hai wrote:
> >> The following test case could reveal a bug in the futex_lock_pi()
> >>
> >> BUG: On FUTEX_LOCK_PI, there is an infinite loop in futex_lock_pi()
> >> on the PowerPC e500 core.
> >> Cause: The Linux kernel on the e500 core has no write permission on
> >> the COW page; refer to the head comment of the following test code.
> >>
> >> ftrace on test case:
> >> [000] 353.990181: futex_lock_pi_atomic<-futex_lock_pi
> >> [000] 353.990185: cmpxchg_futex_value_locked<-futex_lock_pi_atomic
> >> [snip]
> >> [000] 353.990191: do_page_fault<-handle_page_fault
> >> [000] 353.990192: bad_page_fault<-handle_page_fault
> >> [000] 353.990193: search_exception_tables<-bad_page_fault
> >> [snip]
> >> [000] 353.990199: get_user_pages<-fault_in_user_writeable
> >> [snip]
> >> [000] 353.990208: mark_page_accessed<-follow_page
> >> [000] 353.990222: futex_lock_pi_atomic<-futex_lock_pi
> >> [snip]
> >> [000] 353.990230: cmpxchg_futex_value_locked<-futex_lock_pi_atomic
> >> [ a loop occurs here ]
> >>
> >
> > But but but but, that get_user_pages(.write=1, .force=0) should result
> > in a COW break, getting our own writable page.
> >
> > What is this e500 thing smoking that this doesn't work?
>
> A page can be marked read-only for the kernel (supervisor in the powerpc
> literature) on the e500, and that's what the kernel does. Setting the SW
> (supervisor write) bit in the TLB entry grants the kernel write permission
> on a page.
>
> Furthermore, the SW bit is set according to the DIRTY flag of the PTE.
> PTE.DIRTY is set in do_page_fault(), but futex_lock_pi() disables page
> faults, so PTE.DIRTY can never be set and neither can the SW bit. The
> result is an unbreakable COW, followed by an infinite loop.
I'm fairly sure fault_in_user_writeable() has PF enabled as it takes
mmap_sem, and pagefault_disable() is akin to preempt_disable() on mainline.
Also get_user_pages() fully expects to be able to schedule, and in fact
can call the full pf handler path all by its lonesome self.
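
For readers following the thread, the disagreement can be put as a toy
model. This is only a sketch of the retry loop as the report describes it,
not kernel code. The struct and every model_* name are hypothetical. The
rule that SW follows PTE.DIRTY is taken from the quoted report, not from
the e500 manual. The set_dirty flag stands for the open question of whether
the fault-in path ends up marking the PTE dirty.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified PTE; not the kernel's real layout. */
struct model_pte {
    bool rw;    /* page is privately writable (COW already broken) */
    bool dirty; /* PTE.DIRTY */
};

/* Assumption from the report: the TLB SW bit is granted only for a
 * writable PTE that is also dirty. */
static bool model_tlb_sw(const struct model_pte *pte)
{
    return pte->rw && pte->dirty;
}

/* Stand-in for cmpxchg_futex_value_locked(): the kernel-mode store
 * succeeds only if the supervisor has write permission. */
static int model_cmpxchg(const struct model_pte *pte)
{
    return model_tlb_sw(pte) ? 0 : -1;
}

/* Stand-in for fault_in_user_writeable(): always breaks COW; marks the
 * PTE dirty only if set_dirty is true. */
static void model_fault_in(struct model_pte *pte, bool set_dirty)
{
    pte->rw = true;
    if (set_dirty)
        pte->dirty = true;
}

/* Retry loop in the shape of the ftrace output above. Returns the number
 * of attempts needed, or -1 if max_retries is exhausted (the "infinite
 * loop" case, bounded here so the model terminates). */
static int model_lock_retry(struct model_pte *pte, bool set_dirty,
                            int max_retries)
{
    for (int attempt = 1; attempt <= max_retries; attempt++) {
        if (model_cmpxchg(pte) == 0)
            return attempt;
        model_fault_in(pte, set_dirty);
    }
    return -1;
}
```

In this model, the loop finishes on the second attempt if the fault-in path
sets DIRTY, and never finishes if it does not. Which of the two matches the
real e500 code is exactly what is in question here.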