Re: Shared futexes (was [PATCH] FUTEX : new PRIVATE futexes)

From: Hugh Dickins
Date: Fri Apr 06 2007 - 09:03:39 EST


On Fri, 6 Apr 2007, Peter Zijlstra wrote:
>
> some thoughts on shared futexes;
>
> Could we get rid of the mmap_sem on the shared futexes in the following
> manner:
>
> - do a page table walk to find the pte;

("walk" meaning descent down the levels, I presume, rather than across)

I've not had time to digest your proposal, and I'm about to go out:
let me sound a warning that springs to mind, maybe it's entirely
inappropriate, but better said than kept silent.

It looks as if you're supposing that mmap_sem is needed to find_vma,
but not for going down the pagetables. It's not as simple as that:
you need to be careful that a concurrent munmap from another thread
isn't freeing pagetables from under you.

Holding mmap_sem (down_read) is one way to protect against that.
try_to_unmap doesn't have that luxury: in its case, it's made safe
by the way free_pgtables does anon_vma_unlink and unlink_file_vma
before freeing any pagetables, so try_to_unmap etc. won't get there;
but you can't do that.

Hugh

> - get a page using pfn_to_page (skipping VM_PFNMAP)
> - get the futex key from page->mapping->host and page->index
> and offset from addr % PAGE_SIZE.
>
> or given a key:
>
> - lookup the page from key.shared.inode->i_mapping by key.shared.pgoff
> possibly loading the page using mapping->a_ops->readpage().
>
> then:
>
> - perform the futex operation on a kmap of the page
>
>
> This should all work except for VM_PFNMAP.
>
> Since the address is passed from userspace we cannot trust it to not
> point into a VM_PFNMAP area.
>
> However, with the RCU VMA lookup patches I'm working on we could do that
> check without holding locks and without exclusive cachelines; the
> question is, is that good enough?
>
> Or is there an alternative way of determining a pfnmap given a
> pfn/struct page?
>
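
For reference, the "offset from addr % PAGE_SIZE" step in the quoted
proposal can be sketched in user space as below. This is only the
arithmetic; in the proposal the rest of the key comes from
page->mapping->host and page->index, which this sketch does not model.
The struct and function names are hypothetical, and the page size is
passed in rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the address-derived part of a futex key. */
struct demo_key {
    uintptr_t page_base;    /* address rounded down to a page boundary */
    unsigned long offset;   /* addr % page_size */
};

/* Split a user address into its page base and its offset within the page. */
static struct demo_key demo_split_addr(uintptr_t addr, unsigned long page_size)
{
    struct demo_key key;

    /* Page sizes are powers of two; reject anything else. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    key.offset = (unsigned long)(addr % page_size);
    key.page_base = addr - key.offset;
    return key;
}
```

With a 4096-byte page, address 0x12345 splits into page base 0x12000 and
offset 0x345.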