Re: [PATCH 2] Little fixes to previous futex patch
From: Jamie Lokier
Date: Thu Sep 04 2003 - 13:03:20 EST
Hugh Dickins wrote:
> In sys_remap_file_pages, you set the VM_NONLINEAR flag, then clear
> it if this particular population matches the vma. No, you cannot
> clear that flag once set, without checking every page and pte_file
> already set within the vma. Check if population matches vma first,
> and if it doesn't match just set the VM_NONLINEAR flag in that case.
> (Andrew already mentioned locking: I'd have said page_table_lock,
> but his mmap_sem is also appropriate: it's an odd case.)
I don't see why you can't clear the flag: the call to ->populate will
change every page and pte_file to correspond with the linear page
offsets, which is all that !VM_NONLINEAR indicates.
However, it _is_ wrong to clear VM_NONLINEAR before the call to
->populate() has finished, with Andrew's patch which uses
downgrade_write(). Instead, the clear must come after ->populate()
has finished.
> I think rip out the FIXADDR_USER_START bit, it's rather over-the-top,
> ugly: and that area is readonly, so not a useful place for a futex.
Agreed. I put it because the old futex has it as a side effect of
get_user_pages(). It can go.
> The units of keys[1]: bytes if private but pages if shared.
> That's okay for now I think, but if a hashing expert comes along
> later s/he'll probably want to change it. The current hash does
> add key1 to offset, which is okay: if it xor'ed you'd lose the
> the offset bits in the private case.
Feel free to think up a better hash that isn't slow. Two iterations
of hash_long() would be a good hash, but slower.
> Those keys[1] pages: in units of PAGE_SIZE in the linear case,
> of PAGE_CACHE_SIZE in the nonlinear case. Oh well, this is far
> from the only place with such an inconsistency, let's worry
> about that when never comes.
Ew.
> The err at the end of __get_page_keys would be 1 from successful
> get_user_pages, treated as error by the callers: need to make it 0.
Well spotted.
> futex_wait: I didn't get around to it in my version, so haven't
> thought through the issues, but I'm a bit worried that you get
> curval for -EWOULDBLOCK check without holding the futex_lock.
> That looks suspicious to me, but I'm going to be lazy and not
> try to think about it, because Rusty is sure to understand the
> races there. If that code is insufficient as you have it, may
> need __pin_page reinstated for just that case (hmm, was that
> get_user right before? I'd expect it to kmap_atomic pinned page.)
The important things are that the futex is queued prior to checking
curval, the requested page won't change (it's protected by mmap_sem),
and any parallel waker changes the word prior to waking us.
You made me notice a rather subtle memory ordering condition, though.
We must issue the read after queuing the futex. There needs to be a
smp_rmb() after queuing and before the read, because the spin_unlock()
barrier only constrains earlier reads, not later ones.
Thanks for all your great insights,
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/