Re: [PATCH -next v2] futex: Remove requirement for lock_page in get_futex_key

From: Peter Zijlstra
Date: Mon Jan 11 2016 - 12:47:38 EST


On Thu, Jan 07, 2016 at 03:51:14PM -0800, Davidlohr Bueso wrote:
> +++ b/kernel/futex.c
> @@ -520,7 +520,20 @@ again:
> else
> err = 0;
>
> - lock_page(page);
> + /*
> + * The treatment of mapping from this point on is critical. The page
> + * lock protects many things but in this context the page lock
> + * stabilises mapping, prevents inode freeing in the shared
> + * file-backed region case and guards against movement to swap cache.
> + *
> + * Strictly speaking the page lock is not needed in all cases being
> + * considered here and page lock forces unnecessarily serialisation.
> + * From this point on, mapping will be reverified if necessary and
> + * page lock will be acquired only if it is unavoiable.

Making up new words eh?

> + */
> + page = compound_head(page);
> + mapping = READ_ONCE(page->mapping);
> +
> /*
> * If page->mapping is NULL, then it cannot be a PageAnon
> * page; but it might be the ZERO_PAGE or in the gate area or


> } else {
> + struct inode *inode;
> +
> + /*
> + * The associtated futex object in this case is the inode and
> + * the page->mapping must be traversed. Ordinarily this should
> + * be stabilised under page lock but it's not strictly
> + * necessary in this case as we just want to pin the inode, not
> + * update radix tree or anything like that.
> + *
> + * The RCU read lock is taken as the inode is finally freed
> + * under RCU. If the mapping still matches expectations then the
> + * mapping->host can be safely accessed as being a valid inode.
> + */
> + rcu_read_lock();
> + if (READ_ONCE(page->mapping) != mapping ||
> + !mapping->host) {
> + rcu_read_unlock();
> + put_page(page);
> +
> + goto again;
> + }
> + inode = mapping->host;

Don't you want a single load of mapping->host? Could it not have become
NULL just between these two loads?

> +
> + /*
> + * Take a reference unless it is about to be freed. Previously
> + * this reference was taken by ihold under the page lock
> + * pinning the inode in place so i_lock was unnecessary. The
> + * only way for this check to fail is if the inode was
> + * truncated in parallel so warn for now if this happens.
> + *
> + * We are not calling into get_futex_key_refs() in file-backed
> + * cases, therefore a successful atomic_inc return below will
> + * guarantee that get_futex_key() will continue to imply MB (B).
> + */
> + if (WARN_ON(!atomic_inc_not_zero(&inode->i_count))) {

ONCE?

> + rcu_read_unlock();
> + put_page(page);
> +
> + goto again;
> + }
> +
> + /* Should be impossible but lets be paranoid for now */
> + BUG_ON(inode->i_mapping != mapping);
> +
> key->both.offset |= FUT_OFF_INODE; /* inode-based key */
> - key->shared.inode = mapping->host;
> + key->shared.inode = inode;
> key->shared.pgoff = basepage_index(page);
> + rcu_read_unlock();
> }