Re: [RFC][PATCH RT] rwsem_rt: Another (more sane) approach to mulitreader rt locks

From: Peter Zijlstra
Date: Thu May 17 2012 - 16:20:56 EST


On Thu, 2012-05-17 at 13:08 -0700, Paul E. McKenney wrote:
> I don't claim to understand all of the code, but I am also unafraid to
> ask stupid questions. ;-)
>
> So, is it possible to do something like the following?
>
> 1. Schedule a workqueue from an RCU callback, and to have that
> workqueue do the fput.

Possible, yes, but also undesirable: fput() can do a lot of work, and
Viro very much didn't want this.

> 2. Make things like unmount() do rcu_barrier() followed by
> flush_workqueue(), or probably multiple flush_workqueue()s.

For unmount() we could get away with this; unmount() isn't usually
(ever?) a critical path. However, as Viro noted, the fput() that is
still required can itself cause a tremendous amount of work. Even if it
is only synced against an unmount, having this work done from an async
context isn't desired.
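
To make the ordering in (1) and (2) concrete, here is a toy userspace
model in C. It is only a sketch, not kernel code: struct toy_deferral
and every toy_* function are hypothetical names, and plain counters
stand in for what call_rcu(), the workqueue, rcu_barrier() and
flush_workqueue() would do in the kernel. It is single-threaded and
does not model work items that queue further work, which is presumably
why several flush_workqueue() calls might be needed.

```c
/* Toy model: counters stand in for queued RCU callbacks and work items. */
struct toy_deferral {
	int rcu_pending;	/* callbacks queued, as call_rcu() would */
	int work_pending;	/* work items queued by those callbacks */
	int fputs_done;		/* deferred "fput" calls that completed */
};

/* Unmap side: defer the file release past a grace period. */
static void toy_defer_fput(struct toy_deferral *d)
{
	d->rcu_pending++;
}

/* Models rcu_barrier(): every queued callback has run on return.
 * Each callback only hands the real work to the workqueue. */
static void toy_rcu_barrier(struct toy_deferral *d)
{
	while (d->rcu_pending > 0) {
		d->rcu_pending--;
		d->work_pending++;
	}
}

/* Models flush_workqueue(): every queued work item has run on return. */
static void toy_flush_workqueue(struct toy_deferral *d)
{
	while (d->work_pending > 0) {
		d->work_pending--;
		d->fputs_done++;	/* the potentially heavy fput runs here */
	}
}

/* Unmount side: barrier first, then flush.
 * Returns the number of releases still outstanding. */
static int toy_unmount_sync(struct toy_deferral *d)
{
	toy_rcu_barrier(d);
	toy_flush_workqueue(d);
	return d->rcu_pending + d->work_pending;
}
```

The order matters in this model: flushing before the barrier finds an
empty workqueue and leaves the release pending. Either way the heavy
part still runs in the deferred context, which is the objection above.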

> 3. If someone concurrently does munmap() and a write to the
> to-be-unmapped region, then the write can legally happen.

Not entirely different from the current situation. The timing differs
between the RCU and current implementations, but imagine the write
happens while the munmap() is in progress and hasn't quite reached the
range we write to; that write goes through today as well.

Anyway, this is all firmly in 'undefined' territory, so anybody who
breaks because of this change deserves all the pain (and probably more)
they get.

As already stated, any fault in a region that's being unmapped is the
result of an ill-formed program.

> 4. Acquire mmap_sem in the fault path, but only if the fault
> requires blocking, and recheck the situation under
> mmap_sem -- the hope being to prevent long-lived page
> faults from messing things up.

Not relevant: a fault might not need to block but could still extend the
refcount lifetime of the file object beyond the unmap and thus bear the
responsibility for the final fput, which we cannot know a priori.
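
A small userspace sketch of that point, under loud assumptions: struct
toy_file and the toy_* helpers are hypothetical, and an ordinary C11
atomic counter stands in for the file reference, whereas the
speculative fault path discussed here specifically tries not to touch
that counter. It only illustrates that whoever drops the last reference
pays for the release, regardless of whether that path ever blocked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a file object; not the kernel's struct file. */
struct toy_file {
	atomic_int refcount;
	bool released;	/* set once the (possibly expensive) release ran */
};

static void toy_get(struct toy_file *f)
{
	atomic_fetch_add(&f->refcount, 1);
}

/* Drop one reference. Returns true if this caller dropped the last one
 * and therefore had to run the release itself. */
static bool toy_put(struct toy_file *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);	/* catch refcount underflow */
	if (old == 1) {
		f->released = true;	/* stands in for the heavy part of fput() */
		return true;
	}
	return false;
}

/* A fault that never blocks, but finishes after the unmap.
 * Returns true if the fault ended up doing the final put. */
static bool toy_fault_does_final_put(void)
{
	struct toy_file f = { .released = false };
	bool unmap_was_last, fault_was_last;

	atomic_init(&f.refcount, 1);	/* reference held by the mapping */
	toy_get(&f);			/* fault pins the file */
	unmap_was_last = toy_put(&f);	/* unmap drops the mapping's ref */
	fault_was_last = toy_put(&f);	/* fault completes afterwards */

	return !unmap_was_last && fault_was_last && f.released;
}
```

Nothing at toy_get() time tells the fault whether it will be last; that
is only decided by the order of the two toy_put() calls.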

It's all made much more complex by the fact that we're avoiding taking
the refcount from the speculative fault in order to avoid the 'global'
synchronization on that cacheline -- which is the real problem :-)