Re: [RFC][PATCH] fs: optimize inotify/fsnotify code for unwatched files

From: Paul E. McKenney
Date: Sat Jun 20 2015 - 21:31:28 EST


On Sat, Jun 20, 2015 at 11:02:08AM -0700, Dave Hansen wrote:
> On 06/19/2015 07:21 PM, Paul E. McKenney wrote:
> >>> > > What is so expensive in it? Just the memory barrier in it?
> >> >
> >> > The profiling doesn't hit on the mfence directly, but I assume that the
> >> > overhead is coming from there. The "mov 0x8(%rdi),%rcx" is identical
> >> > before and after the barrier, but it appears much more expensive
> >> > _after_. That makes no sense unless the barrier is the thing causing it.
> > OK, one thing to try is to simply delete the memory barrier. The
> > resulting code will be unsafe, but will probably run well enough to
> > get benchmark results. If it is the memory barrier, you should of
> > course get increased throughput.
>
> So I took the smp_mb() out of __srcu_read_lock(). The benchmark didn't
> improve at all. Looking at the profile, all of the overhead had just
> shifted to __srcu_read_unlock() and its memory barrier! Removing the
> barrier in __srcu_read_unlock() got essentially the same gains out of
> the benchmark as the original patch in this thread that just avoids RCU.
>
> I think that's fairly conclusive that the source of the overhead is,
> indeed, the memory barriers.
>
> Although I said this test was single threaded, I also had another
> thought. The benchmark is single-threaded, but 'perf' is sitting doing
> profiling and who knows what else on the other core, and the profiling
> NMIs are certainly writing plenty of data to memory. So, there might be
> plenty of work for that smp_mb()/mfence to do _despite_ the benchmark
> itself being single threaded.

Well, it is not hard to have an SRCU-like thing that doesn't have
read-side memory barriers, given that older versions of SRCU didn't
have them. However, the price is increased latency for the analog to
synchronize_srcu(). I am guessing that this would not be a problem
for notification-group destruction, which is presumably rare.

That said, if empty *_fsnotify_mask is the common case or if the
overhead of processing notification overwhelms srcu_read_lock(),
your original patch seems a bit simpler.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/