Re: [PATCH RFC fs] v2 Make sync() satisfy many requests with oneinvocation

From: Paul E. McKenney
Date: Sat Jul 27 2013 - 00:06:03 EST

On Sat, Jul 27, 2013 at 12:57:03PM +1000, Dave Chinner wrote:
> On Fri, Jul 26, 2013 at 04:28:52PM -0700, Paul E. McKenney wrote:
> > Dave Jones reported RCU stalls, overly long hrtimer interrupts, and
> > amazingly long NMI handlers from a trinity-induced workload involving
> > lots of concurrent sync() calls (
> > There are any number of things that one might do to make sync() behave
> > better under high levels of contention, but it is also the case that
> > multiple concurrent sync() system calls can be satisfied by a single
> > sys_sync() invocation.
> >
> > Given that this situation is reminiscent of rcu_barrier(), this commit
> > applies the rcu_barrier() approach to sys_sync(). This approach uses
> > a global mutex and a sequence counter. The mutex is held across the
> > sync() operation, which eliminates contention between concurrent sync()
> > operations.
> >
> > The counter is incremented at the beginning and end of
> > each sync() operation, so that it is odd while a sync() operation is in
> > progress and even otherwise, just like sequence locks.
> >
> > The code that used to be in sys_sync() is now in do_sync(), and sys_sync()
> > now handles the concurrency. The sys_sync() function first takes a
> > snapshot of the counter, then acquires the mutex, and then takes another
> > snapshot of the counter. If the values of the two snapshots indicate that
> > a full do_sync() executed during the mutex acquisition, the sys_sync()
> > function releases the mutex and returns ("Our work is done!"). Otherwise,
> > sys_sync() increments the counter, invokes do_sync(), and increments
> > the counter again.
> >
> > This approach allows a single call to do_sync() to satisfy an arbitrarily
> > large number of sync() system calls, which should eliminate issues due
> > to large numbers of concurrent invocations of the sync() system call.
> This is not addressing the problem that is causing issues during
> sync. Indeed, it only puts a bandaid over the currently observed
> trigger.
> Indeed, i suspect that this will significantly slow down concurrent
> sync operations, as it serialised sync across all superblocks rather
> than serialising per-superblock like is currently done. Indeed, that
> per-superblock serialisation is where all the lock contention
> problems are. And it's not sync alone that causes the contention
> problems - it has to be combined with other concurrent workloads
> that add or remove inodes from the inode cache at tha same time.

Seems like something along the lines of wakeup_flusher_threads()
currently at the start of sys_sync() would address this.

> I have patches to address that by removing the source
> of the lock contention completely, and not just for the sys_sync
> trigger. Those patches make the problems with concurrent
> sys_sync operation go away completely for me, not to mention improve
> performance for 8+ thread metadata workloads on XFS significantly.
> IOWs, I don't see that concurrent sys_sync operation is a problem at
> all, and it is actively desirable for systems that have multiple
> busy filesystems as it allows concurrent dispatch of IO across those
> multiple filesystems. Serialising all sys_sync work might stop the
> contention problems, but it will also slow down concurrent sync
> operations on busy systems as it only allows one thread to dispatch
> and wait for IO at a time.
> So, let's not slap a bandaid over a symptom - let's address the
> cause of the lock contention properly....


Could you please send your patches over to Dave Jones right now? I am
getting quite tired of getting RCU CPU stall warning complaints from
him that turn out to be due to highly contended sync() system calls.

Thanx, Paul

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at