Re: [PATCH v2] writeback: avoid race when update bandwidth

From: Fengguang Wu
Date: Thu Jun 14 2012 - 10:00:14 EST


On Thu, Jun 14, 2012 at 12:05:59PM +1000, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 08:14:34PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 13, 2012 at 01:59:20PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 12, 2012 at 07:52:19PM +0800, Fengguang Wu wrote:
> > > > On Tue, Jun 12, 2012 at 07:46:01PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li <liwp@xxxxxxxxxxxxxxxxxx>
> > > > >
> > > > > "V1 -> V2"
> > > > > * remove dirty_lock
> > > > >
> > > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > > the flushers that call wb_writeback() to write back pages will
> > > > > get stuck while the bandwidth update policy holds this lock. To
> > > > > avoid this race, we can introduce a new bandwidth_lock which is
> > > > > responsible for protecting the bandwidth update policy.
> > > > >
> > > > > Signed-off-by: Wanpeng Li <liwp.linux@xxxxxxxxx>
> > > >
> > > > Applied with a new title "writeback: use a standalone lock for
> > > > updating write bandwidth". "race" is sensitive because it often
> > > > refers to some locking error.
> > >
> > > Fengguang - can we get some evidence that this is a contended lock
> > > before changing the scope of it? All of the previous "breaking up
> > > global locks" have been done based on lock contention data, so
> > > moving back to a global lock for this needs to have the same
> > > analysis provided...
> >
> > Good point. Attached is the lockstat for the case "10 disks each runs
> > 100 dd dirtier tasks":
> >
> > lkp-ne02/JBOD-10HDD-thresh=4G/xfs-100dd-1-3.2.0-rc5
>
> (nothing attached)
>
> > The wb->list_lock contention is much better than I expected, which is
> > good. The ones that stand out are:
> >                                                      waittime-total
> > - &rq->lock           by double_rq_lock()               6738952.13
> > - clockevents_lock    by clockevents_notify()           2155554.37
> > - mapping->tree_lock  by test_clear_page_writeback()     931550.13
> > - sb_lock             by grab_super_passive()            918815.87
> > - &zone->lru_lock     by pagevec_lru_move_fn()           912681.05
> >
> > - sysfs_mutex         by sysfs_permission()             24029975.20  # mutex
> > - ip->i_lock          by xfs_ilock()                    18428284.10  # mrlock
>
> The wait time is not really an indication of contention problems.
> Large wait time is usually an indication that the lock is being used
> a lot.

Right.

> What matters is the number of contentions vs the number of
> acquisitions, and the number of those contentions that bounced the
> lock. If the number of contentions is >= 0.5% of the acquisitions,
> then the lock can be considered hot and needing some work. If I look
> here:

I wonder if anyone has a simple script for sorting lock_stat output
based on that (and perhaps other selectable) criteria? It should be
possible to write one myself, but still.. ;-)

Default lock_stat output is sorted by absolute number of contentions.
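For reference, a script that sorts by the contentions/acquisitions ratio could be as small as the sketch below. It is not an existing tool: the function names and the HOT_RATIO cutoff (Dave's 0.5% rule of thumb) are illustrative, and it assumes the usual lock_stat text layout, i.e. a header line starting with "class name" and containing the con-bounces, contentions and acquisitions columns, followed by per-class lines of the form "<class name>: <numbers>". Column names are read from the header rather than hard-coded, so it should tolerate layouts with or without the *-avg columns.

```python
import re
import sys

# Dave's rule of thumb from this thread: a lock whose contentions reach
# 0.5% of its acquisitions is "hot" and worth working on.
HOT_RATIO = 0.005

# A per-class summary line looks like "<class name>: <n> <n> ...".
# Call-site lines ("... [<ffffffff...>] func+0x../0x..") and the
# "----"/"...." separators have no "name: numbers" form, so they do not match.
DATA_RE = re.compile(r"^\s*(\S.*?):\s+([-\d.\s]+)$")


def parse_lock_stat(text):
    """Parse lock_stat text into one dict per lock class.

    Column names come from the header line (the one containing
    "con-bounces"), so the exact set of columns does not matter.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        if "con-bounces" in line:
            columns = line.split()[2:]  # drop the "class name" label
            continue
        if columns is None:
            continue  # still in the preamble before the header
        m = DATA_RE.match(line)
        if not m:
            continue
        values = m.group(2).split()
        if len(values) != len(columns):
            continue  # not a per-class summary line
        row = {"name": m.group(1)}
        row.update(zip(columns, map(float, values)))
        rows.append(row)
    return rows


def rank_by_contention_ratio(rows):
    """Return (ratio, name, row) tuples sorted by contentions/acquisitions."""
    ranked = []
    for row in rows:
        acquisitions = row.get("acquisitions", 0.0)
        if acquisitions <= 0:
            continue  # ratio is undefined for a never-acquired class
        ratio = row.get("contentions", 0.0) / acquisitions
        ranked.append((ratio, row["name"], row))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


def main(path, top=20):
    with open(path) as f:
        ranked = rank_by_contention_ratio(parse_lock_stat(f.read()))
    print(f"{'ratio':>8} {'contentions':>12} {'acquisitions':>14}  class")
    for ratio, name, row in ranked[:top]:
        mark = "  <-- hot" if ratio >= HOT_RATIO else ""
        print(f"{ratio:8.3%} {int(row['contentions']):12d} "
              f"{int(row['acquisitions']):14d}  {name}{mark}")


# Only run when a file (e.g. a saved lock_stat dump) is given explicitly.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

It would be run against a saved copy of a lock_stat file, such as the ones linked below, or against /proc/lock_stat on a kernel built with lock statistics enabled.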

> http://lists.linux.hp.com/~enw/ext4/3.2/3.2-full-lockstats.2/ffsb_fsscale.xfs.large_file_creates_threads=192/profiling/iteration.1/lock_stat
>
> This is a 192-thread concurrent write on a 48-core machine. The
> wb.list_lock shows 5,532 acquisitions for the entire test, while the
> mapping tree lock took 440 million! So your test isn't really one
> that shows wb.list_lock contention. The 192-thread mailserver
> workload from the same machine:
>
> http://lists.linux.hp.com/~enw/ext4/3.2/3.2-full-lockstats.2/ffsb_fsscale.xfs.mail_server_threads=192/profiling/iteration.1/lock_stat
>
> Shows about 7.1m acquisitions of the wb.list_lock, but only 28,000
> contentions. So it isn't really contended enough to justify
> replacing it with a global lock.
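Checking the mailserver figures quoted above against the 0.5% rule of thumb (numbers as quoted, rounded):

```latex
\[
\frac{\text{contentions}}{\text{acquisitions}}
  = \frac{28{,}000}{7.1 \times 10^{6}}
  \approx 0.0039 = 0.39\% < 0.5\%
\]
```

So wb.list_lock stays below the "hot" threshold on this workload.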

Right.

> FWIW, the third most contended lock on that workload is the XFS
> delayed write queue lock - 25M acquisitions for 600k contentions - a
> rate of about 2% which means quite severe contention. That lock no
> longer exists in 3.5 - Christoph completely reworked the delayed
> write buffer support to remove the global list and lock because it
> was showing up in profiles like this...
>
> Indeed, that profile shows that XFS owns 7 of the 10 most contended
> locks, and 3 of them have had significant work done to reduce the
> contention since 3.2 as a result of recent profile results like this.
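The same check for the XFS delayed write queue lock on that workload:

```latex
\[
\frac{600 \times 10^{3}}{25 \times 10^{6}} = 0.024 = 2.4\% \approx 4.8 \times 0.5\%
\]
```

This is the "about 2%" quoted above, nearly five times the threshold.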

Nice work!

Thanks,
Fengguang