Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()

From: Dave Chinner
Date: Thu Sep 17 2015 - 19:04:14 EST


On Thu, Sep 17, 2015 at 12:14:53PM +1000, Dave Chinner wrote:
> On Wed, Sep 16, 2015 at 06:12:29PM -0700, Linus Torvalds wrote:
> > On Wed, Sep 16, 2015 at 5:37 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >
> > > TL;DR: Results look really bad - not only is the plugging
> > > problematic, baseline writeback performance has regressed
> > > significantly.
> >
> > Dave, if you're testing my current -git, the other performance issue
> > might still be the spinlock thing.
>
> I have the fix as the first commit in my local tree - it'll remain
> there until I get a conflict after an update. :)
>
> > The plugging IO pauses are interesting, though. Plugging really
> > *shouldn't* cause that kind of pauses, _regardless_ of what level it
> > happens on, so I wonder if the patch ends up just exposing some really
> > basic problem that just normally goes hidden.
>
> Right, that's what I suspect - it didn't happen on older kernels,
> but we've just completely reworked the writeback code for the
> control group awareness since I last looked really closely at
> this...
>
> > Can you match up the IO wait times with just *where* it is
> > waiting? Is it waiting for that inode I_SYNC thing in
> > inode_sleep_on_writeback()?
>
> I'll do some more investigation.

Ok, I'm happy to report there is actually nothing wrong with the
plugging code that is in your tree. I finally tracked the problem I
was seeing down to a misbehaving RAID controller.[*]

With that problem sorted:

kernel         files/s   wall time
3.17             32500     5m54s
4.3-noplug       34400     5m25s
3.17-plug        52900     3m19s
4.3-badplug      60540     3m24s
4.3-rc1          56600     3m23s

So the 3.17/4.3-noplug baselines show no regression - 4.3 is slightly
faster. All the plugging variants show roughly the same improvement
and IO behaviour. These numbers are reproducible and there are no
weird performance inconsistencies during any of the 4.3-rc1 kernel
runs. Hence my numbers and observed behaviour now align with
Chris' results, and so I think we can say the reworked high level
plugging is behaving as we expected it to.
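
For anyone following along, the shape of the change under test is
roughly the following. This is only a sketch of the idea - the
writeback loop is elided and the function name is illustrative, not
the actual diff - but it shows why the plug is finished only after
wb->list_lock has been dropped: blk_finish_plug() submits the queued
IO and may sleep, which we can't do under a spinlock.

	#include <linux/blkdev.h>
	#include <linux/backing-dev.h>

	/* Illustrative only - not the real wb_writeback(). */
	static long example_writeback(struct bdi_writeback *wb)
	{
		struct blk_plug plug;
		long progress = 0;

		blk_start_plug(&plug);
		spin_lock(&wb->list_lock);

		/* ... walk wb->b_io / b_more_io and write inodes back ... */

		/*
		 * Drop the lock before finishing the plug: flushing the
		 * plugged IO can block, so it must happen outside
		 * wb->list_lock.
		 */
		spin_unlock(&wb->list_lock);
		blk_finish_plug(&plug);

		return progress;
	}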

Cheers,

Dave.

[*] It seems to have a dodgy battery connector, and so has been
"losing" battery backup and changing the cache mode of the HBA from
write-back to write-through. This results in changing from NVRAM
performance to SSD native performance and back again. A small
vibration would cause the connection to the battery to be remade and
the controller would switch back to write-back mode. The few log
entries in the BIOS showed the status changing anywhere from a few
seconds apart to minutes apart - enough for the cache status to
change several times during a 5-10 minute benchmark run.

I didn't notice the hardware was playing up because it wasn't
triggering the machine alert indicator through the BIOS like it's
supposed to, so the visible and audible alarms were never raised,
nor was the BMC logging the RAID controller cache status changes.

In the end, I noticed it by chance - during a low-level test the
behaviour changed very obviously as one of my dogs ran past the
rack. I unplugged everything inside the server, plugged it all back
in, powered it back up and fiddled with cables until I found what
was causing the problem. Having done this, the BMC is now sending
warnings and the audible alarm works when the battery is
disconnected... :/
--
Dave Chinner
david@xxxxxxxxxxxxx