Re: [PATCHSET v3][RFC] Make background writeback not suck

From: Jens Axboe
Date: Thu Mar 31 2016 - 23:25:55 EST


On 03/31/2016 06:46 PM, Dave Chinner wrote:
On Thu, Mar 31, 2016 at 08:29:35AM -0600, Jens Axboe wrote:
On 03/31/2016 02:24 AM, Dave Chinner wrote:
On Wed, Mar 30, 2016 at 09:07:48AM -0600, Jens Axboe wrote:
Hi,

This patchset isn't so much a final solution as it is a demonstration
of what I believe is a huge issue. Since the dawn of time, our
background buffered writeback has sucked. When we do background
buffered writeback, it should have little impact on foreground
activity. That's the definition of background activity... But for as
long as I can remember, heavy buffered writers have not behaved like
that. For instance, if I do something like this:

$ dd if=/dev/zero of=foo bs=1M count=10k

on my laptop, and then try and start chrome, it basically won't start
before the buffered writeback is done. The same goes for server-oriented
workloads, where installing a big RPM (or similar) adversely impacts
database reads or sync writes. When that happens, I get people
yelling at me.

Last time I posted this, I used flash storage as the example. But
this works equally well on rotating storage. Let's run a test case
that writes a lot. This test writes 50 files, each 100M, on XFS on
a regular hard drive. While this happens, we attempt to read
another file with fio.

Writers:

$ time (./write-files ; sync)
real 1m6.304s
user 0m0.020s
sys 0m12.210s
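
The ./write-files helper is not shown in the thread. A minimal sketch of what such a writer could look like, assuming it simply writes the 50 files of 100M each one after another with buffered dd (the function name, file naming, and sequential ordering are all assumptions, not the original script):

```shell
# Hypothetical stand-in for ./write-files; the real script is not shown.
# Assumption: files are written sequentially with plain buffered dd.
# Arguments: target directory, number of files, size of each file in MiB.
write_files() {
    wf_dir=${1:-.}
    wf_count=${2:-50}
    wf_size_mb=${3:-100}
    wf_i=0
    while [ "$wf_i" -lt "$wf_count" ]; do
        # Buffered write of zeroes; dd's transfer summary is discarded.
        dd if=/dev/zero of="$wf_dir/file.$wf_i" bs=1M count="$wf_size_mb" \
            2>/dev/null || return 1
        wf_i=$((wf_i + 1))
    done
}
```

With the defaults, `time (write_files . ; sync)` would correspond to the timed run above, with the fio reader started separately while the writes are in flight.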

Great. So a basic IO test looks good - let's throw something more
complex at it. Say, a benchmark I've been using for years to stress
the IO subsystem, the filesystem and memory reclaim all at the same
time: a concurrent fsmark inode creation test.
(first google hit https://lkml.org/lkml/2013/9/10/46)

Is that how you are invoking it as well, with the same arguments?

Yes. And the VM is exactly the same, too - 16p/16GB RAM. Cut down
version of the script I use:

#!/bin/bash

QUOTA=
MKFSOPTS=
NFILES=100000
DEV=/dev/vdc
LOGBSIZE=256k
FSMARK=/home/dave/src/fs_mark-3.3/fs_mark
MNT=/mnt/scratch

while [ $# -gt 0 ]; do
	case "$1" in
	-q)	QUOTA="uquota,gquota,pquota" ;;
	-N)	NFILES=$2 ; shift ;;
	-d)	DEV=$2 ; shift ;;
	-l)	LOGBSIZE=$2; shift ;;
	--)	shift ; break ;;
	esac
	shift
done
MKFSOPTS="$MKFSOPTS $*"

echo QUOTA=$QUOTA
echo MKFSOPTS=$MKFSOPTS
echo DEV=$DEV

sudo umount $MNT > /dev/null 2>&1
sudo mkfs.xfs -f $MKFSOPTS $DEV
sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV $MNT
sudo chmod 777 $MNT
sudo sh -c "echo 1 > /proc/sys/fs/xfs/stats_clear"
time $FSMARK -D 10000 -S0 -n $NFILES -s 0 -L 32 \
-d $MNT/0 -d $MNT/1 \
-d $MNT/2 -d $MNT/3 \
-d $MNT/4 -d $MNT/5 \
-d $MNT/6 -d $MNT/7 \
-d $MNT/8 -d $MNT/9 \
-d $MNT/10 -d $MNT/11 \
-d $MNT/12 -d $MNT/13 \
-d $MNT/14 -d $MNT/15 \
| tee >(stats --trim-outliers | tail -1 1>&2)
sync
sudo umount $MNT

Perfect, thanks!

The above was run without scsi-mq and with the deadline scheduler;
results with CFQ are similarly depressing for this test. So IO scheduling
is in place for this test; it's not pure blk-mq without scheduling.

virtio in guest, XFS direct IO -> no-op -> scsi in host.

That has write back caching enabled on the guest, correct?

No. It uses virtio,cache=none (that's the "XFS Direct IO" bit above).
Sorry for not being clear about that.

That's fine, it's one less worry if that's not the case. So if you cat
the 'write_cache' file in the virtioblk sysfs block queue/ directory, it
says 'write through'? Just want to confirm that we got that propagated
correctly.
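
For reference, a small sketch of that check. It assumes the guest disk is vdc (taken from DEV=/dev/vdc in the script above) and that the running kernel exposes the write_cache queue attribute; SYSFS_ROOT is a hypothetical override so the helper can be pointed at a different tree, and it defaults to /sys:

```shell
# Print the write cache mode the block layer reports for a device.
# $1 is the bare device name (e.g. vdc), not the /dev path.
# SYSFS_ROOT is a hypothetical override for illustration; default is /sys.
write_cache_mode() {
    cat "${SYSFS_ROOT:-/sys}/block/$1/queue/write_cache"
}
```

Calling `write_cache_mode vdc` in the guest should print 'write through' if the cache=none setting was propagated as expected.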


--
Jens Axboe