Re: [PATCHSET v3][RFC] Make background writeback not suck

From: Jens Axboe
Date: Thu Mar 31 2016 - 23:29:46 EST

On 03/31/2016 06:56 PM, Dave Chinner wrote:
On Thu, Mar 31, 2016 at 10:21:04AM -0600, Jens Axboe wrote:
On 03/31/2016 08:29 AM, Jens Axboe wrote:
What I see in these performance dips is the XFS transaction
subsystem stalling *completely* - instead of running at a steady
state of around 350,000 transactions/s, there are *zero*
transactions running for periods of up to ten seconds. This
coincides with the CPU usage falling to almost zero as well.
AFAICT, the only thing that is running when the filesystem stalls
like this is memory reclaim.

I'll take a look at this, stalls should definitely not be occurring. How
much memory does the box have?

I can't seem to reproduce this at all. On an nvme device, I get a
fairly steady 60K/sec file creation rate, and we're nowhere near
being IO bound. So the throttling has no effect at all.

That's too slow to show the stalls - you're likely concurrency bound
in allocation by the default AG count (4) from mkfs. Use mkfs.xfs -d
agcount=32 so that every thread works in its own AG.
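
To illustrate why the AG count can cap allocation concurrency, here is a toy model, not XFS code. It assumes each worker thread allocates in AG (thread index mod agcount); that round-robin mapping, the function names, and the thread count of 16 are assumptions made purely for illustration.

```python
from collections import Counter


def threads_per_ag(num_threads: int, agcount: int) -> Counter:
    """Toy model (assumption): thread i allocates in AG number i % agcount."""
    return Counter(i % agcount for i in range(num_threads))


def max_threads_sharing_one_ag(num_threads: int, agcount: int) -> int:
    """Worst-case number of threads contending for a single AG in this model."""
    return max(threads_per_ag(num_threads, agcount).values())


if __name__ == "__main__":
    # 16 threads is an arbitrary example value, not taken from the thread.
    for agcount in (4, 32):
        worst = max_threads_sharing_one_ag(16, agcount)
        print(f"agcount={agcount}: up to {worst} thread(s) per AG")
```

In this model, 16 threads on the default 4 AGs leave four threads contending for each AG, while agcount=32 gives every thread an AG to itself.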

That's the key; with that I get 300-400K ops/sec instead. I'll run some testing with this tomorrow and see what I can find. It did one full run just now and I didn't see any issues, but I need to run it at various settings to see if I can reproduce the problem.

On a raid0 on 4 flash devices, I get something that looks more IO
bound, for some reason. Still no impact of the throttling, however.
But given that your setup is this:

virtio in guest, XFS direct IO -> no-op -> scsi in host.

we do potentially have two throttling points, which we don't want.
Are both the guest and the host running the new code, or just the

Just the guest. Host is running a 4.2.x kernel, IIRC.


In any case, can I talk you into trying with two patches on top of
the current code? It's the two newest patches here:

The first treats REQ_META|REQ_PRIO like they should be treated, like
high priority IO. The second disables throttling for virtual
devices, so we only throttle on the backend. The latter should
probably be the other way around, but we need some way of conveying
that information to the backend.
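
The policy described by those two patches can be sketched roughly as follows. This is a standalone illustration, not the patch code: the flag bit values are made up (they are not the kernel's definitions), the function names are hypothetical, and treating "either flag set" as high priority is an assumption about how REQ_META|REQ_PRIO is tested.

```python
# Hypothetical bit values for illustration only; NOT the kernel's definitions.
REQ_META = 1 << 0
REQ_PRIO = 1 << 1


def classify_priority(flags: int) -> str:
    """Assumption: a request with either REQ_META or REQ_PRIO set is high priority."""
    if flags & (REQ_META | REQ_PRIO):
        return "high"
    return "normal"


def throttling_enabled(is_virtual_device: bool) -> bool:
    """Second patch as described: no throttling on virtual devices,
    so throttling happens only on the backend."""
    return not is_virtual_device


if __name__ == "__main__":
    print(classify_priority(REQ_META))      # metadata request
    print(classify_priority(0))             # plain request
    print(throttling_enabled(True))         # virtio guest device
    print(throttling_enabled(False))        # backend device
```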

I'm not changing the host kernels - it's a production machine and so
it runs long uptime testing of stable kernels (e.g. to catch slow
memory leaks). So if you've disabled throttling in the guest, I
can't test the throttling changes.

Right, that'd definitely hide the problem for you. I'll see if I can get it in a reproducible state and take it from there.

On your host, you said it's SCSI backed, but what does the device look like?

Jens Axboe