Re: Triggering non-integrity writeback from userspace

From: Andres Freund
Date: Thu Oct 29 2015 - 12:24:04 EST

On 2015-10-29 12:54:22 +1100, Dave Chinner wrote:
> On Thu, Oct 29, 2015 at 12:23:12AM +0100, Andres Freund wrote:
> > The blocking/latency of the fsync doesn't actually matter at all *for
> > this callsite*. It's called from a dedicated background process - if
> > it's slowed down by a couple seconds it doesn't matter much.
> > The problem is that if you have a couple gigabytes of dirty data being
> > fsync()ed at once, latency for concurrent reads and writes often goes
> > absolutely apeshit. And those concurrent reads and writes might
> > actually be latency sensitive.
> Right, but my point is with an async fsync/fdatasync you don't need
> this background process - you can just trickle out async fdatasync
> > calls instead of trickling out calls to sync_file_range().

We don't want to do the checkpointing from normal backends that process
user queries, so there has to be a background process anyway. Depending
on settings we only do the checkpoints at 5 to 60 minute intervals
(spread over that interval).

> > By calling sync_file_range() over small ranges of pages shortly after
> > they've been written we make it unlikely (but still possible) that much
> > data has to be flushed at fsync() time.
> > Right, but you still need the fsync call, whereas with an async fsync
> call you don't - when you gather the completion, no further action
> needs to be taken on that dirty range.

I assume that the actual IOs issued by the async fsync and a plain fsync
would be pretty similar. So the problem that an fsync of large amounts
of dirty data causes latency increases for other issuers of IO wouldn't
be gone, no?
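
To make the pattern under discussion concrete, here is a minimal sketch of
"write a chunk, then ask for non-integrity writeback of that range, and
fsync() later". It is an illustration only, not the PostgreSQL code: the
names write_and_trickle(), checkpoint_file() and FLUSH_CHUNK are
hypothetical, the chunk size is an arbitrary assumption, and EINTR/short
write handling is kept deliberately simple.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sync_file_range() on Linux */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical tuning knob: request writeback after this many bytes. */
#define FLUSH_CHUNK (256 * 1024)

/*
 * Write len bytes from buf at offset start, asking the kernel to begin
 * writeback of each chunk right after it has been written.
 * Returns 0 on success, -1 if a write fails.
 */
int write_and_trickle(int fd, const char *buf, size_t len, off_t start)
{
    size_t done = 0;

    while (done < len) {
        size_t want = len - done;
        if (want > FLUSH_CHUNK)
            want = FLUSH_CHUNK;

        ssize_t n = pwrite(fd, buf + done, want, start + (off_t) done);
        if (n <= 0)
            return -1;

        /*
         * Only initiates writeback of the range; it does not wait and
         * gives no integrity guarantee. Treated as a hint here, so a
         * failure is ignored (a real program might log it).
         */
        (void) sync_file_range(fd, start + (off_t) done, (off_t) n,
                               SYNC_FILE_RANGE_WRITE);

        done += (size_t) n;
    }
    return 0;
}

/* At checkpoint time the fsync() is still required for durability. */
int checkpoint_file(int fd)
{
    return fsync(fd);
}
```

The intent is that by the time checkpoint_file() runs, most of the dirty
pages have already been submitted, so the fsync() has little left to flush.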

> > At the moment using fdatasync() instead of fsync() is a considerable
> > performance advantage... If I understand the above proposal correctly,
> > it'd allow specifying ranges, is that right?
> Well, the patch I sent doesn't do ranges, but it could easily be
> passed in as the iocb has offset/len parameters that are used by

That'd be cool. Then we could issue those for asynchronous transaction
commits, and have the background WAL writer keep more WAL writes in
progress concurrently.

I'll try the patch from 20151028232641.GS8773@dastard and see whether I
can make it advantageous for throughput (for WAL flushing, not the
checkpointer process). Wish I had a better storage system; my guess is
it'll be more advantageous there. We'll see.


Andres Freund
