Re: write-behind on streaming writes

From: Vivek Goyal
Date: Tue Jun 05 2012 - 14:48:56 EST


On Tue, Jun 05, 2012 at 01:41:57PM -0400, Vivek Goyal wrote:
> On Tue, Jun 05, 2012 at 01:23:02PM -0400, Vivek Goyal wrote:
> > On Wed, May 30, 2012 at 11:21:29AM +0800, Fengguang Wu wrote:
> >
> > [..]
> > > (2) comes from the use of _WAIT_ flags in
> > >
> > > sync_file_range(..., SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER);
> > >
> > > Each sync_file_range() syscall will submit 8MB write IO and wait for
> > > completion. That means the async write IO queue constantly swing
> > > between 0 and 8MB fillness at the frequency (100MBps / 8MB = 12.5ms).
> > > So on every 12.5ms, the async IO queue runs empty, which gives any
> > > pending read IO (from firefox etc.) a chance to be serviced. Nice
> > > and sweet breaks!
> >
> > I doubt that async IO queue is empty for 12.5ms. We wait for previous
> > range to finish (index-1) and have already started the IO on next 8MB
> > of pages. So effectively that should keep 8MB of async IO in
> > queue (until and unless there are delays from user space side). So reason
> > for latency improvement might be something else and not because async
> > IO queue is empty for some time.
>
> With sync_file_range() test, we can have 8MB of IO in flight. Without that
> I think we can have more at times and that might be the reason for latency
> improvement.
>
> I see that CFQ has code to allow deeper NCQ depth if there is only a single
> writer. So once a reader comes along it might find tons of async IO
> already in flight. sync_file_range() will limit that in flight IO hence
> the latency improvement. So if we have multiple dd doing sync_file_range()
> then probably this latency improvement should go away.
>
> I will run some tests to verify if my understanding about deeper queue
> depths in case of single writer is correct or not.
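
For reference, a minimal sketch of the kind of write-behind loop being
discussed above: start async writeback on the chunk just written, then
wait for the previous chunk (index-1) to complete. The 8MB chunk size
comes from the thread; the function name write_behind(), its parameters
and the file name in the usage note are hypothetical, and this is not
necessarily the exact test program that was run.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write nr_chunks chunks of zeroes, each `chunk` bytes, to `path`.
 * After each chunk, start writeback on it and wait for the previous
 * chunk to hit the disk. Returns 0 on success, -1 on any error.
 * (Hypothetical helper, for illustration only.)
 */
static int write_behind(const char *path, size_t chunk, unsigned int nr_chunks)
{
	char *buf = calloc(1, chunk);
	unsigned int i;
	int fd, ret = -1;

	if (!buf)
		return -1;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		goto out_free;

	for (i = 0; i < nr_chunks; i++) {
		off_t off = (off_t)i * (off_t)chunk;
		size_t done = 0;

		/* Buffered write of one full chunk into the page cache. */
		while (done < chunk) {
			ssize_t n = write(fd, buf + done, chunk - done);

			if (n <= 0)
				goto out_close;
			done += (size_t)n;
		}

		/* Start async writeback of the chunk just written. */
		if (sync_file_range(fd, off, (off_t)chunk,
				    SYNC_FILE_RANGE_WRITE) < 0)
			goto out_close;

		/* Wait for the previous chunk (index-1) to complete. */
		if (i > 0 &&
		    sync_file_range(fd, off - (off_t)chunk, (off_t)chunk,
				    SYNC_FILE_RANGE_WAIT_BEFORE |
				    SYNC_FILE_RANGE_WRITE |
				    SYNC_FILE_RANGE_WAIT_AFTER) < 0)
			goto out_close;
	}
	ret = 0;

out_close:
	close(fd);
out_free:
	free(buf);
	return ret;
}
```

Something like write_behind("zerofile", 8UL << 20, 128) would write 1GB
in 8MB chunks. With this ordering, at most about two chunks should be
dirty or under writeback at a time, which is what would bound the async
IO in flight.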

So I ran some tests and can confirm that, on average, there seem to be
more in-flight requests *without* sync_file_range(), and that's probably
the reason why the sync_file_range() test shows better latency.

I can see that with "dd if=/dev/zero of=zerofile bs=1M count=1024", we are
driving deeper queue depths (up to 32), and in the later stages the number
of in-flight requests is constantly high.

With sync_file_range(), the number of in-flight requests fluctuates a lot
between 1 and 32. Many times it is just 1 or up to 16, and a few times it
went up to 32.
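
One way to watch in-flight request counts like these is to poll
/sys/block/<dev>/inflight, which reports two decimal numbers: reads and
writes currently in flight. The sketch below assumes that sysfs file is
available; the helper names parse_inflight() and read_inflight() are
hypothetical, and this is not necessarily how the numbers above were
collected.

```c
#include <stdio.h>

/*
 * Parse one line of /sys/block/<dev>/inflight ("<reads> <writes>").
 * Returns 0 on success, -1 if the line does not contain two numbers.
 * (Hypothetical helper, for illustration only.)
 */
static int parse_inflight(const char *line, unsigned int *reads,
			  unsigned int *writes)
{
	if (sscanf(line, "%u %u", reads, writes) != 2)
		return -1;
	return 0;
}

/*
 * Read the current in-flight counts for a block device such as "sda".
 * Returns 0 on success, -1 on error.
 */
static int read_inflight(const char *dev, unsigned int *reads,
			 unsigned int *writes)
{
	char path[256], line[64];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/sys/block/%s/inflight", dev);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_inflight(line, reads, writes);
	fclose(f);
	return ret;
}
```

Calling read_inflight("sda", &r, &w) in a loop with a short sleep while
the dd or sync_file_range() test runs gives a rough picture of how the
write count moves. It is a sampled view, so short spikes can be missed.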

So the sync_file_range() test keeps fewer requests in flight on average,
hence the better latencies. It might not produce a throughput drop on SATA
disks, but it might have some effect on storage array LUNs. I will give it
a try.

Thanks
Vivek