Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate UAPI

From: Dave Chinner
Date: Fri Dec 07 2012 - 19:39:47 EST


On Fri, Dec 07, 2012 at 05:02:32PM -0500, Ric Wheeler wrote:
> On 12/07/2012 04:57 PM, Theodore Ts'o wrote:
> >On Fri, Dec 07, 2012 at 04:42:06PM -0500, Ric Wheeler wrote:
> >>The other things that I think we should try would be to convert over
> >>larger chunks as we discussed on the list back in the summer (just
> >>because the user writes 4KB does not mean that we cannot flip over
> >>1MB and zero that).
> >Writing a megabyte is not free. If you assume that your HDD has a
> >sustained write throughput of 100-125 MB/s, writing a megabyte will
> >take 8-10ms. It might be a win if you amortize it over a large number
> >of writes, but it doesn't help your 99.9th percentile latency numbers.
> >(99.9th percentile latency numbers matter because eventually you'll
> >have a user request which hits multiple serial long latency
> >operations, and then the delay looks **really** user visible.)
> >
> > - Ted
>
> Writing 4KB at a time to a disk costs XX units of time.
>
> Writing to the same sector (especially for an HDD) costs XX units + a small amount.
>
> I suggest that we try it out.
>
> For SSDs, it is much better to use specific HW offload commands where
> possible, like WRITE_SAME (zeroed) or UNMAP/TRIM, to get that
> performance boost, since no actual data is moved...
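Ted's 8-10 ms estimate is plain transfer-time arithmetic. A minimal C sketch of it, assuming decimal megabytes (1 MB = 1,000,000 bytes) and ignoring seek and rotational latency; the function name `transfer_ms` is illustrative only:

```c
#include <assert.h>

/* Milliseconds needed to stream `bytes` at a sustained rate of `mb_per_s`
 * decimal megabytes per second. Seek and rotational latency are ignored,
 * so for an HDD this is only the transfer component of the write. */
double transfer_ms(double bytes, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return bytes / (mb_per_s * 1e6) * 1e3;
}
```

transfer_ms(1e6, 100.0) and transfer_ms(1e6, 125.0) come out at about 10 ms and 8 ms, Ted's range, while a 4 KB write (transfer_ms(4096, 100.0)) carries only about 0.04 ms of transfer time. The question is whether roughly 10 ms of extra transfer counts as "a small amount" on top of the positioning cost of the original 4 KB write.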

Yup, that could be done quite trivially in XFS. Just mark the
preallocated extents as "busy" rather than unwritten, mark the
transaction as synchronous and the transaction commit will issue a
discard on the preallocated ranges before returning to userspace.
The extra overhead to the preallocation command is unlikely to be
noticed, and unwritten extent conversion overhead just goes away...
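The saving described above can be illustrated with a toy model that counts extent conversions: with unwritten preallocation, the first write into each extent needs a metadata update to flip it to written, while zeroing or discarding the range at allocation time leaves nothing to convert. This is a hypothetical sketch, not XFS code: the names `prealloc`, `prealloc_init` and `prealloc_write` are made up, extents are fixed-size, partial writes that would split an extent are ignored, and it assumes the up-front discard or zeroing makes the blocks read back as zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only (hypothetical names, not XFS code): a preallocated file
 * range divided into fixed-size extents. */
enum extent_state { EXT_UNWRITTEN, EXT_WRITTEN };

struct prealloc {
    enum extent_state *state; /* one entry per extent */
    size_t nr_extents;
    size_t conversions;       /* unwritten -> written metadata updates */
};

/* zero_at_alloc == false: classic preallocation, extents start unwritten.
 * zero_at_alloc == true: the range is zeroed/discarded up front (assumed to
 * read back as zeros), so extents start out already written. */
void prealloc_init(struct prealloc *p, enum extent_state *state,
                   size_t nr_extents, bool zero_at_alloc)
{
    p->state = state;
    p->nr_extents = nr_extents;
    p->conversions = 0;
    for (size_t i = 0; i < nr_extents; i++)
        state[i] = zero_at_alloc ? EXT_WRITTEN : EXT_UNWRITTEN;
}

/* A data write into extent idx. The first write into an unwritten extent
 * needs a conversion; later writes to the same extent do not. */
void prealloc_write(struct prealloc *p, size_t idx)
{
    assert(idx < p->nr_extents);
    if (p->state[idx] == EXT_UNWRITTEN) {
        p->state[idx] = EXT_WRITTEN;
        p->conversions++;
    }
}
```

Writing every extent of a four-extent range costs four conversions in the unwritten case and none in the zeroed case; rewriting an already-written extent costs nothing in either.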

No fallocate() API changes necessary, though I think it would be
better if the user application gave a hint that it preferred "writing
zeros" (i.e. FALLOC_FL_WRITE_ZEROS) to allocating unwritten extents
as there are workloads where one will always be clearly better than
the other...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx