Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

From: Chris Mason
Date: Tue Mar 15 2016 - 20:52:21 EST


On Tue, Mar 15, 2016 at 07:30:14PM -0500, Eric Sandeen wrote:
> On 3/15/16 7:06 PM, Linus Torvalds wrote:
> > On Tue, Mar 15, 2016 at 4:52 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> >
> >> > It is pretty clear that the onus is on the patch submitter to
> >> > provide justification for inclusion, not for the reviewer/Maintainer
> >> > to have to prove that the solution is unworkable.
> > I agree, but quite frankly, performance is a good justification.
> >
> > So if Ted can give performance numbers, that's justification enough.
> > We've certainly taken changes with less.
>
> I've been away from ext4 for a while, so I'm really not on top of the
> mechanics of the underlying problem at the moment.
>
> But I would say that in addition to numbers showing that ext4 has trouble
> with unwritten extent conversion, we should have an explanation of
> why it can't be solved in a way that doesn't open up these concerns.
>
> XFS certainly has different mechanisms, but is the demonstrated workload
> problematic on XFS (or btrfs) as well? If not, can ext4 adopt any of the
> solutions that make the workload perform better on other filesystems?

When I've benchmarked this in the past, doing small random buffered writes
into a preallocated extent was dramatically (3x or more) slower on xfs
than doing them into a fully written extent. That was two years ago,
but I can redo it.

On a fio card, the job below gets 16,000 iops on a preallocated extent and
40,000 iops if you run it a second time, once the extent is fully written.
It's not random writes, but the fsync probably makes the preallocated
extent conversion more expensive. That's on a 4.0 kernel, but I'll rerun
it on nvme on newer kernels.

fio --name=fsync --rw=write --fsync=1 --bs=4k --filename=/xfs/fio_4096 --size=4g --overwrite=0
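
For anyone without fio handy, here is a rough sketch of the same two-pass
comparison in Python. It is not what fio does internally. It assumes the
first-run cost comes from writing into space reserved with
posix_fallocate() (unwritten extents), and that a second pass over the same
file hits fully written extents. The function names and the 4k block size
default are made up for illustration.

```python
import os
import time


def fsync_write_pass(fd, size, bs=4096):
    """Write `size` bytes sequentially in `bs` chunks, fsync after each one.

    Returns the observed write+fsync operations per second.
    """
    buf = b"\xa5" * bs
    count = size // bs
    start = time.perf_counter()
    for i in range(count):
        os.pwrite(fd, buf, i * bs)  # buffered write at a fixed offset
        os.fsync(fd)                # force the data (and any extent conversion) out
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def compare_prealloc_vs_written(path, size, bs=4096):
    """Return (first_pass_iops, second_pass_iops) for the file at `path`.

    First pass: writes land in preallocated (assumed unwritten) extents.
    Second pass: the same blocks are overwritten after they have been written.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o644)
    try:
        try:
            os.posix_fallocate(fd, 0, size)
        except OSError:
            # Assumption: fallocate is unsupported here. A sparse file is NOT
            # equivalent to a preallocated one, so numbers from this fallback
            # do not measure unwritten extent conversion.
            os.ftruncate(fd, size)
        first = fsync_write_pass(fd, size, bs)
        second = fsync_write_pass(fd, size, bs)
    finally:
        os.close(fd)
    return first, second
```

Calling compare_prealloc_vs_written("/xfs/prealloc_test", 1 << 30) on the
filesystem under test gives two iops figures to compare. Absolute numbers
will differ from fio's, so only the ratio between the passes is of interest.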

I'm happy to run variations on things, just let me know what workloads
are interesting.

-chris