Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate UAPI

From: Chris Mason
Date: Fri Dec 07 2012 - 16:57:24 EST

On Fri, Dec 07, 2012 at 02:49:04PM -0700, Ric Wheeler wrote:
> On 12/07/2012 04:43 PM, Chris Mason wrote:
> > On Fri, Dec 07, 2012 at 02:27:43PM -0700, Theodore Ts'o wrote:
> >> On Fri, Dec 07, 2012 at 04:09:32PM -0500, Chris Mason wrote:
> >>> Persistent trim is what I had in mind, but there are other ideas that do
> >>> imply a change in behavior as well. Can we safely assume this feature
> >>> won't matter on spinning media? New features like persistent
> >>> trim do make it much easier to solve securely, and using a bit for it
> >>> means we can toss back an error to the app if the underlying storage
> >>> isn't safe.
> >> We originally implemented no hide stale for spinning media. Some
> >> folks have claimed that for XFS their superior technology means that
> >> no hide stale doesn't buy them anything for HDDs. I'm not entirely
> >> sure I buy this, since if you need to update metadata, it means at
> >> least one extra seek for each random write into 4k preallocated space,
> >> and 7200 RPM disks only have about 200 seeks per second.
> > True, 7200 RPM disks are slow, but even allowing them to expose stale
> > data just makes them a little less slow.
> >
> > I know it's against the rules to pretend that disks don't matter. But
> > really, once you're doing random IO into a spindle you've given up on
> > performance anyway.
> >
> > -chris
> That's right.
> And equally true, once you have moved the disk heads to that track, you can
> write a lot as cheaply as a little (i.e., do 1MB instead of 4KB). That will also
> avoid fragmentation of the extents.
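Ted's seek arithmetic can be made concrete with a back-of-the-envelope sketch. It assumes seeks dominate the cost, ignores transfer time, takes the "about 200 seeks per second" figure quoted above as given, and treats "at least one extra seek" as exactly one (the best case). The function names are illustrative only.

```python
# Back-of-the-envelope model of random 4k writes into preallocated space.
# Assumptions: seek time dominates, transfer time is ignored, and a write
# that must also update metadata costs exactly one extra seek (best case).

SEEKS_PER_SEC = 200   # "7200 RPM disks only have about 200 seeks per second"
BLOCK_BYTES = 4096    # random 4k writes


def random_writes_per_sec(seeks_per_write: int,
                          seeks_per_sec: float = SEEKS_PER_SEC) -> float:
    """Random writes per second when each write costs `seeks_per_write` seeks."""
    return seeks_per_sec / seeks_per_write


def throughput_kib(seeks_per_write: int, block_bytes: int = BLOCK_BYTES) -> float:
    """Resulting write throughput in KiB/s for fixed-size random writes."""
    return random_writes_per_sec(seeks_per_write) * block_bytes / 1024


if __name__ == "__main__":
    cases = (("no hide stale (data seek only)", 1),
             ("hide stale (data + metadata seek)", 2))
    for label, seeks in cases:
        print(f"{label}: {random_writes_per_sec(seeks):.0f} writes/s, "
              f"{throughput_kib(seeks):.0f} KiB/s")
```

Under these assumptions the extra metadata seek halves the random-write rate, from about 200 to about 100 writes/s; both numbers are small in absolute terms, which is the point being argued about.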

When you do a 4K write, you have to remember that you've written just
those 4K. When you do a 1MB write, you have to remember that you've
written just that 1MB. It's the same operation, except with the 1MB
you've also had to set up all the bios and send down the zeros, and do
the proper locking to make sure you're not sending zeros down over
some concurrent IO.
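A toy model of that bookkeeping, not ext4 or XFS code: a small write into an unwritten (preallocated) extent either converts exactly the bytes written, or is rounded out to an aligned chunk whose padding has to be zero-filled. In both cases one written range is recorded. The `Conversion` tuple and `convert()` helper are hypothetical, and the locking against concurrent IO is deliberately left out.

```python
from typing import NamedTuple

KIB = 1024
MIB = 1024 * KIB


class Conversion(NamedTuple):
    start: int        # first byte marked written
    length: int       # bytes marked written (one recorded range)
    zero_bytes: int   # padding that must be zero-filled before marking


def convert(off: int, length: int, gran: int) -> Conversion:
    """Range marked written when a write of `length` bytes at `off` lands in
    an unwritten extent and conversion is done in `gran`-sized aligned units.
    gran == 4 KiB converts just the write; gran == 1 MiB rounds the write out
    to the enclosing MiB and zero-fills the rest. Locking is not modeled."""
    start = off // gran * gran
    end = (off + length + gran - 1) // gran * gran   # round end up to gran
    return Conversion(start, end - start, (end - start) - length)


if __name__ == "__main__":
    off = 5 * MIB + 12 * KIB   # a 4K write inside the sixth MiB of the file
    for gran in (4 * KIB, MIB):
        c = convert(off, 4 * KIB, gran)
        print(f"gran={gran // KIB}K: mark [{c.start}, {c.start + c.length}), "
              f"zero-fill {c.zero_bytes} bytes")
```

Either policy ends with one range to remember; the 1MB policy just adds almost 1 MiB of zeroes (plus the range locking a real filesystem would need) to the same 4K write.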

The 1MB setup is actually more work, but it does greatly reduce the
amount of time the workload needs to run before it goes into a steady
state. For smaller files it may work well, but for larger ones I don't
think it will be enough.
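One rough way to see why file size matters: treat "steady state" as the point where every conversion unit of a preallocated file has been written at least once. Under uniformly random, aligned 4K writes that is the coupon collector problem, with an expected n·H_n writes for n units. The uniform-write assumption, the 200 writes/s rate (borrowed from the seek figure earlier in the thread), and the helper names are all assumptions for illustration.

```python
import math

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
EULER_GAMMA = 0.5772156649015329


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n; asymptotic expansion for very large n."""
    if n <= 1_000_000:
        return math.fsum(1.0 / k for k in range(1, n + 1))
    return math.log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n * n)


def expected_writes_to_steady_state(file_bytes: int, unit_bytes: int) -> float:
    """Expected random aligned 4K writes before every unit has been written
    once (coupon collector, n * H_n for n = file_bytes // unit_bytes)."""
    n = file_bytes // unit_bytes
    return n * harmonic(n)


def hours_to_steady_state(file_bytes: int, unit_bytes: int,
                          writes_per_sec: float = 200) -> float:
    """Same quantity expressed as wall-clock hours at a fixed write rate."""
    writes = expected_writes_to_steady_state(file_bytes, unit_bytes)
    return writes / writes_per_sec / 3600


if __name__ == "__main__":
    for file_bytes, label in ((GIB, "1 GiB"), (100 * GIB, "100 GiB")):
        for unit in (4 * KIB, MIB):
            writes = expected_writes_to_steady_state(file_bytes, unit)
            print(f"{label} file, {unit // KIB}K units: ~{writes:,.0f} writes, "
                  f"~{hours_to_steady_state(file_bytes, unit):.1f} h")
```

In this model a 1 GiB file converted in 1 MiB units settles after roughly 7,700 writes (about 40 seconds at 200 writes/s), versus roughly 3.4 million writes (several hours) with 4K units. A 100 GiB file still needs about 1.2 million writes even with 1 MiB units, which is the "not enough for larger files" case.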
