Re: [PATCH v11 00/21] Add support for NV-DIMMs to ext4

From: Andreas Dilger
Date: Tue Sep 30 2014 - 17:25:26 EST


On Sep 30, 2014, at 2:37 PM, Valdis.Kletnieks@xxxxxx wrote:
> On Tue, 30 Sep 2014 12:08:41 -0400, Matthew Wilcox said:
>
>> The more I think about this, the more I think this is a bad idea.
>> When you have a file open with O_DIRECT, your I/O has to be done in
>> 512-byte multiples, and it has to be aligned to 512-byte boundaries
>> in memory. If an unsuspecting application has O_DIRECT forced on it,
>> it isn't going to know to do that, and so all its I/Os will fail.
>
> I'm thinking of more than one place where that would be a feature, not a bug. :)
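
To make the quoted constraint concrete, here is a minimal sketch of the alignment rule and of an O_DIRECT read that respects it. The 512-byte figure is taken from the quoted message as an assumption; the real requirement depends on the device and filesystem. The helper names `direct_io_ok` and `read_direct` are hypothetical, and `read_direct` is Linux-only.

```python
import mmap
import os

ALIGN = 512  # figure quoted in the thread; an assumption, real devices may need more


def direct_io_ok(offset: int, length: int, buf_addr: int, align: int = ALIGN) -> bool:
    """Return True if file offset, I/O length and buffer address are all aligned."""
    return offset % align == 0 and length % align == 0 and buf_addr % align == 0


def read_direct(path: str, length: int = 4096) -> bytes:
    """Read up to `length` bytes from the start of `path` using O_DIRECT (Linux only)."""
    buf = mmap.mmap(-1, length)  # anonymous mappings are page-aligned
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        n = os.readv(fd, [buf])  # fills the aligned buffer in place
        return buf[:n]
    finally:
        os.close(fd)
        buf.close()
```

An application that was not written for O_DIRECT typically passes an ordinary heap buffer and an arbitrary length, which is the case `direct_io_ok` would reject.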

We prototyped a feature like this for Lustre - so the admins could
turn IO into O_DIRECT, because the HPC compute nodes have relatively
small RAM per core and don't want to have file data cache consuming
RAM that the compute jobs need.

Unfortunately, the O_DIRECT semantics are a killer for poorly written
applications that end up doing small synchronous writes. We didn't
have any IO size problems, because Lustre clients have to copy the data
to the servers anyway, so arbitrary IO sizes are fine.

While this _might_ be OK for NVRAM mapped directly into the filesystem,
on local disk-based storage 512-byte writes at 100 IOPS yield only
50KB/s, instead of the ~100MB/s of cached writes to a single disk.
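
The arithmetic behind those numbers, as a small sketch. The function name is hypothetical, and the ~100MB/s figure is read as decimal megabytes, which is an assumption.

```python
def throughput_bytes_per_sec(io_size_bytes: int, iops: int) -> int:
    """Throughput of synchronous I/O: one request of io_size_bytes per operation."""
    return io_size_bytes * iops


direct = throughput_bytes_per_sec(512, 100)  # 512-byte writes at 100 IOPS
cached = 100 * 1000 * 1000                   # ~100MB/s, taken as decimal MB (assumption)

print(direct / 1024)    # 50.0 (KB/s)
print(cached / direct)  # 1953.125, i.e. roughly a 2000x slowdown
```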

I think you would be much better off having more aggressive "use once"
semantics in the page cache, so that page cache pages for streaming
writes are evicted more aggressively from cache rather than going down
the "automatic O_DIRECT" hole.
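
For comparison, an application can already approximate "use once" behaviour for its own streaming writes from userspace with posix_fadvise(). This is only a sketch of that existing interface, not the page cache change suggested above; the function name `write_and_drop` is hypothetical.

```python
import os


def write_and_drop(path: str, chunks) -> int:
    """Write chunks through the page cache, then ask the kernel to drop the pages."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    total = 0
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:  # handle short writes
                written = os.write(fd, view)
                view = view[written:]
                total += written
        os.fsync(fd)  # dirty pages cannot be dropped, so flush first
        if hasattr(os, "posix_fadvise"):  # not available on every platform
            # offset 0, length 0 means "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

The writes stay buffered, so small IO sizes do not hit the O_DIRECT penalty, but the advice is only a hint and each application has to opt in.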

Cheers, Andreas

>> What problem are you really trying to solve? Some big files hogging
>> the page cache?
>
> I'm officially a storage admin. I mostly support HPC and research. As
> such, I'm always looking to add tools to my toolkit. :)
>
> (And yes, I fully recognize that *in general*, this is a Bad Idea. However,
> when you've got That One Problem Data File that *should* always be accessed
> via O_DIRECT, and *usually* is accessed via O_DIRECT, and bad things happen
> if something accesses it without it (for instance, when the file is 1.5X the
> actual RAM), you start looking for fixes. If you've got another, more
> sustainable way to say "do not let file /X/Y/Z hog the page cache" (and
> no, LD_PRELOAD isn't sustainable the way chattr is, in my book), feel free to
> recommend something. :)





