Re: [PATCH RFC] introduce ioctl to completely invalidate page cache

From: Trond Myklebust
Date: Tue Oct 07 2014 - 15:35:53 EST


On Tue, Oct 7, 2014 at 3:16 PM, Jan Kara <jack@xxxxxxx> wrote:
> On Tue 07-10-14 12:30:59, Dave Chinner wrote:
>> On Mon, Oct 06, 2014 at 04:30:19PM +0200, Jan Kara wrote:
>> > On Mon 06-10-14 11:33:23, Thanos Makatos wrote:
>> > > > > Trond also had a comment that if we extended the ioctl to work for all
>> > > > > inodes (not just blkdev) and allowed some additional flags of what
>> > > > > needs to be invalidated, the new ioctl would be also useful to NFS
>> > > > > userspace - see Trond's email at
>> > > > >
>> > > > > http://www.spinics.net/lists/linux-fsdevel/msg78917.html
>> > > > >
>> > > > > and the following thread. I would prefer to cover that use case when we
>> > > > > are introducing a new invalidation ioctl. Have you considered that, Thanos?
>> > > >
>> > > > Sure, though I don't really know how to do it. I'll start by looking at the code
>> > > > flow when someone does "echo 3 > /proc/sys/vm/drop_caches", unless you
>> > > > already have a rough idea how to do that.
>> > >
>> > > I realise I haven't clearly understood what the semantics of this new ioctl
>> > > should be.
>> > >
>> > > My initial goal was to implement an ioctl that would _completely_ invalidate
>> > > the buffer cache of a block device when there is no file-system involved.
>> > > Unless I'm mistaken the patch I posted achieves this goal.
>> > Yes.
>> >
>> > > We now want to extend this patch to take care of cached metadata, which seems
>> > > to be of particular importance for NFS, and I suspect that this piece of
>> > > functionality will still be applicable to any kind of file-system, correct?
>> > So most notably they want the ioctl to work not only for block devices
>> > but also for any regular file. That's easily doable - you just call
>> > filemap_write_and_wait() and invalidate_inode_pages2() in the ioctl handler
>> > for regular files.
>> >
>> > Also they wanted to be able to specify a range of a mapping to invalidate -
>> > that's easily doable as well. Finally they wanted a 'flags' argument so you
>> > can additionally ask the fs to invalidate some metadata as well. How that
>> > invalidation is done will be fs-specific, and for now I guess we don't need
>> > to go into details. NFS guys can sort that out when they decide to implement
>> > it. So in the beginning we can just have a u64 flags argument with a single
>> > 'INVAL_DATA' flag in it, meaning that invalidation of data in a given
>> > range is requested. Later NFS guys can add further flags.
>>
>> Why do we need a new ioctl to do this? fadvise64() seems like it's
>> the exact fit for "FADV_INVALIDATE_[META]DATA" flags...
> Well, fadvise() is currently a hint to the kernel. In this case we would
> really like the call to do the invalidation and return an error if it fails
> for some reason. So I'm not sure fadvise() is a perfect fit. But I wouldn't
> be strongly opposed to it either.
>

fadvise is about giving programs the ability to "announce an intention
to access file data in a specific pattern in the future, thus allowing
the kernel to perform appropriate optimizations" according to the
manpage.

Cache invalidation and revalidation, OTOH, is about ensuring meta/data
consistency between the disk and inode/page cache.

I'm not seeing a perfect match. :-)
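
To make the "hint" semantics concrete, here is a minimal userspace sketch.
It is not part of any posted patch, and drop_cache_hint() is a hypothetical
helper name chosen for illustration. It uses the existing
posix_fadvise(POSIX_FADV_DONTNEED) call, the closest thing userspace has
today. A return value of 0 only means the advice was accepted; it does not
promise that every cached page in the range was actually dropped, which is
the guarantee an invalidation call would have to add.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and signature are assumptions, not from the
 * thread): write back dirty pages, then advise the kernel that the cached
 * pages in [offset, offset + len) are no longer needed. A len of 0 means
 * "to end of file".
 *
 * Returns 0 on success or a positive errno value on failure. Success means
 * the advice was accepted, NOT that the range is guaranteed to be gone
 * from the page cache.
 */
int drop_cache_hint(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len >= 0);

    /* Dirty pages are not discarded by DONTNEED, so flush them first. */
    if (fsync(fd) != 0)
        return errno;

    /* posix_fadvise() returns the error number directly, not via errno. */
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```

A caller would invoke drop_cache_hint(fd, 0, 0) for the whole file. Nothing
in the return value reports whether pages stayed cached, which is why an
interface that must fail loudly looks like a different kind of call.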

--
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@xxxxxxxxxxxxxxx