Re: [RFC] A couple of questions about the paged I/O sub system

From: Ian Kent
Date: Wed Oct 21 2015 - 21:36:48 EST


On Wed, 2015-10-21 at 12:56 -0700, Hugh Dickins wrote:
> On Wed, 21 Oct 2015, Ian Kent wrote:

Thanks for taking the time to reply, Hugh.

>
> > Hi all,
> >
> > I've been looking through some of the page reclaim code and at
> > truncate_inode_pages().
> >
> > I'm not familiar with the code and I'm struggling to understand it.
> >
> > One thing that is puzzling me right now is, if a file has pages that
> > have been modified and are swapped out when pagevec_lookup_entries()
> > is called will they be found?
>
> truncate_inode_pages() is a library function which a filesystem calls
> at some stage in its inode truncation processing, to take all the
> incore pages out of pagecache (out of its radix_tree), and free them up
> (usually: some might be otherwise pinned in memory at the time).
>
> A filesystem will have other work to do, very particular to that
> filesystem, to free up the actual disk blocks: that's definitely
> not part of truncate_inode_pages()'s job.
>
> It's also called when evicting an inode no longer needed in memory,
> to free the associated pagecache, when not deleting the blocks on
> disk.
>
> I think I don't understand your "swapped out": modifications occur to
> a page while it is in pagecache, and those modifications need to be
> written back to disk before that page can be reclaimed for other use.

Indeed, now I think about it, "swapped out" is a bad choice of words
when talking about a paged IO system.

What I'm trying to ask is: if pages allocated to a mapping are modified,
then under memory pressure, are they ever reclaimed by writing them to
swap storage, or are they always reclaimed by writing them back to disk?

Now that I think about what you've said here, and looking at the code, I
suspect the answer is that they are always reclaimed by writing them back
to disk.

>
> >
> > If not then how does truncate_inode_pages(_range)() handle waiting for
> > these pages to be swapped back in to perform the writeback and
> > truncation?
>
> Pages are never "swapped back in to perform the writeback":
> if writeback is needed, it's done before the page can be freed from
> pagecache; and if that data is needed again after the page was freed,
> it's read back in from disk to fresh page.

That makes sense; using swap would be unnecessary double handling.
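
To check my understanding, here is a toy user-space model of the rule
described above. It is only a sketch, not kernel code: Backing, Page,
reclaim_action() and fault_in_source() are names I made up for
illustration, and it ignores locking, pinned pages and pages already
under writeback. The shmem/tmpfs case is the exception mentioned further
down in this mail.

```python
from dataclasses import dataclass
from enum import Enum


class Backing(Enum):
    FILE = "file"    # ordinary filesystem pagecache page
    SHMEM = "shmem"  # shmem/tmpfs page, which has no disk file behind it


@dataclass
class Page:
    backing: Backing
    dirty: bool = False


def reclaim_action(page: Page) -> str:
    """Return what this toy model does to free the page (hypothetical)."""
    if page.backing is Backing.SHMEM:
        # shmem/tmpfs is the case that uses swap as backing under pressure.
        return "write to swap, then free"
    if page.dirty:
        # Modified file page: written back to its file before it is freed.
        return "write back to file, then free"
    # Clean file page: the file already holds the data, so just drop it.
    return "free"


def fault_in_source(page: Page) -> str:
    """Where the data is read from if it is needed again after freeing."""
    return "swap" if page.backing is Backing.SHMEM else "file"


if __name__ == "__main__":
    for p in (Page(Backing.FILE),
              Page(Backing.FILE, dirty=True),
              Page(Backing.SHMEM, dirty=True)):
        print(p.backing.value, p.dirty, "->", reclaim_action(p))
```

The point of the model is that swap never appears on the path for an
ordinary file page: the file itself is both the writeback target and the
place the data is read back from.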

>
> You may be worrying about what happens when a page is modified or
> under writeback when it is truncated: I think that's something each
> filesystem has to be careful of, and may deal with in different ways.

I'm wondering how a mapping's nrpages can be non-zero (read: greater than
one) after calling truncate_inode_pages().

But I'm looking at a much older kernel so it's quite different to
current upstream and this seemed like a question relevant to both
kernels to get some idea of how page reclaim works.

I guess what I'm really looking to work out is if it's possible, with
the current upstream kernel, for a mapping to have nrpages greater than
1 after calling truncate_inode_pages() and hopefully get some
explanation of why if that's not so.

It's certainly possible with the older kernel I'm looking at, but I need
some information before I consider looking for possible changes to
backport.
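
To make the question concrete, this is the behaviour I would naively
expect, written as a toy model. It is an assumption-laden sketch and not
how the kernel implements it: Mapping, add_page() and truncate_pages()
are hypothetical names, the dict stands in for the radix tree, and there
is no concurrency, pinning or writeback in it at all.

```python
class Mapping:
    """Toy stand-in for an address_space: page index -> page contents."""

    def __init__(self):
        self.pages = {}

    @property
    def nrpages(self):
        # In this model nrpages is simply the number of cached pages.
        return len(self.pages)

    def add_page(self, index, data=b""):
        self.pages[index] = data


def truncate_pages(mapping, start_index=0):
    """Drop every cached page at or beyond start_index (toy model only)."""
    for index in [i for i in mapping.pages if i >= start_index]:
        del mapping.pages[index]


if __name__ == "__main__":
    m = Mapping()
    for i in range(4):
        m.add_page(i)
    truncate_pages(m)   # truncate from the start of the file
    print(m.nrpages)
```

In a single-threaded model like this nothing can be left behind after
truncating from index 0, so what I'm after is what, in the real kernel,
can make the count differ from that.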

>
> I'm not sure how much to read in to your use of the word "swap".
> It's true that shmem/tmpfs uses swap (of the swapon/swapoff variety)
> as backing for its pages when under pressure (and uses its own variant
> shmem_undo_range() to manage that, instead of truncate_inode_pages()),
> but most filesystems don't use "swap" at all.
>
> I just noticed your subject "paged I/O sub system": I hope you realize
> that mm/page_io.c is solely concerned with swap (of the swapon/swapoff
> variety), and has next to nothing to do with filesystems. (Just as,
> conversely, mm/swap.c has next to nothing to do with swap.)

LOL, right, I'm looking at the page reclaim code which, so far, hasn't
led me to either of those source files.

>
> >
> > Anyone, please?
>
> I hope something I've said there has helped, but warn you that
> I'm a terrible person to engage in an extended conversation with!
> Expect long silences, pray for someone else to jump in.

As well as pointing out that swap storage shouldn't be used in this
case you've reminded me of the difference between swapping and demand
paging, so that's a good start.

Perhaps folks at linux-mm will have more to say.


> > Ian