Re: [RFC] A couple of questions about the paged I/O sub system

From: Hugh Dickins
Date: Thu Oct 22 2015 - 21:54:52 EST


On Thu, 22 Oct 2015, Ian Kent wrote:
> On Wed, 2015-10-21 at 12:56 -0700, Hugh Dickins wrote:
> > On Wed, 21 Oct 2015, Ian Kent wrote:
>
> Thanks for taking the time to reply Hugh.
>
> >
> > > Hi all,
> > >
> > > I've been looking through some of the page reclaim code and at
> > > truncate_inode_pages().
> > >
> > > I'm not familiar with the code and I'm struggling to understand it.
> > >
> > > One thing that is puzzling me right now is, if a file has pages that
> > > have been modified and are swapped out when pagevec_lookup_entries()
> > > is called will they be found?
> >
> > truncate_inode_pages() is a library function which a filesystem calls
> > at some stage in its inode truncation processing, to take all the
> > incore pages out of pagecache (out of its radix_tree), and free them up
> > (usually: some might be otherwise pinned in memory at the time).
> >
> > A filesystem will have other work to do, very particular to that
> > filesystem, to free up the actual disk blocks: that's definitely
> > not part of truncate_inode_pages()'s job.
> >
> > It's also called when evicting an inode no longer needed in memory,
> > to free the associated pagecache, when not deleting the blocks on
> > disk.
> >
> > I think I don't understand your "swapped out": modifications occur to
> > a page while it is in pagecache, and those modifications need to be
> > written back to disk before that page can be reclaimed for other use.
>
> Indeed, now I think about it, "swapped out" is a bad choice of words
> when talking about a paged IO system.
>
> What I'm trying to say is if pages allocated to a mapping are modified,
> then under memory pressure, are they ever reclaimed by writing them to
> swap storage or are they always reclaimed by writing them back to disk?
>
> Now I think about what you've said here and looking at the code I
> suspect the answer is they are always reclaimed by writing them to
> disk.

Yes.

>
> >
> > >
> > > If not then how does truncate_inode_pages(_range)() handle waiting
> > > for these pages to be swapped back in to perform the writeback and
> > > truncation?
> >
> > Pages are never "swapped back in to perform the writeback":
> > if writeback is needed, it's done before the page can be freed from
> > pagecache; and if that data is needed again after the page was freed,
> > it's read back in from disk to a fresh page.
>
> That makes sense, using swap would be unnecessary double handling.
>
> >
> > You may be worrying about what happens when a page is modified or
> > under writeback when it is truncated: I think that's something each
> > filesystem has to be careful of, and may deal with in different ways.
>
> I'm wondering how a mapping nrpages can be non-zero (read greater than
> one) after calling truncate_inode_pages().
>
> But I'm looking at a much older kernel so it's quite different to
> current upstream and this seemed like a question relevant to both
> kernels to get some idea of how page reclaim works.
>
> I guess what I'm really looking to work out is if it's possible, with
> the current upstream kernel, for a mapping to have nrpages greater than
> 1 after calling truncate_inode_pages() and hopefully get some
> explanation of why if that's not so.

I assume you're worrying about a truncate_inode_pages(mapping, 0). If
it's truncate_inode_pages(mapping, 1), or lstart anything greater than 0,
then it will leave behind the incompletely truncated pages at the start:
no mystery in that.
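
To make the lstart arithmetic concrete, here is a rough user-space model
in Python (not kernel code). It assumes 4 KiB pages and assumes that the
first page dropped entirely is lstart rounded up to a page boundary, which
is my reading of truncate_inode_pages_range(); the names model_truncate()
and first_removed_index() are invented for this sketch, and the set of
cached indices merely stands in for the mapping's pagecache.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def first_removed_index(lstart):
    """Index of the first page dropped entirely: lstart rounded up to a page."""
    return (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT


def model_truncate(cached, lstart):
    """Return (surviving page indices, index of the partially zeroed page).

    `cached` is a set of page indices standing in for the pagecache.
    The second element is None when lstart is page aligned or when the
    partial page is not cached.
    """
    start = first_removed_index(lstart)
    kept = {i for i in cached if i < start}
    # A non-aligned lstart leaves one page that is only zeroed past lstart.
    partial = lstart >> PAGE_SHIFT if lstart % PAGE_SIZE else None
    if partial not in cached:
        partial = None
    return kept, partial


if __name__ == "__main__":
    cached = {0, 1, 2, 5}
    for lstart in (0, 1, PAGE_SIZE, PAGE_SIZE + 1):
        kept, partial = model_truncate(cached, lstart)
        print(lstart, sorted(kept), partial)
```

In this model only lstart 0 empties the set; with lstart 1, page 0
survives, so a count of 1 afterwards would be expected.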

>
> It's certainly possible with the older kernel I'm looking at but I need
> some info. before I consider looking for possible changes to back port.

Probably what you're looking for is Jan Kara's v3.0 commit 08142579b6ca
"mm: fix assertion mapping->nrpages == 0 in end_writeback()".

>
> >
> > I'm not sure how much to read in to your use of the word "swap".
> > It's true that shmem/tmpfs uses swap (of the swapon/swapoff variety)
> > as backing for its pages when under pressure (and uses its own
> > variant shmem_undo_range() to manage that, instead of
> > truncate_inode_pages()), but most filesystems don't use "swap" at all.
> >
> > I just noticed your subject "paged I/O sub system": I hope you realize
> > that mm/page_io.c is solely concerned with swap (of the swapon/swapoff
> > variety), and has next to nothing to do with filesystems. (Just as,
> > conversely, mm/swap.c has next to nothing to do with swap.)
>
> LOL, right, I'm looking at the page reclaim code which, so far, hasn't
> led me to either of those source files.
>
> >
> > >
> > > Anyone, please?
> >
> > I hope something I've said there has helped, but warn you that
> > I'm a terrible person to engage in an extended conversation with!
> > Expect long silences, pray for someone else to jump in.
>
> As well as pointing out that swap storage shouldn't be used in this
> case you've reminded me of the difference between swapping and demand
> paging, so that's a good start.

So long as you leave it as a distant memory: you're right that "swapping"
used to mean copying out a whole process to disk and reading in another,
but Linux never implemented it that way: it's always been paging out to
and in from the swap medium, much like demand paging from file.

(I say "never" and "always": I think that's so,
but I don't really know beyond v2.4.0.)

Hugh

>
> Perhaps folks at linux-mm will have more to say.
>
>
> > > Ian