Re: [PATCH] psi: annotate refault stalls from IO submission

From: Johannes Weiner
Date: Tue Jul 23 2019 - 16:42:14 EST

On Tue, Jul 23, 2019 at 01:34:50PM -0600, Jens Axboe wrote:
> On 7/23/19 1:04 PM, Johannes Weiner wrote:
> > CCing Jens for bio layer stuff
> >
> > On Tue, Jul 23, 2019 at 10:02:26AM +1000, Dave Chinner wrote:
> >> Even better: If this memstall and "refault" check is needed to
> >> account for bio submission blocking, then page cache iteration is
> >> the wrong place to be doing this check. It should be done entirely
> >> in the bio code when adding pages to the bio because we'll only ever
> >> be doing page cache read IO on page cache misses. i.e. this isn't
> >> dependent on adding a new page to the LRU or not - if we add a new
> >> page then we are going to be doing IO and so this does not require
> >> magic pixie dust at the page cache iteration level
> >
> > That could work. I had it at the page cache level because that's
> > logically where the refault occurs. But PG_workingset encodes
> > everything we need from the page cache layer and is available where
> > the actual stall occurs, so we should be able to push it down.
> >
> >> e.g. bio_add_page_memstall() can do the working set check and then
> >> set a flag on the bio to say it contains a memstall page. Then on
> >> submission of the bio the memstall condition can be cleared.
> >
> > A separate bio_add_page_memstall() would have all the problems you
> > pointed out with the original patch: it's magic, people will get it
> > wrong, and it'll be hard to verify and notice regressions.
> >
> > How about just doing it in __bio_add_page()? PG_workingset is not
> > overloaded - when we see it set, we can generally and unconditionally
> > flag the bio as containing userspace workingset pages.
> >
> > At submission time, in conjunction with the IO direction, we can
> > clearly tell whether we are reloading userspace workingset data,
> > i.e. stalling on memory.
> >
> > This?
> Not vehemently opposed to it, even if it sucks having to test page flags
> in the hot path.

Yeah, it's not great :/ Just seems marginally better than annotating
all the callsites and maintain correctness there in the future.

> Maybe even do:
> if (!bio_flagged(bio, BIO_WORKINGSET) && PageWorkingset(page))
> bio_set_flag(bio, BIO_WORKINGSET);
> to at least avoid it for the (common?) case where multiple pages are
> marked as workingset.

Sounds good. If refaults occur, most likely the whole readahead batch
has that flag set, so I've added that. I've also marked the page test

This way we have no jumps in the most common path (no refaults), one
jump in the second most common (bit already set), and the double for
the least likely case of hitting the first refault page in a batch.

Updated patch below.