Re: + mm-vmscan-add-mm_vmscan_inactive_list_is_low-tracepoint.patch added to -mm tree

From: Minchan Kim
Date: Fri Jan 13 2017 - 03:57:43 EST


On Fri, Jan 13, 2017 at 08:47:07AM +0100, Michal Hocko wrote:
> On Fri 13-01-17 10:37:24, Minchan Kim wrote:
> > Hello,
> >
> > On Thu, Jan 12, 2017 at 10:10:17AM +0100, Michal Hocko wrote:
> > > On Thu 12-01-17 17:48:13, Minchan Kim wrote:
> > > > On Thu, Jan 12, 2017 at 09:15:54AM +0100, Michal Hocko wrote:
> > > > > On Thu 12-01-17 14:12:47, Minchan Kim wrote:
> > > > > > Hello,
> > > > > >
> > > > > > On Wed, Jan 11, 2017 at 04:52:39PM +0100, Michal Hocko wrote:
> > > > > > > On Wed 11-01-17 08:52:50, Minchan Kim wrote:
> > > > > > > [...]
> > > > > > > > > @@ -2055,8 +2055,8 @@ static bool inactive_list_is_low(struct
> > > > > > > > > if (!file && !total_swap_pages)
> > > > > > > > > return false;
> > > > > > > > >
> > > > > > > > > - inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
> > > > > > > > > - active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
> > > > > > > > > + total_inactive = inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
> > > > > > > > > + total_active = active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
> > > > > > > > >
> > > > > > > >
> > > > > > > > the decision of deactivating is based on eligible zone's LRU size,
> > > > > > > > not whole zone so why should we need to get a trace of all zones's LRU?
> > > > > > >
> > > > > > > Strictly speaking, the total_ counters are not necessary for making the
> > > > > > > decision. I found reporting those numbers useful regardless because this
> > > > > > > will give us also an information how large is the eligible portion of
> > > > > > > the LRU list. We do not have any other tracepoint which would report
> > > > > > > that.
> > > > > >
> > > > > > The patch doesn't say anything why it's useful. Could you tell why it's
> > > > > > useful and inactive_list_is_low should be right place?
> > > > > >
> > > > > > Don't get me wrong, please. I don't want to bother you.
> > > > > > I really don't want to add random stuff although it's tracepoint for
> > > > > > debugging.
> > > > >
> > > > > This doesn't sounds random to me. We simply do not have a full picture
> > > > > on 32b systems without this information. Especially when memcgs are
> > > > > involved and global numbers spread over different LRUs.
> > > >
> > > > Could you elaborate it?
> > >
> > > The problem with 32b systems is that you only can consider a part of the
> > > LRU for the lowmem requests. While we have global counters to see how
> > > much lowmem inactive/active pages we have, those get distributed to
> > > memcg LRUs. And that distribution is impossible to guess. So my thinking
> > > is that it can become a real head scratcher to realize why certain
> > > active LRUs are aged while others are not. This was the case when I was
> > > debugging the last issue which triggered all this. All of the sudden I
> > > have seen many invocations when inactive and active were zero which
> > > sounded weird, until I realized that those are memcg's lruvec which is
> > > what total numbers told me...
> >
> > Hmm, it seems I miss something. AFAIU, what you need is just memcg
> > identifier, not all lru size. If it isn't, please tell more detail
> > usecase of all lru size in that particular tracepoint.
>
> Having memcg id would be definitely helpful but that alone wouldn't tell
> us how is the lowmem distributed. To be honest I really fail to see why
> this bothers you all that much.

Because I fail to understand why you want to need additional all zone's
LRU stat in inactive_list_is_low. With clear understanding, we can think
over that it's really needed and right place to achieve the goal.

Could you say with a example you can think? It's really helpful to
understand why it's needed.

>
> [...]
> > > > > I am not sure I am following. Why is the additional parameter a problem?
> > > >
> > > > Well, to me, it's not a elegance. Is it? If we need such boolean variable
> > > > to control show the trace, it means it's not a good place or think
> > > > refactoring.
> > >
> > > But, even when you refactor the code there will be other callers of
> > > inactive_list_is_low outside of shrink_active_list...
> >
> > Yes, that's why I said "it's okay if you love your version". However,
> > we can do refactoring to remove "bool trace" and even, it makes code
> > more readable, I believe.
> >
> > >From 06eb7201d781155a8dee7e72fbb8423ec8175223 Mon Sep 17 00:00:00 2001
> > From: Minchan Kim <minchan@xxxxxxxxxx>
> > Date: Fri, 13 Jan 2017 10:13:36 +0900
> > Subject: [PATCH] mm: refactoring inactive_list_is_low
> >
> > Recently, Michal Hocko added tracepoint into inactive_list_is_low
> > for catching why VM decided to age the active list to know
> > active/inacive balancing problem. With that, unfortunately, it
> > added "bool trace" to inactlive_list_is_low to control some place
> > should be prohibited tracing. It is not elegant to me so this patch
> > try to clean it up.
> >
> > Normally, most inactive_list_is_low is used for deciding active list
> > demotion but one site(i.e., get_scan_count) uses for other purpose
> > which reclaim file LRU forcefully. Sites for deactivation calls it
> > with shrink_active_list. It means inactive_list_is_low could be
> > located in shrink_active_list.
> >
> > One more thing this patch does is to remove "ratio" in the tracepoint
> > because we can get it by post processing in script via simple math.
> >
> > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> > ---
> > include/trace/events/vmscan.h | 9 +++-----
> > mm/vmscan.c | 51 ++++++++++++++++++++++++-------------------
> > 2 files changed, 31 insertions(+), 29 deletions(-)
>
> this cleanup adds more lines than it removes. I think reporting the

It's just marginal because the function names are really long and I want to
keep a 80 column rule.
Anyway, I'm not insisting on pushing this patch although I still think
it's not nice to add "boolean variable" to control tracing or not.
It's not a main interest.

> ratio is helpful because it doesn't cost us anything while calculating
> it by later is just a bit annoying.

I really cannot imagine when inactive_ratio value is helpful for debugging.