RE: [PATCH 1/1] mm: vmstat: introducing vm counter for slowpath

From: PINTU KUMAR
Date: Mon Aug 10 2015 - 05:46:27 EST


Hi,

> -----Original Message-----
> From: Andrew Morton [mailto:akpm@xxxxxxxxxxxxxxxxxxxx]
> Sent: Saturday, August 08, 2015 4:06 AM
> To: PINTU KUMAR
> Cc: 'Michal Hocko'; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> minchan@xxxxxxxxxx; dave@xxxxxxxxxxxx; koct9i@xxxxxxxxx;
> mgorman@xxxxxxx; vbabka@xxxxxxx; js1304@xxxxxxxxx;
> hannes@xxxxxxxxxxx; alexander.h.duyck@xxxxxxxxxx;
> sasha.levin@xxxxxxxxxx; cl@xxxxxxxxx; fengguang.wu@xxxxxxxxx;
> cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; pintu.k@xxxxxxxxxxx;
> vishnu.ps@xxxxxxxxxxx; rohit.kr@xxxxxxxxxxx
> Subject: Re: [PATCH 1/1] mm: vmstat: introducing vm counter for slowpath
>
> On Fri, 07 Aug 2015 18:16:47 +0530 PINTU KUMAR <pintu.k@xxxxxxxxxxx>
> wrote:
>
> > > > This is useful to know the rate of allocation success within the
> > > > slowpath.
> > >
> > > What would be that information good for? Is a regular administrator
> > > expected to consume this value or this is aimed more to kernel
> > > developers? If the later then I think a trace point sounds like a
> > > better interface.
> > >
> > This information is good for kernel developers.
> > I found this information useful while debugging low memory situation
> > and sluggishness behavior.
> > I wanted to know how many times the first allocation is failing and
> > how many times system entering slowpath.
> > As I said, the existing counter does not give this information clearly.
> > The pageoutrun, allocstall is too confusing.
> > Also, if kswapd and compaction is disabled, we have no other counter
> > for slowpath (except allocstall).
> > Another problem is that allocstall can also be incremented from
> > hibernation during shrink_all_memory calling.
> > Which may create more confusion.
> > Thus I found this interface useful to understand low memory behavior.
> > If device sluggishness is happening because of too many slowpath or
> > due to some other problem.
> > Then we can decide what will be the best memory configuration for my
> > device to reduce the slowpath.
> >
> > Regarding trace points, I am not sure if we can attach counter to it.
> > Also trace may have more over-head and requires additional configs to
> > be enabled to debug.
> > Mostly these configs will not be enabled by default (at least in
> > embedded, low memory device).
> > I found the vmstat interface more easy and useful.
>
> This does seem like a pretty basic and sensible thing to expose in vmstat. It
> probably makes more sense than some of the other things we have in there.
>
Thanks Andrew.
Yes, as per my analysis, I feel that this is one of the more useful and
important interfaces.
I added it to one of our internal products and found it very useful,
especially during shrink_memory and compact_nodes analysis.
It helped me to prove that if higher-order pages are available, slowpath
entries drop drastically.
Also, during my ELC presentation, people asked me how to monitor the slowpath
counts.

> Yes, it could be a tracepoint but practically speaking, a tracepoint makes it
> developer-only. You can ask a bug reporter or a customer "what is
> /proc/vmstat:slowpath_entered" doing, but it's harder to ask them to set up
> tracing.
>
Yes, at times traces are painful to analyze.
Also, in commercial user binaries, most tracing support is disabled (and there
are no root privileges).
However, /proc/vmstat works with normal user binaries.
When memory issues are reported, we just get log dumps and a few interfaces
like this.
Most of the time these memory issues are hard to reproduce because they may
happen only after long usage.

> And I don't think this will lock us into anything - vmstat is a big dumping
> ground and I don't see a big problem with removing or changing things later
> on. IMO, debugfs rules apply here and vmstat would be in debugfs, had debugfs
> existed at the time.
>
>
> Two things:
>
> - we appear to have forgotten to document /proc/vmstat
>
Yes, I could not find any documentation on vmstat under the kernel
Documentation directory.
I think it would be a nice thing to have.
Maybe I can start this initiative and create one :)
It would be great if the respective owners could then update it.

> - How does one actually use slowpath_entered? Obviously we'd like to
> know "what proportion of allocations entered the slowpath", so we
> calculate
>
> slowpath_entered/X
>
> how do we obtain "X"? Is it by adding up all the pgalloc_*? If
> so, perhaps we should really have slowpath_entered_dma,
> slowpath_entered_dma32, ...?

I think per-zone slowpath counters may not be required.
We just need to know how many times we entered the slowpath, and then possibly
do something to reduce it.
But I think the pgalloc_* counts also include allocations that succeeded in
the fastpath.
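
For the slowpath_entered/X question, here is a rough sketch in Python, assuming
X is simply the sum of all pgalloc_* counters. The helper name
slowpath_proportion is only for illustration, and slowpath_entered is the
counter proposed by this patch, so it will not exist on an unpatched kernel.

```python
def slowpath_proportion(counters):
    """Percentage of page allocations that entered the slowpath.

    `counters` maps vmstat names to integer values. X is taken to be the
    sum of every pgalloc_* counter, which is an assumption, not something
    the patch itself defines.
    """
    total_allocs = sum(value for name, value in counters.items()
                       if name.startswith("pgalloc_"))
    if total_allocs == 0:
        return None  # nothing allocated yet, ratio is undefined
    return counters.get("slowpath_entered", 0) * 100.0 / total_allocs


if __name__ == "__main__":
    # Sample values from a 512MB system with only the NORMAL zone.
    before = {"pgalloc_normal": 985836, "slowpath_entered": 16659}
    after = {"pgalloc_normal": 1549333, "slowpath_entered": 739}
    print("before: %.4f%%" % slowpath_proportion(before))
    print("after:  %.4f%%" % slowpath_proportion(after))
```

With only one zone the sum reduces to pgalloc_normal, so per-zone slowpath
counters would not add anything in this configuration.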

How I use slowpath for analysis is:

VMSTAT              BEFORE      AFTER       %DIFF
----------------    ----------  ----------  ------------
nr_free_pages       6726        12494       46.17%
pgalloc_normal      985836      1549333     36.37%
pageoutrun          2699        529         80.40%
allocstall          298         98          67.11%
slowpath_entered    16659       739         95.56%
compact_stall       244         21          91.39%
compact_fail        178         11          93.82%
compact_success     52          7           86.54%

The above values are from a 512MB system with only the NORMAL zone.
Before, the slowpath count was 16659.
After (memory shrinker + compaction), the slowpath count dropped by about 95%
(to 739) for the same scenario.
This is just an example.
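
A comparison like this can be scripted from two /proc/vmstat snapshots. Below
is a minimal Python sketch, not the exact tool I used. The function names are
made up for illustration, and the %DIFF formula (absolute change divided by the
larger of the two values) is inferred from the numbers in the table.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def pct_diff(before, after):
    """Absolute change relative to the larger of the two values, in percent."""
    larger = max(before, after)
    if larger == 0:
        return 0.0
    return abs(after - before) * 100.0 / larger


def compare(before, after):
    """Return (name, before, after, %diff) rows for counters in both snapshots."""
    return [(name, before[name], after[name],
             pct_diff(before[name], after[name]))
            for name in before if name in after]


if __name__ == "__main__":
    # Sample values copied from the table; on a live system each snapshot
    # would come from reading /proc/vmstat before and after the change.
    before = parse_vmstat("nr_free_pages 6726\nallocstall 298\n"
                          "slowpath_entered 16659\n")
    after = parse_vmstat("nr_free_pages 12494\nallocstall 98\n"
                         "slowpath_entered 739\n")
    for name, b, a, diff in compare(before, after):
        print(f"{name:<18}{b:>10}{a:>10}{diff:>9.2f}%")
```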

If we also want to know the allocation success/fail ratio within the slowpath,
then I think we need more counters,
such as direct_reclaim_success/fail and kswapd_success/fail (just like
compaction success/fail).
Or, we can have a pgalloc_success_fastpath counter.
Then we can do:
pgalloc_success_in_slowpath = (pgalloc_normal - pgalloc_success_fastpath)
Therefore, the success ratio for the slowpath could be:

(pgalloc_success_in_slowpath/slowpath_entered) * 100
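
As a worked sketch of that formula in Python: pgalloc_success_fastpath is only
a proposed counter and does not exist today, and the input numbers below are
made up purely to show the arithmetic.

```python
def slowpath_success_ratio(pgalloc_normal, pgalloc_success_fastpath,
                           slowpath_entered):
    """Percent of slowpath entries that ended in a successful allocation.

    pgalloc_success_fastpath is a hypothetical counter; the other two are
    pgalloc_normal from /proc/vmstat and the proposed slowpath_entered.
    """
    if slowpath_entered == 0:
        return None  # slowpath never entered, ratio is undefined
    pgalloc_success_in_slowpath = pgalloc_normal - pgalloc_success_fastpath
    return pgalloc_success_in_slowpath * 100.0 / slowpath_entered


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any device.
    ratio = slowpath_success_ratio(pgalloc_normal=1000,
                                   pgalloc_success_fastpath=950,
                                   slowpath_entered=80)
    print("slowpath success ratio: %.1f%%" % ratio)
```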

More comments are welcome.
