Re: [PATCH] mm/thp: Update mm_struct's MM_ANONPAGES stat for huge zero pages
From: Hugh Dickins
Date: Fri May 21 2021 - 01:56:26 EST
On Fri, 21 May 2021, Anshuman Khandual wrote:
> On 5/21/21 7:47 AM, Hugh Dickins wrote:
> > On Tue, 18 May 2021, Anshuman Khandual wrote:
> >
> >> Although the zero huge page is being shared across various processes, each
> >> mapping needs to update its mm_struct's MM_ANONPAGES stat by HPAGE_PMD_NR
> >> to be consistent. This just updates the stats in set_huge_zero_page() after
> >> the mapping gets created and in zap_huge_pmd() when mapping gets destroyed.
> >>
> >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> >> Cc: Zi Yan <ziy@xxxxxxxxxx>
> >> Cc: linux-mm@xxxxxxxxx
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx
> >> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> >
> > NAK.
> >
> > For consistency with what? In the all the years that the huge zero page
>
> Huge zero page after all is an anon mapping.
No, it's a shared page used inside an anon mapping. It's not PageAnon.
> Hence why it must not be
> counted for process rss MM_ANONPAGES accounting ?
Because it's being shared between (potentially) many tasks, and many
parts of their address space. It's intended to give them all a saving.
> Is there a specific
> reason why zero huge pages should not counted for MM_ANONPAGES ?
We'd have to look back to ~20-year-old mail (maybe lore has it) to see
whatever arguments were made for and against counting the ZERO_PAGE()
in rss - certainly it can be argued either way, but the way it was
decided is to leave it out of rss. And the huge zero page was
introduced to follow the same rule.
> Does
> not it risk an inaccurate MM_ANONPAGES accounting for the processes
> using huge zero pages ?
They are accurately and consistently accounted (until your patch);
it's only inaccurate if you're expecting it to be included in rss
(agreed, it can be a surprise to find that it is not included).
>
> > has existed, it has been intentionally exempted from rss accounting:
> > consistent with the small zero page, which itself has always been
> > intentionally exempted from rss accounting. In fact, that's a good part
>
> Why are small zero pages exempted from rss accounting which in turn
> might have caused huge zero page to take the same approach as well ?
A gift from the kernel, so large unCOWed zero areas come for free.
>
> > of the reason the huge zero page was introduced (see 4a6c1297268c).
>
> Huge zero page reduces memory consumption on read faults which is a
> definite advantage but would there be a problem if rss stat goes up
> for each huge zero page mapping instances. The commit mentioned here
> talks about increase in actual memory utilization as seen from the
> rss accounting stat, without huge zero page usage.
>
> I am wondering if actual memory usage remains the same (reduced with
> huge zero page usage), what could be the problem in an increased
> MM_ANONPAGES accounting for a given process.
If a process is monitoring its usage, or its usage is being monitored,
then a sudden increase in rss like this will cause them trouble:
it's an incompatible change which would have to be reverted.
> Dont we update the rss
> accounting for shared memory as well ?
Yes, in MM_SHMEMPAGES; but I don't get your point there.
I suppose we could add an MM_ZEROPAGES, but I don't know where
it would be shown, nor who would be interested to look at it.
>
> >
> > To change that now will break any users depending on it.
>
> Just to understand it correctly, are rss accounting procedures/methods
> something which cannot be changed as users are currently dependent on
> it ? OR this proposal here is not a good enough reason to change it ?
I guess we could change it if there were a security issue with it,
but I don't think you're suggesting that. You have not actually
given any reason to change it, other than "to be consistent".
Yet you're choosing inconsistency with long-standing usage.
>
> >
> > Not to mention the
> > BUG: Bad rss-counter state mm:00000000aa61ef82 type:MM_ANONPAGES val:512
> > I just got from mmotm.
>
> Okay, unfortunately some how did not see this problem while testing. Is
> there a specific test case which will be able to trigger this bug ?
I did a read fault into a large private anonymous mapping, to map in
the huge zero page; I then wrote a byte to that page, to COW it; and
exited. Checked dmesg and found that "Bad rss-counter" as expected.
(I lie: actually, I thought I mapped in the huge zero page in two
places, and COWed them both; so was a bit surprised to see val:512
when I was expecting val:1024. I guess I screwed up my addresses.)
There's probably one or more of the LTP tests which would show it,
but don't care to chase down implementation details in a NAKed patch.
Hugh