Re: [PATCHv4 1/8] mm: Add support for unaccepted memory
From: Kirill A. Shutemov
Date: Sat Apr 09 2022 - 13:50:58 EST
On Fri, Apr 08, 2022 at 12:11:58PM -0700, Dave Hansen wrote:
> On 4/5/22 16:43, Kirill A. Shutemov wrote:
> > Kernel only needs to accept memory once after boot, so during the boot
> > and warm up phase there will be a lot of memory acceptance. After things
> > are settled down the only price of the feature is a couple of checks for
> > PageUnaccepted() in allocate and free paths. The check refers to a hot
> > variable (that also encodes PageBuddy()), so it is cheap and not visible
> > on profiles.
>
> Let's also not sugar-coat this. Page acceptance is hideously slow.
> It's agonizingly slow. To boot, it's done holding a global spinlock
> with interrupts disabled (see patch 6/8). At the very, very least, each
> acceptance operation involves a couple of what are effectively ring
> transitions, a 2MB memset(), and a bunch of cache flushing.
>
> The system is going to be downright unusable during this time, right?
Well, yes. The CPU that is doing the accepting is completely blocked by it.
But other CPUs may proceed until one of them, in its turn, steps onto
memory accepting.
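To make the accept-once behaviour concrete, here is a toy userspace model in
Python. It is not kernel code: the class name, the per-chunk set and the lock
standing in for the global spinlock from patch 6/8 are all assumptions made
for illustration only.

```python
import threading


class LazyAcceptModel:
    """Toy model: memory chunks are accepted on first allocation only."""

    def __init__(self, nr_chunks):
        # Every chunk starts out unaccepted (hypothetical representation).
        self.unaccepted = set(range(nr_chunks))
        # Stand-in for the global lock held while accepting.
        self.lock = threading.Lock()
        # Mirrors the idea of an "accept_memory" event counter.
        self.accept_events = 0

    def alloc(self, chunk):
        # Fast path: a cheap check, no lock once the chunk is accepted.
        if chunk not in self.unaccepted:
            return chunk
        # Slow path: only the caller that hits an unaccepted chunk pays.
        with self.lock:
            if chunk in self.unaccepted:  # re-check under the lock
                self.unaccepted.discard(chunk)
                self.accept_events += 1
        return chunk
```

In this model a second alloc() of the same chunk takes the fast path, and
callers that only touch already-accepted chunks never wait on the lock.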
> Sure, it's *temporary* and only happens once at boot. But, it's going
> to suck.
>
> Am I over-stating this in any way?
>
> The ACCEPT_MEMORY vmstat is good to have around. Thanks for adding it.
> But, I think we should also write down some guidance like:
>
> If your TDX system seems as slow as a snail after boot, look at
> the "accept_memory" counter in /proc/vmstat. If it is
> incrementing, then TDX memory acceptance is likely to blame.
Sure. Will add it to the commit message.
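For anyone who wants to script that check, a minimal Python sketch follows.
It assumes the counter shows up in /proc/vmstat under the name
"accept_memory" as quoted above (the final name may differ), and the helper
names are made up for this example.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_counter(name="accept_memory", path="/proc/vmstat"):
    """Return the counter value, or None if the kernel does not expose it."""
    with open(path) as f:
        return parse_vmstat(f.read()).get(name)


def is_incrementing(name="accept_memory", interval=1.0, path="/proc/vmstat"):
    """Sample the counter twice and report whether it grew in between."""
    before = read_counter(name, path)
    time.sleep(interval)
    after = read_counter(name, path)
    if before is None or after is None:
        return None  # counter not present on this kernel
    return after > before
```

is_incrementing() returning True during a slow period would point at memory
acceptance; None means the counter is not there at all.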
> Do we need anything more discrete to tell users when acceptance is over?
I can imagine setups where acceptance is never over. A VM running
a workload with a fixed dataset can have plenty of memory unaccepted.
I don't think "make it over" should be the goal.
> For instance, maybe they run something and it goes really slow, they
> watch "accept_memory" until it stops. They rejoice at their good
> fortune! Then, memory allocation starts falling over to a new node and
> the agony begins anew.
>
> I can think of dealing with this in two ways:
>
> cat /sys/.../unaccepted_pages_left
>
> which just walks the bitmap and counts the amount of pages remaining. or
> something like:
>
> echo 1 > /sys/devices/system/node/node0/make_the_pain_stop
>
> Which will, well, make the pain stop on node0.
Sure, we can add handles. But API is hard. Maybe we should wait and see
what is actually needed. (Yes, I'm lazy. :)
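For reference, the bitmap walk suggested for unaccepted_pages_left could look
roughly like the Python sketch below. It assumes a set bit means "still
unaccepted", that bits are numbered LSB-first within each byte, and that each
bit covers pages_per_bit pages; all three are assumptions for illustration,
not a description of the patch.

```python
def count_unaccepted(bitmap, nr_bits, pages_per_bit=1):
    """Count pages still unaccepted in the first nr_bits bits of a bitmap.

    bitmap is a bytes-like object; a set bit marks an unaccepted unit.
    """
    count = 0
    for bit in range(nr_bits):
        byte = bitmap[bit // 8]
        if byte & (1 << (bit % 8)):
            count += 1
    return count * pages_per_bit
```

The cost is linear in the size of the bitmap, so reading such a file would be
cheap compared to accepting even a single chunk.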
--
Kirill A. Shutemov