Re: Pinning ZONE_MOVABLE pages
From: Pavel Tatashin
Date: Fri Nov 20 2020 - 16:54:56 EST
On Fri, Nov 20, 2020 at 4:34 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
>
> > On 20.11.2020 at 22:17, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Nov 20, 2020 at 09:59:24PM +0100, David Hildenbrand wrote:
> >>
> >>>> On 20.11.2020 at 21:28, Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
> >>>
> >>> Recently, I encountered a hang during a memory hot-remove
> >>> operation. It turns out that the hang is caused by pinned user
> >>> pages in ZONE_MOVABLE.
> >>>
> >>> The kernel expects that all pages in ZONE_MOVABLE can be migrated,
> >>> but this is not the case if user applications, for example through
> >>> the dpdk libraries, have pinned them via a vfio DMA map. The kernel
> >>> keeps trying to hot-remove them, but the refcount never drops to
> >>> zero, so we loop until the hardware watchdog kicks in.
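To make the failure mode concrete, here is a minimal userspace model of the offline retry loop. It is not kernel code: `struct page_model`, `try_migrate()` and `offline_range()` are hypothetical names, and the rule "migration proceeds only when the refcount equals the references migration can account for" is a simplification. A page carrying an extra long-term pin keeps an elevated refcount, so every pass fails on it. The `max_passes` bound exists only so the model terminates; the loop described in the report has no such bound.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page in the range being offlined. */
struct page_model {
    int refcount;      /* total references held on the page */
    int expected_refs; /* references that migration itself can account for */
    bool migrated;     /* set once the contents have been moved elsewhere */
};

/* Migration can only proceed when nobody else holds a reference. */
static bool try_migrate(struct page_model *p)
{
    if (p->refcount != p->expected_refs)
        return false; /* e.g. an outstanding vfio DMA pin */
    p->migrated = true;
    return true;
}

/*
 * Retry migration over the whole range, pass after pass, until every
 * page has moved. Returns how many pages are still unmigrated after
 * max_passes passes (the real loop in the report is unbounded).
 */
static size_t offline_range(struct page_model *pages, size_t n, int max_passes)
{
    size_t remaining = n;

    for (int pass = 0; pass < max_passes && remaining > 0; pass++) {
        remaining = 0;
        for (size_t i = 0; i < n; i++) {
            if (!pages[i].migrated && !try_migrate(&pages[i]))
                remaining++;
        }
    }
    return remaining;
}
```

With three pages where only the middle one holds an extra reference, the other two migrate on the first pass and the pinned one is still left after any number of passes.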
> >>>
> >>> We cannot do DMA unmaps before hot-remove, because hot-remove is a
> >>> slow operation, and we have thousands of network flows handled by
> >>> dpdk that we just cannot suspend for the duration of the hot-remove
> >>> operation.
> >>>
> >>
> >> Hi!
> >>
> >> It's a known problem for VMs using vfio as well. I thought about this a while ago and came to the same conclusion: before performing long-term pinnings, we have to migrate pages off the movable zone. After that, it's too late.
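The "migrate first, then pin" ordering can be sketched as a tiny userspace model. Everything here is hypothetical and illustrative (`enum zone_tag`, `migrate_to_normal()`, `pin_longterm()`), not the kernel's GUP API: a long-term pin request on a page in the movable zone first moves the page to a non-movable zone, and only then takes the pin. A page that is already pinned can no longer be moved, which is the "after that, it's too late" case.

```c
#include <stdbool.h>

/* Hypothetical zone tags, used only by this model. */
enum zone_tag { ZT_NORMAL, ZT_MOVABLE };

struct page_model {
    enum zone_tag zone; /* zone the page currently lives in */
    int pincount;       /* outstanding long-term pins */
};

/* Stand-in for migration: the data ends up in a non-movable zone. */
static bool migrate_to_normal(struct page_model *p)
{
    if (p->pincount > 0)
        return false; /* too late: a pinned page cannot be moved */
    p->zone = ZT_NORMAL;
    return true;
}

/*
 * Long-term pin: move the page out of the movable zone first, and only
 * then take the pin. Returns false if the page could not be moved.
 */
static bool pin_longterm(struct page_model *p)
{
    if (p->zone == ZT_MOVABLE && !migrate_to_normal(p))
        return false;
    p->pincount++;
    return true;
}
```

The design choice is simply ordering: checking the zone before incrementing the pin count keeps the movable zone free of long-term pins, at the cost of a migration on the pin path.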
> >
> > We can't, though. VMs using vfio pin their entire address space (right?)
> > so we end up with basically all of the !MOVABLE memory used for VMs and
> > the MOVABLE memory goes unused (I'm thinking about the case of a machine
> > which only hosts VMs and has nothing else to do with its memory). In
> > that case, the sysadmin is going to reconfigure ZONE_MOVABLE away, and
> > now we just don't have any ZONE_MOVABLE. So what's the point?
>
> When the guest is using a vIOMMU, it will only pin what's currently mapped by the guest into the vIOMMU. Otherwise: yes.
Right, not all guest memory needs to be pinned, so ZONE_MOVABLE can
still be used for a vast amount of allocations.
>
> If you assume all memory will be used for VMs with vfio, then yes: no ZONE_MOVABLE, no memory hotunplug. If it's only some VMs, it's a different story.
It sounds like in such an extreme case it is reasonable to assume no
hot-plug. But when you have 8G and need to remove a 2G movable zone,
and cannot guarantee that even with 6G of free memory, that is
unreasonable.
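The arithmetic behind the 8G/2G/6G example can be written as a feasibility check. This is a simplified model under stated assumptions, not a kernel function: `struct mem_state` and `can_hot_remove()` are hypothetical, sizes are in MiB, and `free` is assumed to include the free part of the block being removed. Enough free memory outside the block is necessary, but a single long-term pinned page inside the block makes removal impossible regardless of how much memory is free.

```c
#include <stdbool.h>

/* Hypothetical summary of the memory state; all sizes in MiB. */
struct mem_state {
    unsigned long free;         /* free RAM, including free RAM inside the block */
    unsigned long range_size;   /* size of the block being hot-removed */
    unsigned long range_used;   /* in-use memory inside that block */
    unsigned long range_pinned; /* long-term pinned memory inside that block */
};

/*
 * The block can be emptied only if every used page in it can be moved
 * and there is enough free memory outside the block to receive them.
 */
static bool can_hot_remove(const struct mem_state *m)
{
    unsigned long range_free = m->range_size - m->range_used;
    unsigned long free_outside = m->free - range_free; /* assumes free >= range_free */

    if (m->range_pinned > 0)
        return false; /* free memory elsewhere does not help */
    return free_outside >= m->range_used;
}
```

With 8G of RAM, 6G free and a 2G block that is half used, there are 5G free outside the block for 1G of data, so removal should succeed; the same numbers with a few MiB pinned inside the block make it fail.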
>
> >
> > ZONE_MOVABLE can also be pinned by mlock() and other such system calls.
>
> Mlocked pages can be migrated, no? They are simply not swappable iirc.
Yes, mlocked pages are simply kept resident in memory, but their
contents can still be migrated to a different place in RAM.
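The distinction between mlock and pinning can be captured in two predicates. This is a deliberately simplified, hypothetical model (`struct page_model`, `can_reclaim()`, `can_migrate()` are illustrative names): mlock only keeps a page resident, so it blocks reclaim and swap-out but not migration, whereas a long-term pin (for example from a vfio DMA map) holds an extra reference that blocks migration.

```c
#include <stdbool.h>

/* Hypothetical page state for this model only. */
struct page_model {
    bool mlocked;  /* page must stay resident: no reclaim or swap-out */
    int pincount;  /* long-term pins, e.g. from a vfio DMA map */
};

/* Simplified: mlock forbids reclaim, and a pinned page cannot be freed either. */
static bool can_reclaim(const struct page_model *p)
{
    return !p->mlocked && p->pincount == 0;
}

/* Migration copies the contents elsewhere; only pins get in the way. */
static bool can_migrate(const struct page_model *p)
{
    return p->pincount == 0; /* the mlocked flag is irrelevant here */
}
```

An mlocked but unpinned page is therefore migratable yet not reclaimable, while any pinned page is not migratable.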
>
> > The kernel needs to understand that ZONE_MOVABLE memory may not actually
> > be movable, and skip the unmovable stuff.
> >
>
> Then you don't have unplug guarantees; memory unplug is broken by design. In that case there is no point in optimizing for it at all; just tell customers "vfio and memory hotunplug are incompatible". The only ugly thing is the endless loop.
Right, if memory in ZONE_MOVABLE is not guaranteed to be movable, we
can never guarantee memory hot-remove even when we have a lot of free
memory to migrate to.
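One way to picture "skip the unmovable stuff" without the endless loop is a single pass that migrates what it can and fails fast when anything pinned remains. This is a hypothetical sketch, not an existing kernel interface: `struct page_model` and `offline_block_skip_pinned()` are illustrative names, and returning `-EBUSY` is an assumed error convention. It also shows the point made above: the loop terminates, but the block still cannot be removed, so there is no hot-remove guarantee even with plenty of free memory.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: pinned pages cannot be migrated. */
struct page_model {
    bool pinned;   /* long-term pin outstanding (e.g. vfio DMA map) */
    bool migrated; /* contents already moved out of the block */
};

/*
 * Single pass over the block: migrate what can move, skip what is
 * pinned. If anything was skipped the block cannot be removed, so
 * report -EBUSY instead of retrying forever. The number of skipped
 * pages is stored in *skipped when the pointer is non-NULL.
 */
static int offline_block_skip_pinned(struct page_model *pages, size_t n,
                                     size_t *skipped)
{
    size_t s = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].migrated)
            continue;
        if (pages[i].pinned) {
            s++;
            continue;
        }
        pages[i].migrated = true; /* stand-in for a successful migration */
    }
    if (skipped)
        *skipped = s;
    return s ? -EBUSY : 0;
}
```

Failing fast bounds the time spent, but the caller still ends up with a block that stays online, which is exactly the lack of guarantee discussed in this thread.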
>