Re: [PATCH v2] vmalloc: Fix issues with flush flag

From: Edgecombe, Rick P
Date: Mon May 20 2019 - 18:20:20 EST


On Tue, 2019-05-21 at 00:36 +0300, Meelis Roos wrote:
> > Switch VM_FLUSH_RESET_PERMS to use a regular TLB flush instead of
> > vm_unmap_aliases() and fix calculation of the direct map for the
> > CONFIG_ARCH_HAS_SET_DIRECT_MAP case.
> >
> > Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on a
> > sparc machine. On investigation some issues were noticed:
> >
> > 1. The calculation of the direct map address range to flush was wrong.
> > This could cause problems on x86 if a RO direct map alias ever got
> > loaded into the TLB. This shouldn't normally happen, but it could cause
> > the permissions to remain RO on the direct map alias, and then the page
> > would return from the page allocator to some other component as RO and
> > cause a crash.
> >
> > 2. Calling vm_unmap_aliases() on vfree could potentially be a lot of
> > work to do on a free operation. Simply flushing the TLB instead of the
> > whole vm_unmap_aliases() operation makes the frees faster and pushes
> > the heavy work to happen on allocation where it would be more expected.
> > In addition to the extra work, vm_unmap_aliases() takes some locks
> > including a long hold of vmap_purge_lock, which will make all other
> > VM_FLUSH_RESET_PERMS vfrees wait while the purge operation happens.
> >
> > 3. page_address() can have locking on some configurations, so skip
> > calling this when possible to further speed this up.
> >
> > Fixes: 868b104d7379 ("mm/vmalloc: Add flag for freeing of special
> > permsissions")
> > Reported-by: Meelis Roos<mroos@xxxxxxxx>
> > Cc: Meelis Roos<mroos@xxxxxxxx>
> > Cc: Peter Zijlstra<peterz@xxxxxxxxxxxxx>
> > Cc: "David S. Miller"<davem@xxxxxxxxxxxxx>
> > Cc: Dave Hansen<dave.hansen@xxxxxxxxx>
> > Cc: Borislav Petkov<bp@xxxxxxxxx>
> > Cc: Andy Lutomirski<luto@xxxxxxxxxx>
> > Cc: Ingo Molnar<mingo@xxxxxxxxxx>
> > Cc: Nadav Amit<namit@xxxxxxxxxx>
> > Signed-off-by: Rick Edgecombe<rick.p.edgecombe@xxxxxxxxx>
> > ---
> >
> > Changes since v1:
> > - Update commit message with more detail
> > - Fix flush end range on !CONFIG_ARCH_HAS_SET_DIRECT_MAP case
>
> It does not work on my V445 where the initial problem happened.
>
Thanks for testing. So I guess that suggests it's the TLB flush causing
the problem on sparc and not any lazy purge deadlock. I had sent Meelis
another test patch that just flushed the entire 0 to ULONG_MAX range to
try to always get the "flush all" logic, and apparently it mostly didn't
boot either. It also showed that it's not getting stuck anywhere
in the vm_remove_alias() function. Something just hangs later.
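
For anyone following along, the direct map range calculation from item 1
can be illustrated with a small userspace sketch. This is not the patch
itself, only the idea of taking the lowest start and highest end over
the pages being freed. The names flush_range, direct_map_flush_range()
and SKETCH_PAGE_SIZE are made up for illustration, and a zero entry
stands in for a page with no direct map address.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical result type: the [start, end) range that needs a flush. */
struct flush_range {
	unsigned long start;
	unsigned long end;
	bool needed;
};

/*
 * Compute the smallest [start, end) range covering every non-zero
 * address in addrs[]. A zero entry models a page that has no direct
 * map address and is skipped.
 */
static struct flush_range direct_map_flush_range(const unsigned long *addrs,
						 size_t n)
{
	struct flush_range r = { .start = ULONG_MAX, .end = 0, .needed = false };

	assert(addrs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		unsigned long addr = addrs[i];

		if (!addr)
			continue;
		if (addr < r.start)
			r.start = addr;
		if (addr + SKETCH_PAGE_SIZE > r.end)
			r.end = addr + SKETCH_PAGE_SIZE;
		r.needed = true;
	}
	return r;
}
```

The test patch mentioned above skipped this kind of calculation and
passed 0 and ULONG_MAX as the range instead.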