Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()

From: Joerg Roedel
Date: Fri May 01 2020 - 06:16:15 EST


On Thu, Apr 30, 2020 at 10:39:19PM -0400, Steven Rostedt wrote:
> I'll give the answer I gave to Joerg when he replied to my accidental
> private (not public) email:
>
> Or even my original patch would be better than having the generic tracing
> code understanding the intrinsic properties of vmalloc() and
> alloc_percpu() on x86_64. I really don't think it is wise to have:
>
> foo = alloc_percpu();
>
> /*
> * Because of some magic with the way alloc_percpu() works on
> * x86_64, we need to synchronize the pgd of all the tables,
> * otherwise the trace events that happen in x86_64 page fault
> * handlers can't cope with accessing the chance that a
> * alloc_percpu()'d memory might be touched in the page fault trace
> * event. Oh, and we need to audit all alloc_percpu() and vmalloc()
> * calls in tracing, because something might get triggered within a
> * page fault trace event!
> */
> vmalloc_sync_mappings();
>
> That would be exactly what I add as a comment if it were to be added in the
> generic tracing code.
>
> And we would need to audit any percpu alloc'd code in all tracing, or
> anything that might git hooked into something that hooks to the page fault
> trace point.
>
> Since this worked for a decade without this, I'm strongly against adding it
> in the generic code due to some issues with a single architecture.

That is exactly the problem with vmalloc_sync_mappings()/unmappings().
It is not at all clear when it needs to be called and why, or even who
is responsible for calling it. The existing call-sites in the notifier
and ACPI code have no comment on why it is necessary to synchronize the
vmalloc mappings there.

It is only needed on x86, and we could get rid of it completely if:

1) On x86-64 we pre-allocate all 64 P4D/PUD pages for the
vmalloc area in init_mm at boot time. This needs 256KB of
memory per system, most of it potentially unused, as each
P4D/PUD page maps 512GB of address space.

2) On x86-32 we disable large pages for vmalloc/ioremap
mappings and pre-allocate the PTE pages for the vmalloc area
in init_mm. Depending on how much memory the system has and
the configured kernel/user split, this might take more than 64
pages.

With that we could get rid of the vmalloc_sync interface and also the
vmalloc-fault code in general and reduce the complexity. This interface
has caused problems more than once. On the other hand, it would trade
higher memory usage for lower complexity.
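
For reference, the numbers in 1) can be checked with a quick
back-of-the-envelope calculation. This is only a userspace sketch, not
kernel code. It assumes 4KB pages and a 32TB vmalloc area (64 entries
of 512GB each, as implied above), and all macro and function names are
made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, not taken from kernel headers. */
#define SKETCH_PAGE_SIZE     4096ULL          /* 4KB page-table page        */
#define SKETCH_ENTRY_SPAN    (512ULL << 30)   /* 512GB mapped per P4D/PUD   */
#define SKETCH_VMALLOC_SIZE  (32ULL << 40)    /* assumed 32TB vmalloc area  */

/* Number of P4D/PUD pages needed to cover the whole vmalloc area. */
static uint64_t vmalloc_prealloc_pages(void)
{
	return SKETCH_VMALLOC_SIZE / SKETCH_ENTRY_SPAN;
}

/* Memory consumed by pre-allocating all of those pages, in bytes. */
static uint64_t vmalloc_prealloc_bytes(void)
{
	return vmalloc_prealloc_pages() * SKETCH_PAGE_SIZE;
}

/* Compile-time check that the constants match the figures in 1). */
static_assert(SKETCH_VMALLOC_SIZE / SKETCH_ENTRY_SPAN == 64,
	      "expected 64 P4D/PUD pages");
static_assert((SKETCH_VMALLOC_SIZE / SKETCH_ENTRY_SPAN) * SKETCH_PAGE_SIZE
	      == 256 * 1024, "expected 256KB");
```

Under these assumptions vmalloc_prealloc_pages() gives 64 and
vmalloc_prealloc_bytes() gives 256KB, which is the fixed per-system
cost that would replace the vmalloc_sync interface on x86-64.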

Regards,

Joerg