Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect

From: Peter Xu
Date: Tue Jan 05 2021 - 16:44:52 EST


On Tue, Jan 05, 2021 at 04:06:27PM -0500, Andrea Arcangeli wrote:
> The postcopy live snapshoitting was the #1 use case so it's hard not
> to mention it, but there's still other interesting userland use cases
> of uffd-wp with various users already testing it in their apps, that
> may ultimately become more prevalent, who knows.

That's true. AFAIU umap [1] uses uffd-wp for their computings already. I
didn't really measure how far it can go, but currently the library is highly
concurrent, for example, there're quite a few macros that can tune the
parallelism of the library [2]:

UMAP_PAGE_FILLERS This is the number of worker threads that will perform read
operations from the backing store (including read-ahead) for a specific umap
region.

UMAP_PAGE_EVICTORS This is the number of worker threads that will perform
evictions of pages. Eviction includes writing to the backing store if the
page is dirty and telling the operating system that the page is no longer
needed.

The write lock means at least all the evictor threads will be serialized,
immediately makes UMAP_PAGE_EVICTORS meaningless... not to mention all the rest
of read lock takers (filler threads, worker threads, etc.). So if it happens,
I bet LLNL will suddenly observe a drastic drop after upgrading the kernel..

I don't know why umap didn't hit the tlb issue already. It seems to me that
issues may only trigger with COW right after a stalled tlb so COW is the only
one affected (or, is it?) while umap may not use cow that lot by accident. But
I could be completely wrong on that.

[1] https://github.com/LLNL/umap
[2] https://llnl-umap.readthedocs.io/en/develop/environment_variables.html

--
Peter Xu