On Fri, Jun 30, 2017 at 11:47:35AM +0200, Michal Hocko wrote:
> [...]
> As an aside, I remember that prior to MADV_FREE there was a long
> discussion about lazy freeing of memory from userspace. Some users
> wanted to be signalled when their memory was freed by the system so that
> they could rebuild the original content (e.g. uncompressed images in
> memory). It seems like MADV_FREE + this signalling could be used for
> that usecase. John would surely know more about those usecases.

That would provide an equivalent API to the one volatile pages
provided, agreed. So it would make it easier to adapt code (if any?)
and drop the duplicate feature in the volatile pages code. However, it
would be faster if the userland code using the volatile pages lazy
reclaim mode was converted to poll the uffd, so the kernel talks
directly to the monitor without involving a SIGBUS signal handler,
which causes spurious enter/exit compared to the signal-less uffd API.

The main benefit in my view is not volatile pages but that
UFFD_FEATURE_SIGBUS would work equally well to enforce robustness on
all kinds of memory, not only hugetlbfs (so one could run the database
with robustness on THP over tmpfs). The new cache can also be injected
in the filesystem using UFFDIO_COPY, which is likely faster than
fallocate, as UFFDIO_COPY was already demonstrated to be faster even
than a regular page fault.

It's also simpler to handle backwards compatibility with the
UFFDIO_API call, which allows probing whether UFFD_FEATURE_SIGBUS is
supported by the running kernel regardless of kernel version (so it
can be backported and enabled by the database, without the database
noticing it's on an older kernel version).

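To illustrate the probing, here is a minimal userland sketch. The helper
name uffd_probe_sigbus() is made up for this example, and the fallback
#define assumes the bit value used by mainline headers; treat both as
assumptions rather than as part of the proposal. Passing features = 0
to UFFDIO_API asks the kernel to report everything it supports.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Assumption: headers that predate the feature lack this define. The value
 * below is the one used by mainline headers; verify it against your tree.
 */
#ifndef UFFD_FEATURE_SIGBUS
#define UFFD_FEATURE_SIGBUS (1 << 7)
#endif

/*
 * Hypothetical helper (not an existing API).
 * Returns 1 if the running kernel reports UFFD_FEATURE_SIGBUS,
 * 0 if it does not, and -1 if userfaultfd cannot be used at all
 * (no syscall, no permission, or the UFFDIO_API handshake failed).
 */
int uffd_probe_sigbus(void)
{
	struct uffdio_api api;
	int fd, ret;

	fd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (fd < 0)
		return -1;

	memset(&api, 0, sizeof(api));
	api.api = UFFD_API;
	api.features = 0;	/* request nothing, only read back what is supported */

	if (ioctl(fd, UFFDIO_API, &api) < 0)
		ret = -1;
	else
		ret = (api.features & UFFD_FEATURE_SIGBUS) ? 1 : 0;

	/* This fd completed the handshake with no features; it is only a probe. */
	close(fd);
	return ret;
}
```

The probe fd is closed on purpose: to actually enable the feature, the
application would open a new uffd and pass UFFD_FEATURE_SIGBUS in
api.features during its own UFFDIO_API call.
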
So while this wasn't the intended way to use the userfault and I
already pointed out the possibility to use a single monitor to do all
this, I'm positive about UFFD_FEATURE_SIGBUS if the overhead of having
a monitor is so concerning.

Ultimately there are many pros and just a single con: the branch in
handle_userfault().

I wonder if it would be possible to use static_branch_enable() in
UFFDIO_API and static_branch_unlikely() in handle_userfault() to
eliminate that branch, but perhaps it's overkill: UFFDIO_API is
unprivileged and it would send an IPI to all CPUs. I don't think we
normally expose static_branch_enable() to unprivileged userland,
and making UFFD_FEATURE_SIGBUS a privileged op doesn't sound
attractive (although the alternative of altering a hugetlbfs mount
option would be a privileged op).

Thanks,
Andrea