On Fri, Jun 30, 2017 at 05:55:08PM -0700, prakash sangappa wrote:
> Interesting that UFFDIO_COPY is faster than fallocate(). In the DB use case
> the page does not need to be allocated at the time a process trips on
> the hugetlbfs file hole and receives SIGBUS. fallocate() is called on the
> hugetlbfs file, when more memory needs to be allocated by a separate process.

The major difference is that with UFFDIO_COPY the hugepage will be
immediately mapped into the virtual address space without requiring any
further minor fault. So it's ideal if you could arrange to call
UFFDIO_COPY from the same process that is going to touch and use the
hugetlbfs data immediately after. You would eliminate a minor fault
that way.

UFFDIO_COPY at least for anon was measured to perform better than a
regular page fault too.
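
To make the "no further minor fault" point concrete, here is a rough sketch (not taken from any patch in this thread) of a process that fills a page with UFFDIO_COPY and then touches it itself. It uses an anonymous mapping of one base page instead of hugetlbfs to keep it short, and uffd_copy_demo() is a made-up name. It assumes the caller is allowed to use userfaultfd at all, which depends on kernel version and sysctl settings.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 on success, 1 if userfaultfd is not
 * usable in this environment, -1 on any other error.
 */
static int uffd_copy_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *dst = MAP_FAILED, *src = MAP_FAILED;
	int ret = -1;

	int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
	if (uffd < 0)
		return 1;	/* e.g. ENOSYS or EPERM */

	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	if (ioctl(uffd, UFFDIO_API, &api) < 0)
		goto out;

	dst = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	src = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED || src == MAP_FAILED)
		goto out;
	memset(src, 0x5a, page);	/* data to be installed at dst */

	/* Track missing pages in dst; dst itself is never touched yet. */
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)dst, .len = page },
		.mode = UFFDIO_REGISTER_MODE_MISSING,
	};
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		goto out;

	/* Allocate, fill and map the page in a single step. */
	struct uffdio_copy copy = {
		.dst = (unsigned long)dst,
		.src = (unsigned long)src,
		.len = page,
		.mode = 0,
	};
	if (ioctl(uffd, UFFDIO_COPY, &copy) < 0 || copy.copy != page)
		goto out;	/* do not touch dst: nobody would resolve the fault */

	/* The page is already mapped here, so this read takes no fault. */
	ret = (dst[0] == 0x5a && dst[page - 1] == 0x5a) ? 0 : -1;
out:
	if (dst != MAP_FAILED)
		munmap(dst, page);
	if (src != MAP_FAILED)
		munmap(src, page);
	close(uffd);
	return ret;
}
```

With fallocate() done by a separate process the page would be allocated in the file, but the process using it would still take a minor fault on first touch to get it mapped.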

> Regarding hugetlbfs mount option, one consideration is to allow mounts of
> hugetlbfs inside a user namespace's mount namespace, which would allow
> non privileged processes to mount hugetlbfs for use inside a user
> namespace.
> This may be needed even for the 'min_size' mount option using which an
> application could reserve huge pages and mount a filesystem for its use,
> without the need to have privileges given the system has enough hugepages
> configured. It seems if non privileged processes are allowed to mount
> hugetlbfs filesystem, then min_size should be subject to some resource
> limits.
>
> Mounting inside user namespace will be a different patch proposal later.

There's no particular reason to make UFFDIO_FEATURE_SIGBUS a
privileged op unless we want to eliminate the branch with the static
key, so it's certainly simpler than dealing with hugetlbfs min_size
reserves.

I'm positive about the UFFDIO_FEATURE_SIGBUS tradeoffs, but others
feel free to comment.
If you could make a second patch to extend the selftest to exercise and
validate UFFDIO_FEATURE_SIGBUS in anon/shmem/hugetlbfs it'd be great.
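
As a rough idea of what such a check could look like for the anon case, here is a sketch only, not the actual selftest code. It assumes the feature bit is visible in <linux/userfaultfd.h> as UFFD_FEATURE_SIGBUS (the name used in current mainline headers, while this thread calls it UFFDIO_FEATURE_SIGBUS), and uffd_sigbus_demo() and on_sigbus() are made-up names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_jmp, 1);	/* unwind out of the faulting access */
}

/*
 * Hypothetical check: returns 0 if touching a missing page raised
 * SIGBUS, 1 if userfaultfd or the feature is unavailable, -1 on error.
 */
static int uffd_sigbus_demo(void)
{
#ifndef UFFD_FEATURE_SIGBUS
	return 1;	/* headers predate the feature */
#else
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	char *area;
	int ret = -1;

	int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
	if (uffd < 0)
		return 1;

	/* Ask for SIGBUS delivery instead of a queued userfault message. */
	struct uffdio_api api = { .api = UFFD_API,
				  .features = UFFD_FEATURE_SIGBUS };
	if (ioctl(uffd, UFFDIO_API, &api) < 0) {
		close(uffd);
		return 1;	/* kernel does not know the feature */
	}

	area = mmap(NULL, page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		goto out_fd;

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)area, .len = page },
		.mode = UFFDIO_REGISTER_MODE_MISSING,
	};
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		goto out_map;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) < 0)
		goto out_map;

	if (sigsetjmp(sigbus_jmp, 1) == 0) {
		/* Missing page: expected to raise SIGBUS, not to block. */
		volatile char c = *(volatile char *)area;
		(void)c;
		ret = -1;	/* access completed without SIGBUS */
	} else {
		ret = 0;	/* handler jumped back here */
	}
	sigaction(SIGBUS, &old, NULL);
out_map:
	munmap(area, page);
out_fd:
	close(uffd);
	return ret;
#endif
}
```

The shmem and hugetlbfs variants would differ mainly in how the area is mapped; a real test would presumably also resolve the fault with UFFDIO_COPY after catching the signal and then retry the access.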
Thanks,
Andrea