Based on David's suggestion for speeding up guest_memfd memory
population [1], made at the guest_memfd upstream call on 5 Dec 2024 [2],
this RFC adds `filemap_grab_folios`, which grabs multiple folios at a
time.
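For illustration, a possible shape of the interface is sketched below.
This is a sketch only: the exact prototype, locking/reference semantics
and return convention are assumptions made for this cover letter, not
necessarily the final form of the patch.

/*
 * Sketch of the proposed interface (details assumed): look up or
 * create up to @nr consecutive folios starting at @index, analogous
 * to calling filemap_grab_folio() @nr times.  The folios are returned
 * locked and with a reference held; the return value is the number of
 * folios actually obtained.
 */
unsigned int filemap_grab_folios(struct address_space *mapping,
				 pgoff_t index, struct folio **folios,
				 unsigned int nr);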
Motivation
When profiling guest_memfd population and comparing the results with
population of anonymous memory via UFFDIO_COPY, I observed that the
former was up to 20% slower, mainly due to adding newly allocated pages
to the pagecache. As far as I can see, the two main contributors to
this overhead are pagecache locking and the tree traversals needed for
every folio. The RFC attempts to partially mitigate both by adding
multiple folios to the pagecache at a time.
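To illustrate where the batching is expected to help, the sketch below
contrasts a per-folio population loop with a batched one. The helper
names (populate_one_by_one, populate_batched) and the assumed return
convention of filemap_grab_folios (number of folios obtained, each
locked and referenced, as sketched above) are illustrative assumptions,
not code from the series.

#include <linux/pagemap.h>

#define GRAB_BATCH	16

/* Assumed prototype, as sketched above. */
unsigned int filemap_grab_folios(struct address_space *mapping,
				 pgoff_t index, struct folio **folios,
				 unsigned int nr);

/*
 * Baseline: one pagecache lock acquisition and one xarray walk per
 * folio via the existing filemap_grab_folio() helper.
 */
static int populate_one_by_one(struct address_space *mapping,
			       pgoff_t start, pgoff_t end)
{
	pgoff_t index;

	for (index = start; index < end; index++) {
		struct folio *folio = filemap_grab_folio(mapping, index);

		if (IS_ERR(folio))
			return PTR_ERR(folio);
		/* ... copy the payload into the folio here ... */
		folio_mark_uptodate(folio);
		folio_unlock(folio);
		folio_put(folio);
	}
	return 0;
}

/*
 * Batched: the locking and tree traversal cost is amortised over up to
 * GRAB_BATCH folios per call.
 */
static int populate_batched(struct address_space *mapping,
			    pgoff_t start, pgoff_t end)
{
	struct folio *folios[GRAB_BATCH];
	pgoff_t index = start;

	while (index < end) {
		unsigned int want = min_t(pgoff_t, GRAB_BATCH, end - index);
		unsigned int got, i;

		got = filemap_grab_folios(mapping, index, folios, want);
		if (!got)
			return -ENOMEM;

		for (i = 0; i < got; i++) {
			/* ... copy the payload into folios[i] here ... */
			folio_mark_uptodate(folios[i]);
			folio_unlock(folios[i]);
			folio_put(folios[i]);
		}
		index += got;
	}
	return 0;
}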
Testing
With the change applied, I observed a 10.3% speedup (708 ms to 635 ms)
in a selftest that populated a 3 GiB guest_memfd, and a 9.5% speedup
(990 ms to 904 ms) when restoring a 3 GiB guest_memfd VM snapshot using
a custom Firecracker version, both on Intel Ice Lake.
Limitations
While a full `filemap_grab_folios` implementation would need to handle
THP/large folios internally and to deal with reclaim artifacts in the
pagecache (shadows), the RFC does not support those, for simplicity
reasons: it demonstrates the optimisation applied to guest_memfd, which
only uses small folios and does not support reclaim at the moment.