Add DavidH and OscarS for memory hot-remove questions.
IIUC, struct page could be freed if a chunk of memory is hot-removed.
Right, but only after there are no users anymore (IOW, memory was freed
back to the buddy). PFN walkers might still stumble over them, but I
would not expect (or recommend) Rust to do that.
The physaddr to page function does look up pages by pfn, but it's
intended to be used by drivers that know what they're doing. There are
two variants of the API, one that is completely unchecked (a fast path
for cases where the driver definitely allocated these pages itself, for
example just grabbing the `struct page` back from a decoded PTE it
wrote), and one that has this check:

    pfn_valid(pfn) && page_is_ram(pfn)

This check is intended as a safety net to allow drivers to look up
firmware-reserved pages too, and fail gracefully if the kernel doesn't
know about them (because they weren't declared in the
bootloader/firmware memory map correctly) or doesn't have them mapped in
the direct map (because they were declared no-map).
Is there anything else that can reasonably be done here to make the API
safer to call on an arbitrary pfn?
If the answer is "no" then that's fine. It's still an unsafe function
and we need to document in the safety section that it should only be
used for memory that is either known to be allocated and pinned and will
not be freed while the `struct page` is borrowed, or memory that is
reserved and not owned by the buddy allocator, so in practice correct
use would not be racy with memory hot-remove anyway.
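
To make the two variants and the proposed safety wording concrete, here is
a rough userspace sketch. It is only a model: `MemMap`, `Page`,
`LookupError`, `example_map`, `page_from_pfn_checked` and
`page_from_pfn_unchecked` are all hypothetical names invented for
illustration, not the actual kernel or Rust-for-Linux API, and a
`HashMap` stands in for the kernel's memory map.

```rust
use std::collections::HashMap;

/// Why a checked lookup failed (hypothetical error type).
#[derive(Debug, PartialEq)]
enum LookupError {
    /// No `struct page` exists for this pfn (models `!pfn_valid(pfn)`).
    NoPage,
    /// A page exists but is not System RAM (models `!page_is_ram(pfn)`).
    NotRam,
}

/// Toy stand-in for `struct page`.
#[derive(Debug, PartialEq)]
struct Page {
    pfn: u64,
}

/// Toy memory map: pfn -> (page, is this pfn System RAM?).
struct MemMap {
    pages: HashMap<u64, (Page, bool)>,
}

impl MemMap {
    /// Checked variant: models `pfn_valid(pfn) && page_is_ram(pfn)`.
    ///
    /// # Safety
    ///
    /// The pfn must refer to memory that is either allocated and pinned
    /// for as long as the returned reference is borrowed, or reserved and
    /// not owned by the buddy allocator.
    unsafe fn page_from_pfn_checked(&self, pfn: u64) -> Result<&Page, LookupError> {
        match self.pages.get(&pfn) {
            None => Err(LookupError::NoPage),
            Some((_, false)) => Err(LookupError::NotRam),
            Some((page, true)) => Ok(page),
        }
    }

    /// Unchecked fast path, for pages the caller allocated itself.
    ///
    /// # Safety
    ///
    /// Same requirements as the checked variant, and the pfn must be
    /// known to have a page. This toy model panics on a bad pfn; a real
    /// unchecked lookup would have no such protection.
    unsafe fn page_from_pfn_unchecked(&self, pfn: u64) -> &Page {
        &self.pages[&pfn].0
    }
}

/// Builds a small map: pfn 0x100 is RAM, pfn 0x200 has a page but is not RAM.
fn example_map() -> MemMap {
    let mut pages = HashMap::new();
    pages.insert(0x100, (Page { pfn: 0x100 }, true));
    pages.insert(0x200, (Page { pfn: 0x200 }, false));
    MemMap { pages }
}
```

In this model the checked variant fails gracefully for a pfn the map does
not know about or that is not RAM, which is the safety-net behaviour
described above. The lifetime requirement (pinned or reserved memory)
still cannot be checked at runtime, so it stays in the safety section.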
This is already the case for the drm/asahi use case, where the pfns
looked up will only ever be one of:
- GEM objects that are mapped to the GPU and whose physical pages are
therefore pinned (and the VM is locked while this happens so the objects
cannot become unpinned out from under the running code),
- Raw pages allocated from the page allocator for use as GPU page tables,
- System memory that is marked reserved by the firmware/bootloader,
- (Potentially) invalid PFNs that aren't part of the System RAM region
at all and don't have a struct page to begin with, which we check for,
so the API returns an error. This would only happen if the bootloader
didn't declare some used firmware ranges at all, so Linux doesn't know
about them.
Another case where struct page can be freed is when hugetlb vmemmap
optimization is used. Muchun (cc'd) is the maintainer of hugetlbfs.
Here, the "struct page" remains valid though; it can still be accessed,
although we disallow writes (which would be wrong).
If you only allocate a page and free it later, there is no need to worry
about either on the Rust side.
This is what the safe API does. (Also the unsafe physaddr APIs if all
you ever do is convert an allocated page to a physaddr and back, which
is the only thing the GPU page table code does during normal use.
Walking leaf PFNs only happens for GPU device coredumps when the
firmware crashes.)