Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
From: David Hildenbrand
Date: Thu Jun 06 2024 - 04:30:38 EST
Additionally, we also should alter RO permission of those 7 tail pages
to RW to avoid panic().
We can use RCU, which IMO is a better choice, as the following:
get_page_unless_zero()
{
int rc = false;
rcu_read_lock();
if (page_is_fake_head(page) || !page_ref_count(page)) {
smp_mb(); // implied by atomic_add_unless()
goto unlock;
}
rc = page_ref_add_unless();
unlock:
rcu_read_unlock();
return rc;
}
And on the HVO/de-HOV sides:
folio_ref_unfreeze();
synchronize_rcu();
HVO/de-HVO;
I think this is a lot better than making tail page metadata RW because:
1. it helps debug, IMO, a lot;
2. I don't think HVO is the only one that needs this.
David (we missed you in today's THP meeting),
Sorry, I had a private meeting conflict :)
Please correct me if I'm wrong -- I think virtio-mem also suffers from
the same problem when freeing offlined struct page, since I wasn't
able to find anything that would prevent a **speculative** struct page
walker from trying to access struct pages belonging to pages being
concurrently offlined.
virtio-mem does currently not yet optimize fake-offlined memory like HVO
would. So the only way we really remove "struct page" metadata is by
actually offlining+removing a complete Linux memory block, like ordinary
memory hotunplug would.
It might be an interesting project to optimize "struct page" metadata
consumption for fake-offlined memory chunks within an online Linux
memory block.
The biggest challenge might be interaction with memory hotplug, which
requires all "struct page" metadata to be allocated. So that would make
cases where virtio-mem hot-plugs a Linux memory block but keeps parts of
it fake-offline a bit more problematic to handle .
In a world with memdesc this might all be nicer to handle I think :)
There is one possible interaction between virtio-mem and speculative
page references: all fake-offline chunks in a Linux memory block do have
on each page a refcount of 1 and PageOffline() set. When actually
offlining the Linux memory block to remove it, virtio-mem will drop that
reference during MEM_GOING_OFFLINE, such that memory offlining can
proceed (seeing refcount==0 and PageOffline()).
In virtio_mem_fake_offline_going_offline() we have:
if (WARN_ON(!page_ref_dec_and_test(page)))
dump_page(page, "fake-offline page referenced");
which would trigger on a speculative reference.
We never saw that trigger so far because quite a long time must have
passed ever since a page might have been part of the page cache / page
tables, before virtio-mem fake-offlined it (using alloc_contig_range())
and the Linux memory block actually gets offlined.
But yes, RCU (e.g., on the memory offlining path) would likely be the
right approach to make sure GUP-fast and the pagecache will no longer
grab this page by accident.
If this is true, we might want to map a "zero struct page" rather than
leave a hole in vmemmap when offlining pages. And the logic on the hot
removal side would be similar to that of HVO.
Once virtio-mem would do something like HVO, yes. Right now virtio-mem
only removes struct-page metadata by removing/unplugging its owned Linux
memory blocks once they are fully "logically offline".
--
Cheers,
David / dhildenb