Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

From: David Hildenbrand
Date: Thu Jun 06 2024 - 04:30:38 EST


Additionally, we also should alter RO permission of those 7 tail pages
to RW to avoid panic().

We can use RCU, which IMO is a better choice, as the following:

get_page_unless_zero()
{
int rc = false;

rcu_read_lock();

if (page_is_fake_head(page) || !page_ref_count(page)) {
smp_mb(); // implied by atomic_add_unless()
goto unlock;
}

rc = page_ref_add_unless();

unlock:
rcu_read_unlock();

return rc;
}

And on the HVO/de-HOV sides:

folio_ref_unfreeze();
synchronize_rcu();
HVO/de-HVO;

I think this is a lot better than making tail page metadata RW because:
1. it helps debug, IMO, a lot;
2. I don't think HVO is the only one that needs this.

David (we missed you in today's THP meeting),

Sorry, I had a private meeting conflict :)


Please correct me if I'm wrong -- I think virtio-mem also suffers from
the same problem when freeing offlined struct page, since I wasn't
able to find anything that would prevent a **speculative** struct page
walker from trying to access struct pages belonging to pages being
concurrently offlined.

virtio-mem does currently not yet optimize fake-offlined memory like HVO would. So the only way we really remove "struct page" metadata is by actually offlining+removing a complete Linux memory block, like ordinary memory hotunplug would.

It might be an interesting project to optimize "struct page" metadata consumption for fake-offlined memory chunks within an online Linux memory block.

The biggest challenge might be interaction with memory hotplug, which requires all "struct page" metadata to be allocated. So that would make cases where virtio-mem hot-plugs a Linux memory block but keeps parts of it fake-offline a bit more problematic to handle .

In a world with memdesc this might all be nicer to handle I think :)


There is one possible interaction between virtio-mem and speculative page references: all fake-offline chunks in a Linux memory block do have on each page a refcount of 1 and PageOffline() set. When actually offlining the Linux memory block to remove it, virtio-mem will drop that reference during MEM_GOING_OFFLINE, such that memory offlining can proceed (seeing refcount==0 and PageOffline()).

In virtio_mem_fake_offline_going_offline() we have:

if (WARN_ON(!page_ref_dec_and_test(page)))
dump_page(page, "fake-offline page referenced");

which would trigger on a speculative reference.

We never saw that trigger so far because quite a long time must have passed ever since a page might have been part of the page cache / page tables, before virtio-mem fake-offlined it (using alloc_contig_range()) and the Linux memory block actually gets offlined.

But yes, RCU (e.g., on the memory offlining path) would likely be the right approach to make sure GUP-fast and the pagecache will no longer grab this page by accident.


If this is true, we might want to map a "zero struct page" rather than
leave a hole in vmemmap when offlining pages. And the logic on the hot
removal side would be similar to that of HVO.

Once virtio-mem would do something like HVO, yes. Right now virtio-mem only removes struct-page metadata by removing/unplugging its owned Linux memory blocks once they are fully "logically offline".

--
Cheers,

David / dhildenb