On Wed 16-10-19 15:45:06, David Hildenbrand wrote:
On 16.10.19 13:43, Michal Hocko wrote:
On Thu 19-09-19 16:22:25, David Hildenbrand wrote:
virtio-mem wants to allow to offline memory blocks of which some parts
were unplugged, especially, to later offline and remove completely
unplugged memory blocks. The important part is that PageOffline() has
to remain set until the section is offline, so these pages will never
get accessed (e.g., when dumping). The pages should not be handed
back to the buddy (which would require clearing PageOffline() and
result in issues if offlining fails and the pages are suddenly in the
buddy).
Let's use "PageOffline() + reference count = 0" as a sign to
memory offlining code that these pages can simply be skipped when
offlining, similar to free or HWPoison pages.
Pass flags to test_pages_isolated(), similar as already done for
has_unmovable_pages(). Use a new flag to indicate the
requirement of memory offlining to skip over these special pages.
In has_unmovable_pages(), make sure the pages won't be detected as
movable. This is not strictly necessary, however makes e.g.,
alloc_contig_range() stop early, trying to isolate such page blocks -
compared to failing later when testing if all pages were isolated.
Also, make sure that when a reference to a PageOffline() page is
dropped, that the page will not be returned to the buddy.
memory devices (like virtio-mem) that want to make use of this
functionality have to make sure to synchronize against memory offlining,
using the memory hotplug notifier.
Alternative: Allow to offline with a reference count of 1
and use some other sign in the struct page that offlining is permitted.
Few questions. I do not see onlining code to take care of this special
case. What should happen when offline && online?
Should we allow to try_remove_memory to succeed with these pages?
Do we really have hook into __put_page? Why do we even care about the
reference count of those pages?
Oh, I forgot to answer this questions. The __put_page() change is necessary
for the following race I identified:
Page has a refcount of 1 (e.g., allocated by virtio-mem using
alloc_contig_range()).
a) kernel: get_page_unless_zero(page): refcount = 2
b) virtio-mem: set page PG_offline, reduce refcount): refocunt = 1
c) kernel: put_page(page): refcount = 0
The page would suddenly be given to the buddy. which is bad.
But why cannot you keep the reference count at 1 (do get_page when
offlining the page)? In other words as long as the driver knows the page
has been returned to the host then it has ref count at 1. Once the page
is returned to the guest for whatever reason it can free it to the
system by clearing the offline state and put_page.
Again, when onlining memory you have to assume the memmap is complete garbage. No trusting *at all* on that as of now. This represents a BIG change to what we have right now. Not totally against that, but it sounds like a bigger rework that mainly helps to fix HWPoison.
An elevated ref count could help to detect that the memory hotremove is
not safe until the driver removes all potential metadata it might still
hold. You also know that memory online should skip such a page.
All in all your page is still in use by the driver and the life cycle is
controlled by that driver.
It's just complex stuff :) I guess the part you are missing is that the driver officially signals "I have no direct reference, you can offline this memory, I know how to deal with that". It's not like "this is a balloon inflated page I keep in a list".
Or am I am missing something?