Re: [kvm-devel] [PATCH] export notifier #1

From: Avi Kivity
Date: Wed Jan 23 2008 - 05:28:17 EST


Christoph Lameter wrote:
Ahhh. Good to hear. But we will still end in a situation where only
the remote ptes point to the page. Maybe the remote instance will dirty
the page at that point?


When the spte is dropped, its dirty bit is transferred to the page.
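A minimal sketch of that transfer, assuming an x86-style spte where bit 0 is the present bit and bit 6 is the hardware dirty bit. `struct sketch_page` and `drop_spte()` are hypothetical stand-ins, not kvm code. Real code would clear the spte with an atomic exchange so that a concurrent hardware update of the dirty bit cannot be lost; this single-threaded sketch skips that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPTE_PRESENT (UINT64_C(1) << 0)   /* x86 P bit */
#define SPTE_DIRTY   (UINT64_C(1) << 6)   /* x86 D bit, set by hardware on write */

/* Hypothetical stand-in for struct page: only the dirty flag matters here. */
struct sketch_page {
    bool dirty;
};

/* Drop one shadow PTE. Once the spte is cleared its dirty bit is gone,
 * so the bit is copied to the page first (the role of set_page_dirty()). */
static void drop_spte(uint64_t *spte, struct sketch_page *page)
{
    uint64_t old;

    assert(spte != NULL && page != NULL);
    old = *spte;
    *spte = 0;                                  /* tear down the mapping */
    if ((old & SPTE_PRESENT) && (old & SPTE_DIRTY))
        page->dirty = true;                     /* guest wrote via this spte */
}
```

A present spte with the dirty bit set leaves the page marked dirty after the drop; a clean spte leaves the page flag untouched.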
sharing code, and for you missing a single notifier means memory
corruption because you don't bump the page count to represent the
external reference).

The approach with the export notifier is page-based, not mm_struct-based. We only need a single page count for a page that is exported to a number of remote instances of Linux. The page count is dropped when all the remote instances have unmapped the page.
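The page-based scheme described above can be sketched as a per-page count of remote mappers that owns exactly one page reference. Everything here (`struct xpage`, `export_map()`, `export_unmap()`) is hypothetical and single-threaded; it only illustrates the counting, not the code in the proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

struct xpage {
    int  refcount;        /* stand-in for page_count() */
    bool exported;        /* stand-in for the proposed PageExported() bit */
    int  remote_mappers;  /* remote instances currently mapping the page */
};

/* A remote instance maps the page: only the first one takes a page ref. */
static void export_map(struct xpage *pg)
{
    if (pg->remote_mappers++ == 0) {
        pg->refcount++;
        pg->exported = true;
    }
}

/* A remote instance unmaps it: the last one drops the single page ref. */
static void export_unmap(struct xpage *pg)
{
    assert(pg->remote_mappers > 0);
    if (--pg->remote_mappers == 0) {
        pg->exported = false;
        pg->refcount--;
    }
}
```

Starting from a page count of 1, three remote mappings raise it to 2, not 4, and it returns to 1 only after the third unmap.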

That won't work for kvm. If we have a hundred virtual machines and only one of them maps the page, every invalidation means 99 no-op notifications.
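To make the arithmetic concrete, a hypothetical sketch contrasting two dispatch models: with a single global page-based notifier list, every registered VM's callback runs on each invalidation, while a walk driven by an (mm, va) reverse map only reaches the VMs that actually map the page. `struct vm` and both functions are illustrative assumptions, not kvm or patch code.

```c
#include <assert.h>

/* Hypothetical per-VM state: does this guest's shadow MMU map the page? */
struct vm {
    int maps_page;
    int invalidations;   /* callbacks that had real work to do */
};

/* Page-based notifier: one global list, so every VM is called and must
 * find out on its own whether the page concerns it. Returns no-op calls. */
static int notify_page_based(struct vm *vms, int nr_vms)
{
    int noops = 0;

    assert(nr_vms >= 0);
    for (int i = 0; i < nr_vms; i++) {
        if (vms[i].maps_page)
            vms[i].invalidations++;
        else
            noops++;
    }
    return noops;
}

/* mm-based notifier: the rmap walk has already found the VMs that map
 * the page (indices in mappers[]), so only those are called. */
static int notify_mm_based(struct vm *vms, const int *mappers, int nr_mappers)
{
    for (int i = 0; i < nr_mappers; i++)
        vms[mappers[i]].invalidations++;
    return 0;   /* no VM is called without work to do */
}
```

With 100 guests of which one maps the page, `notify_page_based()` reports 99 no-op calls per invalidation, while `notify_mm_based()` reports none.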

Also, our rmap key for finding the spte is keyed on (mm, va). I imagine most RDMA cards are similar.
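For illustration, a reverse map keyed on (mm, va) can be as simple as a hash table from that pair to the spte pointer; nothing needs to be stored in struct page. This is a hypothetical fixed-size, insert-only table with linear probing (no deletion or resizing), and `struct mm` is a stand-in for mm_struct. It does not reproduce kvm's actual rmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mm { int id; };         /* stand-in for mm_struct; only its address is used */

struct rmap_entry {
    const struct mm *mm;       /* NULL marks an empty slot */
    uintptr_t va;              /* host virtual address of the page */
    uint64_t *spte;            /* shadow PTE mapping (mm, va) */
};

#define RMAP_SLOTS 64

struct rmap {
    struct rmap_entry slot[RMAP_SLOTS];
};

static size_t rmap_hash(const struct mm *mm, uintptr_t va)
{
    /* Mix the mm address with the page frame part of va. */
    uintptr_t h = ((uintptr_t)mm >> 4) ^ (va >> 12);
    return (size_t)(h % RMAP_SLOTS);
}

/* Insert or update an entry; returns -1 if the table is full. */
static int rmap_insert(struct rmap *r, const struct mm *mm, uintptr_t va,
                       uint64_t *spte)
{
    size_t i = rmap_hash(mm, va);

    assert(mm != NULL);
    for (size_t n = 0; n < RMAP_SLOTS; n++, i = (i + 1) % RMAP_SLOTS) {
        struct rmap_entry *e = &r->slot[i];
        if (e->mm == NULL || (e->mm == mm && e->va == va)) {
            e->mm = mm;
            e->va = va;
            e->spte = spte;
            return 0;
        }
    }
    return -1;
}

/* Find the spte for (mm, va), or NULL if none is recorded. */
static uint64_t *rmap_lookup(const struct rmap *r, const struct mm *mm,
                             uintptr_t va)
{
    size_t i = rmap_hash(mm, va);

    for (size_t n = 0; n < RMAP_SLOTS; n++, i = (i + 1) % RMAP_SLOTS) {
        const struct rmap_entry *e = &r->slot[i];
        if (e->mm == NULL)
            return NULL;                /* reached an empty slot: not present */
        if (e->mm == mm && e->va == va)
            return e->spte;
    }
    return NULL;
}
```

The same va in two different mms resolves to two different sptes, which is why the key has to include the mm.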
@@ -966,6 +973,9 @@ int try_to_unmap(struct page *page, int
 	BUG_ON(!PageLocked(page));
+	if (unlikely(PageExported(page)))
+		export_notifier(invalidate_page, page);
+
Passing the page here will complicate things, especially for shared
pages across different VMs that are already working in KVM. For non

How?

shared pages we could cache the userland mapping address in
page->private, but it's a kludge that only works for non-shared
pages. Walking the anon_vma lists twice when only a single walk is

There is only a need to walk twice for pages that are marked Exported, and the double walk is only necessary if the exporter does not have its own rmap. The cross-partition code we are doing has such an rmap, so it's a matter of walking the exporter's rmap to clear out the external references and then walking the local rmaps. Each is walked only once.
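The ordering described above can be sketched as follows, with each rmap reduced to a reference counter. `struct upage` and the walk functions are hypothetical; the point is only that the exporter's rmap is visited once, and only for exported pages, before the ordinary local walk. The sketch assumes a page that is not exported has no remote references.

```c
#include <assert.h>
#include <stdbool.h>

struct upage {
    bool exported;   /* stand-in for the proposed PageExported() bit */
    int  nr_remote;  /* references recorded in the exporter's own rmap */
    int  nr_local;   /* local ptes reachable via anon_vma or i_mmap */
};

/* Exporter's rmap: stop new remote references, then drop existing ones. */
static int walk_exporter_rmap(struct upage *pg)
{
    int dropped = pg->nr_remote;

    pg->exported = false;
    pg->nr_remote = 0;
    return dropped;
}

/* Ordinary local rmap walk, modelled as a counter. */
static int walk_local_rmap(struct upage *pg)
{
    int dropped = pg->nr_local;

    pg->nr_local = 0;
    return dropped;
}

/* Returns the total number of mappings torn down. */
static int unmap_page(struct upage *pg)
{
    int n = 0;

    if (pg->exported)                /* non-exported pages skip this walk */
        n += walk_exporter_rmap(pg);
    n += walk_local_rmap(pg);
    assert(pg->nr_remote == 0 && pg->nr_local == 0);
    return n;
}
```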

The problem is that external MMUs need a reverse mapping structure to locate their ptes. We can't expand struct page, so we need to key it on mm + va.

Besides the RAM leak from pinned pages caused by the zero locking window
above, I'm curious how you are going to take care of the fine-grained
aging that I'm doing with the accessed bit set by hardware in the spte
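For context, accessed-bit aging on sptes works roughly like `ptep_test_and_clear_young()` on host ptes: each scan reports whether hardware set the accessed bit since the previous scan and clears it so the next scan measures fresh activity. A minimal sketch, assuming the x86 accessed bit (bit 5); `sptes_test_and_clear_young()` is a hypothetical helper over a plain array, whereas real code would use atomic bit operations on live sptes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPTE_ACCESSED (UINT64_C(1) << 5)   /* x86 A bit, set by hardware on access */

/* Returns true if any spte mapping the page was accessed since the last
 * scan, clearing the accessed bits so the next scan starts fresh. */
static bool sptes_test_and_clear_young(uint64_t *sptes, size_t n)
{
    bool young = false;

    assert(sptes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sptes[i] & SPTE_ACCESSED) {
            sptes[i] &= ~SPTE_ACCESSED;
            young = true;
        }
    }
    return young;
}
```

A page is reported young once after a guest access; a second scan with no intervening access reports it old.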

I think I explained that above. Clearing the exported bit effectively forbids remote users from establishing new references to the page.
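One way to make "clearing the exported bit forbids new references" race-free is to pack the flag and the remote reference count into a single atomic word, so a remote mapper can only take a reference while the flag is still set. This packing, `EXPORTED_BIT`, `struct epage` and both functions are hypothetical choices for the sketch (the patch tests a page flag via PageExported()). Here a refused mapper simply gets -EAGAIN rather than waiting.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define EXPORTED_BIT 0x40000000u   /* hypothetical flag above a 30-bit count */

struct epage {
    atomic_uint state;   /* EXPORTED_BIT | number of remote references */
};

/* Remote side: take a reference only while the page is still exported. */
static int remote_try_map(struct epage *pg)
{
    unsigned int v = atomic_load(&pg->state);

    do {
        if (!(v & EXPORTED_BIT))
            return -EAGAIN;             /* exporter has revoked the page */
        assert((v & ~EXPORTED_BIT) < EXPORTED_BIT - 1);  /* count overflow */
    } while (!atomic_compare_exchange_weak(&pg->state, &v, v + 1));
    return 0;
}

/* Exporter side: clear the flag atomically. No new reference can be taken
 * afterwards; the returned count is what still has to be torn down. */
static unsigned int unexport_page(struct epage *pg)
{
    unsigned int old = atomic_fetch_and(&pg->state, ~EXPORTED_BIT);

    return old & ~EXPORTED_BIT;
}
```

Because the check and the increment happen in one compare-and-swap, a mapper that races with `unexport_page()` either gets counted in the returned total or is refused; it cannot slip in unseen.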


Can they wait on that bit?


--
error compiling committee.c: too many arguments to function
