On 29/07/15 06:28, Andy Lutomirski wrote:
> On Tue, Jul 28, 2015 at 8:01 PM, Boris Ostrovsky
> <boris.ostrovsky@xxxxxxxxxx> wrote:
>> On 07/28/2015 08:47 PM, Andrew Cooper wrote:
>>> On 29/07/2015 01:21, Andy Lutomirski wrote:
>>>> On Tue, Jul 28, 2015 at 10:10 AM, Boris Ostrovsky
>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>> On 07/28/2015 01:07 PM, Andy Lutomirski wrote:
>>>>>> On Tue, Jul 28, 2015 at 9:30 AM, Andrew Cooper
>>>>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>> I suspect that the set_ldt(NULL, 0) call hasn't reached Xen before
>>>>>>> xen_free_ldt() is attempting to nab back the pages which Xen still has
>>>>>>> mapped as an LDT.
>>>>>>
>>>>>> I just instrumented it with yet more LSL instructions. I'm pretty
>>>>>> sure that set_ldt really is clearing at least LDT entry zero.
>>>>>> Nonetheless the free_ldt call still oopses.
>>>>>
>>>>> Yes, I added some instrumentation to the hypervisor and we definitely
>>>>> set LDT to NULL before failing.
>>>>>
>>>>> -boris
>>>>
>>>> Looking at map_ldt_shadow_page: what keeps shadow_ldt_mapcnt from
>>>> getting incremented once on each CPU at the same time if both CPUs
>>>> fault in the same shadow LDT page at the same time?
>>>
>>> Nothing, but that is fine. If a page is in use in two vcpus LDTs, it is
>>> expected to have a type refcount of 2.
>>>
>>>> Similarly, what
>>>> keeps both CPUs from calling get_page_type at the same time and
>>>> therefore losing track of the page type reference count?
>>>
>>> a cmpxchg() loop in the depths of __get_page_type().
>>>
>>>> I don't see why vmalloc or vm_unmap_aliases would have anything to do
>>>> with this, though.
>>
>> So just for kicks I made lazy_max_pages() return 0 to free vmaps immediately
>> and the problem went away.
>
> As far as I can tell, this affects TLB flushes but not unmaps. That
> means that my patch is totally bogus -- vm_unmap_aliases() *flushed*
> aliases but isn't involved in removing them from the page tables.
> That must be why xen_alloc_ldt and xen_set_ldt work today.
>
> So what does flushing the TLB have to do with anything? The only
> thing I can think of is that it might force some deferred hypercalls
> out. I can reproduce this easily on UP, so IPIs aren't involved.
>
> The other odd thing is that it seems like this happens when clearing
> the LDT and freeing the old one but not when setting the LDT and
> freeing the old one. This is plausibly related to the lazy mode in
> effect at the time, but I have no evidence for that.
>
> Two more data points: Putting xen_flush_mc before and after the
> SET_LDT multicall has no effect. Putting flush_tlb_all() in
> xen_free_ldt doesn't help either, while vm_unmap_aliases() in the
> exact same place does help.

FYI, I have got a repro now and am investigating.
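
For readers following the refcount question in the thread: the point about "a cmpxchg() loop in the depths of __get_page_type()" is that a compare-and-exchange retry loop lets two CPUs take a reference concurrently without losing an increment. The following is only a user-space sketch of that general technique using C11 atomics. The names page_sketch, get_type_ref_sketch and put_type_ref_sketch are hypothetical, and this is not Xen's actual code, which tracks considerably more state than a bare counter.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a page's type refcount; not Xen's page_info. */
struct page_sketch {
    _Atomic uint32_t type_count;
};

/*
 * Take one type reference. If another CPU changes the count between our
 * load and our compare-exchange, the exchange fails, 'old' is refreshed
 * with the current value, and we retry, so no increment is lost.
 */
static bool get_type_ref_sketch(struct page_sketch *pg)
{
    uint32_t old = atomic_load(&pg->type_count);

    do {
        if (old == UINT32_MAX)
            return false;   /* refuse to overflow the counter */
    } while (!atomic_compare_exchange_weak(&pg->type_count, &old, old + 1));

    return true;
}

/* Drop one type reference; returns the count after the drop. */
static uint32_t put_type_ref_sketch(struct page_sketch *pg)
{
    return atomic_fetch_sub(&pg->type_count, 1) - 1;
}
```

With this pattern, two vCPUs referencing the same page simply leave the count at 2, which matches the expectation stated in the thread for a page used in two vcpus' LDTs.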