Re: [PATCH RESEND v2 7/7] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
From: Bjorn Helgaas
Date: Wed Feb 18 2015 - 09:30:54 EST
[+cc linux-mm, linux-kernel]
For context, the start of this discussion was here:
where Gavin is adding a new PCI hotplug driver for PowerNV. That new
driver calls vm_unmap_aliases() the same way we do in the existing RPA
hotplug driver here:
I'm trying to figure out whether it's correct to use
vm_unmap_aliases() here, but I'm not an mm person so all I have is my
gut feeling that something doesn't smell right.
On Tue, Feb 17, 2015 at 6:30 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2015-02-18 at 11:16 +1100, Gavin Shan wrote:
>> >What is vm_unmap_aliases() for? I see this is probably copied from
>> >rpaphp_core.c, where it was added by b4a26be9f6f8 ("powerpc/pseries:
>> >lazy kernel mappings after unplug operations").
>> >But I don't know whether:
>> > - this is something specific to powerpc,
>> > - the lack of vm_unmap_aliases() in other hotplug paths is a bug,
>> > - the fact that we only do this on powerpc is covering up a
>> > powerpc bug somewhere
>> Yes, I copied this piece of code from rpaphp_core.c. I think Ben might
>> help to answer the questions since he added that patch. I had a very
>> quick look at mm/vmalloc.c, and if I'm correct, it's reasonable to have
>> vm_unmap_aliases() here to flush TLB entries for previously iounmap'ed
>> ioremap() regions. I don't think it's powerpc specific.
> It's specific to running under the PowerVM hypervisor, and thus doesn't
> affect PowerNV; just don't copy it over.
> It comes from the fact that the generic ioremap code nowadays delays
> TLB flushing on unmap. The TLB flushing code is what, on powerpc,
> ensures that we remove the translations from the MMU hash table (the
> hash table is essentially treated as an extended in-memory TLB), which
> on pseries turns into hypervisor calls.
> When running under that hypervisor, the HV ensures that no translation
> still exists in the hash before allowing a device to be removed from
> a partition. If translations still exist, the removal fails.
> So we need to force the generic ioremap code to perform all the TLB
> flushes for iounmap'ed regions before we "complete" the unplug operation
> from a kernel perspective so that the device can be re-assigned to
> another partition.
> This is thus useless on platforms like powernv which do not run under
> such a hypervisor.
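To make the pattern under discussion concrete, here is a rough sketch of where vm_unmap_aliases() sits in the pseries hot-remove path, following the pattern added to rpaphp_core.c by b4a26be9f6f8. The function name and surrounding structure are abbreviated for illustration, not the exact driver code:

```c
/* Sketch of a pseries hot-remove path, modeled on rpaphp_core.c.
 * Names and structure are illustrative only.
 */
#include <linux/pci.h>
#include <linux/vmalloc.h>

static int rpaphp_unplug_device(struct pci_dev *dev)
{
	pci_stop_and_remove_bus_device(dev);

	/* The generic ioremap code defers TLB flushes for iounmap'ed
	 * regions.  Under PowerVM, the hypervisor refuses to move the
	 * device to another partition while hash-table translations
	 * still exist, so force the deferred flushes to happen now,
	 * before we report the unplug as complete.
	 */
	vm_unmap_aliases();

	return 0;
}
```

Per Ben's explanation above, the vm_unmap_aliases() call is only needed when a hypervisor enforces the no-stale-translations check, which is why it would be dead weight on PowerNV.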
So the hypervisor call that removes the device from the partition will
fail if there are any translations that reference the memory of the
device being removed.
Let me go through this in excruciating detail to see if I understand
what's going on:
- PCI core enumerates device D1
- PCI core sets device D1 BAR 0 = 0x1000
- driver claims D1
- driver ioremaps 0x1000 at virtual address V
- translation V -> 0x1000 is in TLB
- driver iounmaps V (but V -> 0x1000 translation may remain in TLB)
- driver releases D1
- hot-remove D1 (without vm_unmap_aliases(), hypervisor would fail this)
- it would be a bug to reference V here, but if we did, the
virt-to-phys translation would succeed and we'd have a Master Abort or
Unsupported Request on PCI/PCIe
- hot-add D2
- PCI core enumerates device D2
- PCI core sets device D2 BAR 0 = 0x1000
- it would be a bug to reference V here (before ioremapping), but if
we did, the reference would reach D2
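The driver-side portion of the timeline above can be sketched as follows; the "my_drv" names are hypothetical, but pci_ioremap_bar() and iounmap() are the standard kernel APIs involved:

```c
/* Illustrative PCI driver lifecycle showing where a stale TLB alias
 * can survive.  The "my_drv" identifiers are made up for this sketch.
 */
#include <linux/io.h>
#include <linux/pci.h>

static void __iomem *my_drv_mmio;

static int my_drv_probe(struct pci_dev *dev,
			const struct pci_device_id *id)
{
	/* Creates the V -> BAR 0 (e.g. 0x1000) translation; it may
	 * now be cached in the TLB (or, on powerpc, the hash table).
	 */
	my_drv_mmio = pci_ioremap_bar(dev, 0);
	return my_drv_mmio ? 0 : -ENOMEM;
}

static void my_drv_remove(struct pci_dev *dev)
{
	/* Unmaps V, but the lazy vunmap machinery may leave the
	 * V -> 0x1000 translation in place until a later flush,
	 * e.g. an explicit vm_unmap_aliases() call.
	 */
	iounmap(my_drv_mmio);
}
```

Between my_drv_remove() returning and the deferred flush, a (buggy) reference through V would still translate, which is exactly the window the timeline describes.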
I don't see anything hypervisor-specific here except for the fact that
the hypervisor checks for existing translations and most other
platforms don't. But it seems like the unexpected PCI aborts could
happen on any platform.
Are we saying that those PCI aborts are OK, since it's a bug to make
those references in the first place? Or would we rather take a TLB
miss fault instead so the references never make it to PCI?
I would think there would be similar issues when unmapping and
re-mapping plain old physical memory. But I don't see
vm_unmap_aliases() calls there, so those issues must be handled
differently. Should we handle this PCI hotplug issue the same way we
handle those?