Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled

From: Ming Lei
Date: Tue Mar 18 2014 - 23:32:56 EST


On Tue, Mar 18, 2014 at 8:27 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> On Fri, Mar 14, 2014 at 09:48:35AM +0800, Ming Lei wrote:
>> On Fri, Mar 14, 2014 at 12:08 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> > On Thu, Mar 13, 2014 at 2:51 AM, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
>> >> Hi Bjorn,
>> >>
>> >> I found this patch broke virtio-pci devices.
>> >
>> > Thanks a lot for testing this.
>> >
>> >> On Thu, Feb 27, 2014 at 3:37 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> >>> Don't rely on BAR contents when the command register says the BAR is
>> >>> disabled.
>> >>>
>> >>> If we receive a PCI device from firmware (or a hot-added device that was
>> >>> just powered up) with the MEMORY or IO enable bits in the PCI command
>> >>> register cleared, there's no reason to believe the BARs contain valid
>> >>> addresses.
>> >>
>> >> From PCI LOCAL BUS SPECIFICATION, REV. 3.0, both
>> >> PCI_COMMAND_IO and PCI_COMMAND_MEM should be
>> >> cleared after reset, so looks the patch sets IORESOURCE_UNSET
>> >> too early because PCI drivers may call pci_enable_device()
>> >> (->pci_enable_resources()) to enable the two bits of
>> >> PCI_COMMAND explicitly.
>> >
>> > The point is that it's not safe to enable those two bits unless we're
>> > certain that the BARs they control contain valid values that don't
>> > conflict with anything else in the system.
>> >
>> > Maybe we should only set IORESOURCE_UNSET when we find a conflict or a
>> > BAR that's not contained by an upstream bridge window, and we should
>> > try to reallocate then. I'm pretty sure we do that at least in some
>> > cases, but it would probably simplify things if we did it more
>> > consistently, and maybe we shouldn't set it at all here in
>> > __pci_read_base().
>>
>> I think so because __pci_read_base() is called in device emulation
>> path.
>
> Which path is this? I don't know anything about virtio-pci, and I only see
> calls to __pci_read_base() from:
>
> sriov_init()
> pci_sriov_resource_alignment()
> pci_read_bases()
>
>> > But I'd like to understand your situation better, so can you provide
>> > more details, please? Complete before/after dmesg logs would go a
>> > long way toward illustrating the problem you're seeing.
>>
>> Please see the two attachment log. The memory allocation failure
>> is caused by mistaken value read from pci address after the device
>> is failed to enable.
>
> Your logs are harder than necessary to compare because one has a lot more
> debug turned on than the other.
>
> In the failing case, we ignore all the initial BAR values, but we do assign
> values to all of them later:
>
> pci 0000:00:00.0: can't claim BAR 0 [mem size 0x00000400]: no address assigned
> pci 0000:00:00.0: can't claim BAR 1 [io size 0x0400]: no address assigned
> ...
> pci 0000:00:00.0: BAR 0: assigned [mem 0x40000000-0x400003ff]
> pci 0000:00:00.0: BAR 1: assigned [io 0x1000-0x13ff]
> ...
>
> The newly-assigned values look valid, and as far as I can tell, they should
> work. Do you know why they don't? Is there an assumption somewhere that
> we never change BAR values?

I don't know the cause, maybe it is related with the hypervisor
implementation.

>
> Even if we don't need to ignore BAR values in as many cases as we do, it
> should be legal to ignore them and reassign them, so I want to understand
> what's going on here before reverting this.
>
> Is there an easy way I can reproduce the problem on my own box?

It is not quite difficult, you can build a lkvm following the README in
below link and test -next tree on the small kvm hypervisor:

https://github.com/penberg/linux-kvm/blob/master/tools/kvm/README

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/