Re: [PATCH] pci: Enable overrides for missing ACS capabilities

From: Bjorn Helgaas
Date: Mon Jun 24 2013 - 13:44:27 EST


On Wed, Jun 19, 2013 at 6:43 AM, Don Dutile <ddutile@xxxxxxxxxx> wrote:
> On 06/18/2013 10:52 PM, Bjorn Helgaas wrote:
>>
>> On Tue, Jun 18, 2013 at 5:03 PM, Don Dutile<ddutile@xxxxxxxxxx> wrote:
>>>
>>> On 06/18/2013 06:22 PM, Alex Williamson wrote:
>>>>
>>>>
>>>> On Tue, 2013-06-18 at 15:31 -0600, Bjorn Helgaas wrote:
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 12:20 PM, Alex Williamson
>>>>> <alex.williamson@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, 2013-06-18 at 11:28 -0600, Bjorn Helgaas wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 30, 2013 at 12:40:19PM -0600, Alex Williamson wrote:
>>>
>>> ...
>>>>>>>
>>>>>>> Who do you expect to decide whether to use this option? I think it
>>>>>>> requires intimate knowledge of how the device works.
>>>>>>>
>>>>>>> I think the benefit of using the option is that it makes assignment
>>>>>>> of
>>>>>>> devices to guests more flexible, which will make it attractive to
>>>>>>> users.
>>>>>>> But most users have no way of knowing whether it's actually *safe* to
>>>>>>> use this. So I worry that you're adding an easy way to pretend
>>>>>>> isolation
>>>>>>> exists when there's no good way of being confident that it actually
>>>>>>> does.
>>
>>
>>> ...
>>
>>
>>>>> I wonder if we should taint the kernel if this option is used (but not
>>>>> for specific devices added to pci_dev_acs_enabled[]). It would also
>>>>> be nice if pci_dev_specific_acs_enabled() gave some indication in
>>>>> dmesg for the specific devices you're hoping to add to
>>>>> pci_dev_acs_enabled[]. It's not an enumeration-time quirk right now,
>>>>> so I'm not sure how we'd limit it to one message per device.
>>>>
>>>>
>>>> Right, setup vs use and getting single prints is a lot of extra code.
>>>> Tainting is troublesome for support, Don had some objections when I
>>>> suggested the same to him.
>>>>
>>> For RH GSS (Global Support Services), a 'taint' in the kernel printk
>>> means
>>> RH doesn't support that system. The 'non-support' due to 'taint' being
>>> printed
>>> out in this case may be incorrect -- RH may support that use, at least
>>> until
>>> a more sufficient patched kernel is provided.
>>> Thus my dissension that 'taint' be output. WARN is ok. 'driver beware',
>>> 'unleashed dog afoot'.... sure...
>>
>>
>> So ... that's really a RH-specific support issue, and easily worked
>> around by RH adding a patch that turns off tainting.
>>
> sure. what's another patch to the thousands... :-/
>
>> It still sounds like a good idea to me for upstream, where use of this
>> option can very possibly lead to corruption or information leakage
>> between devices the user claimed were isolated, but in fact were not.
>
> Did I miss something? This patch provides a user-level/chosen override;
> like all other overrides, (pci=realloc, etc.), it can lead to a failing
> system.
> IMO, this patch is no different. If you want to tag this patch with taint,
> then let's audit all the (PCI) overrides and taint them appropriately.
> Taint should be reserved to changes to the kernel that were done outside
> the development of the kernel, or with the explicit intent to circumvent
> the normal operation of the kernel. This patch provides a way to enable
> ACS checking to succeed when the devices have not provided sufficiently
> complete
> ACS information. i.e., it's a growth path for PCIe-ACS and its need for
> proper support.

We're telling the kernel to assume something (the hardware provides
protection) that may not be true. If that assumption turns out to be
false, the result is that a VM can be crashed or comprised by another
VM.

One difference I see is that this override can lead to a crash that
looks like random memory corruption and has no apparent connection to
the actual cause. Most other overrides won't cause run-time crashes
(I think they're more likely to cause boot or device configuration
failures), and the dmesg log will probably have good clues as to the
reason.

But the possibility of compromise is probably even more serious,
because there would be no crash at all, and we'd have no indication
that VM A read or corrupted data in VM B. I'm very concerned about
that, enough so that it's not clear to me that an override belongs in
the upstream kernel at all.

Yes, that would mean some hardware is not suitable for device
assignment. That just sounds like "if hardware manufacturers do their
homework and support ACS properly, their hardware is more useful for
virtualization than other hardware." I don't see the problem with
that.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/