Re: [PATCH v3 -tip x86/apic 1/2] PCI/MSI: Allocate as many multiple-MSIs as requested

From: Bjorn Helgaas
Date: Wed May 29 2013 - 16:58:48 EST


[-cc Suresh]

On Wed, May 29, 2013 at 2:36 AM, Alexander Gordeev <agordeev@xxxxxxxxxx> wrote:
> On Tue, May 28, 2013 at 03:51:52PM -0600, Bjorn Helgaas wrote:
>> On Mon, May 13, 2013 at 3:05 AM, Alexander Gordeev <agordeev@xxxxxxxxxx> wrote:
>>
>> The subject would make more sense as "Allocate *only* as many MSIs as
>> requested."
>
> 1.
>
>> > When multiple MSIs are enabled with pci_enable_msi_block(), the
>> > requested number of interrupts 'nvec' is rounded up to the nearest
>> > power-of-two value.
>>
>> This rounding is just a consequence of the encodings of the Multiple
>> Message Enable field in the Message Control register (PCI spec r3.0,
>> sec 6.8.1.3), isn't it?
>
> Yes, it is.
>
>> > The result is then used for setting up the
>> > number of MSI messages in the PCI device and allocation of
>> > interrupt resources in the operating system (i.e. vector numbers).
>> > Thus, in cases when a device driver requests some number of MSIs
>> > and this number is not a power-of-two value, the extra operating
>> > system resources (allocated as the result of rounding) are wasted.
>> >
>> > This fix introduces 'msi_desc::nvec' field to address the above
>> > issue. When non-zero, it will report the actual number of MSIs the
>> > device will send, as requested by the device driver. This value
>> > should be used by architectures to properly set up and tear down
>> > associated interrupt resources.
>>
>> This name needs a little more context, like "nvec_used" or something.
>
> I chose "nvec" to indicate it is what was passed to pci_enable_msi_block().
> I can resend with "nvec_used", along with subject change [1], if you want.
>
>> I think the idea is that the Message Control register can only tell
>> the OS that the device requires 1, 2, 4, 8, 16, or 32 vectors, and
>> similarly the OS can only tell the device that 1, 2, 4, 8, 16, or 32
>> vectors are assigned. If a device can only make use of 18 vectors, it
>> must advertise the next larger value (32 vectors). As far as I can
>> tell, a device *could* advertise 32 vectors in Multiple Message
>> Capable even if it can only use 1 vector.
>
> Yes, that is what we have with e.g. the ICH AHCI device - it advertises
> 16 vectors while making use of only 6. I tried to explain this in my
> changelog's last paragraph (below).
>
>> These patches are to avoid allocating resources for the unused
>> vectors, i.e., the ones between the last one the driver requested and
>> the last one advertised in Multiple Message Capable.
>
> Almost :) Rather ...between the last one the driver requested and
> the last one *written* in Multiple Message *Enable*, not Capable.
> IOW, between the last one the driver requested and the closest power
> of two - which will be written to the device.

Ah, right.
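
To make the arithmetic concrete, here is a minimal standalone sketch of the
rounding discussed above. It is not the kernel code; the helper names are
hypothetical and chosen only for illustration. It assumes the usual Multiple
Message Enable encoding, where a value of n enables 1 << n vectors (n = 0..5).

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): return the smallest
 * Multiple Message Enable encoding (0..5) whose vector count,
 * 1 << encoding, is at least the number of vectors requested.
 */
static unsigned int mme_for_nvec(unsigned int nvec)
{
	unsigned int mme = 0;

	/* MSI supports between 1 and 32 vectors per function. */
	assert(nvec >= 1 && nvec <= 32);

	while ((1u << mme) < nvec)
		mme++;

	return mme;
}

/* Number of vectors the device is told it may use (a power of two). */
static unsigned int vectors_enabled(unsigned int nvec)
{
	return 1u << mme_for_nvec(nvec);
}

/*
 * Vectors between the last one the driver requested and the last one
 * written in Multiple Message Enable, i.e. the ones this patch avoids
 * allocating resources for.
 */
static unsigned int vectors_unused(unsigned int nvec)
{
	return vectors_enabled(nvec) - nvec;
}
```

With the AHCI example from this thread, a request for 6 vectors gives an
encoding of 3, so 8 vectors are enabled and 2 of them go unused.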

> As of now, neither pci_enable_msi_block(), nor pci_enable_msi_block_auto()
> are able to address the case you described, but if we decide to change
> that then 'msi_desc::nvec' is what would be used. Again, the last paragraph
> (maybe too subtly) implies that.
>
>> The driver might
>> request fewer than the maximum either because it knows the device
>> isn't capable of using them all, or because the driver author decided
>> not to use them all.
>
> Exactly. (I assume here "or the driver author decided not to use them all"
> means the author can tell the device how many interrupts to use by means
> other than Multiple Message Enable - otherwise it would be a bug).

Yep, makes sense. Thanks for the clarifications.

>> (Sorry, just thinking out loud above, let me know if I'm not
>> understanding this correctly.)
>>
>> > Note, although the existing 'msi_desc::multiple' field might seem
>> > redundant, in fact it is not. In the general case the number of MSIs a
>> > PCI device is initialized with is not necessarily the closest power-
>> > of-two value of the number of MSIs the device will send. Thus, in
>> > theory it would not always be possible to derive the former from the
>> > latter and we need to keep them both, to stress this corner case.
>> > Besides, since 'msi_desc::multiple' is a bitfield, throwing it out
>> > would not save us any space.
>
> --
> Regards,
> Alexander Gordeev
> agordeev@xxxxxxxxxx

No need to resend as far as I'm concerned; I can tweak those bits
locally. I can put these in my tree if Joerg or Konrad ack the
iommu/irq_remapping.c bit.

Bjorn