Re: [patch 00/26] x64, x2apic/intr-remap: Interrupt-remapping and x2apic support

From: Eric W. Biederman
Date: Thu Jul 10 2008 - 23:22:30 EST


Suresh Siddha <suresh.b.siddha@xxxxxxxxx> writes:

>> That sounds nice in principle. I saw cpu cache flushes, I saw writes.
>> I did not see any reads which is necessary to get that behavior with
>> the standard pci transaction rules.
>
> qi_flush_iec() will submit an invalidation descriptor and will wait
> till it finishes the invalidation of the interrupt entry cache.
> qi_submit_sync() will do the job. Descriptor completion ensures that
> the inflight interrupts are flushed.

I will have to look a second time. It seems I did not see the wait.

>> Having seen enough little races and misbehaving hardware I'm very paranoid
>> about irq migration. The current implementation is belt and suspenders
>> and I still think there are races that I have missed.
>
> Eric, This process irq migration is done on the cutting edge hardware
> which was designed with all the feedback and experiences in the mind ;)
>
> And also, I don't think we are deviating much from what we are currently doing.
> We are still using cleanup vector etc, to clean up the previous vector
> allocation.

Yes. In general I think the design appears sound. I'm just not certain
of the implementation details. Having spent way to many hours
debugging some very subtle hardware issues, I am paranoid.


>> Sounds good. Ultimately we are looking at handler_data or chip_data.
>> There are very specific rules that meant I could not use them for
>> the msi data but otherwise I don't remember exactly what the are for.
>> IOMMU are covered though.
>
> IOMMU is covered as part of pci_dev (pci_sysdata). But in the case of
> interrupt-remapping, there are some interrupt resources like ioapics and
> hpet, which don't have the corresponding pci dev. Will take a look at this.
>
>> At least for msi the code you are working on was essentially unified
>> when it was written, it just happened to have two copies. I don't
>> think I'm asking for heaving lifting. Mostly just putting code that
>> is touched into something other then the growing monstrosity that is
>> ioapic.c
>
> We can create msi.c which handles MSI specific handling. I will
> look into this. But I def welcome somone beating me in posting those
> patches :) I made a note of this however.

Thanks. It is all of these little things. My hope is that we can
whittle down the unshared core instead of increasing the amount of
non-shared code.

>> Further can we please see some better abstractions. In particular can
>> we generate a token for the irq destination. And have the msi and
>> ioapic setup read that token and program it into the hardware. The
>> rules for which bits go where is exactly the same both with and
>> without irq_remapping so having an if statement there seems to obscure
>> what is really happening. Especially if as it appears that we may be used
>> the new token format with x2apics without remapping.
>
> unfortunately x2apic can't be enabled with out enabling interrupt-remapping.
> Interrupts don't work in majority of the configurations (as I mentioned
> earlier). Programming IOAPIC RTE's and MSI address/data registers are
> completely different based on the presence of interrupt-remapping.

Regardless of the x2apic mode issues there is a different issue here (address better in
another response where I gave an example).

There are a set of architecturally defined bits that can be
programmed. These same bits exist in the ioapic routing entry and in
the msi message.

Therefore we should have a generic mapping function that says give the architecturally
defined bits.

Then both the ioapic setup and the msi setup can call x86_irq_map(irq) get those
architectural bits and program them in their architecturally defined location.

For MSI it looks like you be able to take advantage of a few more bits, but the
same principle applies.

Getting these intermediate abstractions relatively clean is important so we can do
things with the hardware.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/