On 13/01/17 17:37, David Daney wrote:
On 01/13/2017 08:15 AM, Marc Zyngier wrote:
Thanks Linus for looping me in.
On 12/01/17 22:35, David Daney wrote:
Hi Thomas,
I am trying to figure out how to handle this situation:
                 handle_level_irq()
                 +---------------+              handle_fasteoi_irq()
                 | PCIe hosted   |                 +-----------+     +-----+
--level_gpio---->| GPIO to MSI-X |--MSI_message--+>| gicv3-ITS |---> | CPU |
                 | widget        |               | +-----------+     +-----+
                 +---------------+               |
                                                 |
         +-------------------+                   |
         | other PCIe device |---MSI_message-----+
         +-------------------+
The question is how to structure the interrupt handling. My initial
attempt was a chaining arrangement where the GPIO driver does
request_irq() for the appropriate MSI-X vector, and the handler calls
back into the irq system like this:
static irqreturn_t thunderx_gpio_chain_handler(int irq, void *dev)
{
	struct thunderx_irqdev *irqdev = dev;
	int chained_irq;
	int ret;

	chained_irq = irq_find_mapping(irqdev->gpio->chip.irqdomain,
				       irqdev->line);
	if (!chained_irq)
		return IRQ_NONE;

	ret = generic_handle_irq(chained_irq);

	return ret ? IRQ_NONE : IRQ_HANDLED;
}
Thus getting the proper GPIO irq_chip functions called to manage the
level triggering semantics.
The drawback of this approach is that there are then two irqs
associated with each GPIO line (the base MSI-X and the chained GPIO).
There can be up to 80-100 of these widgets, so potentially we can
consume twice that many irq numbers.
It was suggested by Linus Walleij that using an irq domain hierarchy
might be a better idea. However, I cannot figure out how this might
work. The gicv3-ITS needs to use handle_fasteoi_irq(), and we need
handle_level_irq() for the GPIO-level lines. Getting the proper
irq_chip functions called in a hierarchical configuration doesn't seem
doable given the heterogeneous flow handlers.
My main worry here is that you're trying to handle something that is
apparently a level interrupt using a transport that only handles edge
interrupts. It means that each time you EOI an interrupt, you need to be
able to resample the level and retrigger it if still high. Do you have a
HW facility to do so? Or do you emulate this in SW?
Yes.
The first thing the handle_level_irq() flow handler does is to mask the
source. After calling the handler, it then unmasks the source.
The act of unmasking in the HW causes another MSI to be emitted if the
level signal is still active. This is what we want as it ensures that
interrupts keep firing as long as the signal is active, even though the
underlying mechanism uses edge-semantics MSIs.
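
The mask/handle/unmask sequence described above can be modelled in user space. The following is only a minimal sketch, not driver code: struct widget_line, the widget_*() helpers, device_handler() and simulate() are hypothetical names invented for illustration, and the model assumes exactly the behaviour stated above (unmasking re-emits an MSI while the level is still active).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one line of the GPIO to MSI-X widget. */
struct widget_line {
	bool level;	/* current state of the level-triggered input */
	bool masked;	/* source masked inside the widget */
	int msi_sent;	/* MSI messages emitted so far */
};

/* Servicings the modelled device needs before it drops the line. */
static int pending_work;

static void widget_mask(struct widget_line *w)
{
	w->masked = true;
}

/* Assumed HW behaviour: unmasking emits an MSI if the level is still high. */
static void widget_unmask(struct widget_line *w)
{
	w->masked = false;
	if (w->level)
		w->msi_sent++;
}

/* A rising level on an unmasked line emits one MSI (edge-like transport). */
static void widget_set_level(struct widget_line *w, bool level)
{
	bool rising = level && !w->level;

	w->level = level;
	if (rising && !w->masked)
		w->msi_sent++;
}

/* Stand-in for the device handler: deasserts the line once work is done. */
static void device_handler(struct widget_line *w)
{
	if (pending_work > 0 && --pending_work == 0)
		widget_set_level(w, false);
}

/* Same ordering as described for handle_level_irq(): mask, handle, unmask. */
static void level_flow(struct widget_line *w)
{
	widget_mask(w);
	device_handler(w);
	widget_unmask(w);
}

/* Returns how often the handler ran; work must be at least 1. */
int simulate(int work)
{
	struct widget_line w = { 0 };
	int delivered = 0;
	int handled = 0;

	pending_work = work;
	widget_set_level(&w, true);

	/* Deliver each emitted MSI to the flow handler in turn. */
	while (delivered < w.msi_sent) {
		delivered++;
		level_flow(&w);
		handled++;
	}
	return handled;
}
```

In this model simulate(3) returns 3: a single rising edge is enough to get the handler called three times, because each unmask re-emits an MSI until the handler has deasserted the line.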
Can you think of a better way of structuring this than chaining from the
MSI-X handler as I outlined above?
We already have similar horrors - see irq-mbigen.c which does exactly
that. It creates IRQ domains spanning a bunch of MSIs allocated to that
platform device. Once you have that, the rest is pretty easy.
In your case, it probably requires adding the same kind of surgery so
that we can create an IRQ domain from the MSIs associated with a PCIe
device. Not too hard, just not there yet, and we can probably reuse some
of the code that lives in platform-msi.c.
That seems too ugly.
I would propose one of the following:
1) Just keep the crappy chaining more or less as I originally
implemented it, as it works well enough to get useful work done.
I think that's the really ugly bit. It may work, but it is not something
I wish to see in code that I'd end up being responsible for.
2) add an optional hierarchical flow handler function to struct
irq_data. If populated, this flow handler would be called from
handle_fasteoi_irq() instead of calling handle_irq_event(). This could
allow each irq_chip to have its own flow handler, instead of a single
flow handler shared by all of the hierarchically nested irq_chips.
So you want to generalize CONFIG_IRQ_PREFLOW_FASTEOI so that it is on
each level of a domain stack? Humm. I personally think that this is a
massive bloat that is going to impact all the hot paths for no gain
whatsoever, but I'll let tglx speak his mind on that.
I still think that being able to create an irqdomain based on the
interrupts allocated to a device and make that an irqchip is
conceptually simpler, already exists in the kernel, and fits the
existing infrastructure that has been put in place over the past two years.
It is also worth mentioning that you are already using this exact
hierarchical infrastructure (PCI/MSI -> ITS -> GIC), so I really don't
see why we should all of a sudden treat your particular device any
differently.
Thanks,
M.