Re: [RFC/PATCHv2] x86/irq: round-robin distribution of irqs to cpusw/in node
From: Thomas Gleixner
Date: Sun Oct 17 2010 - 06:44:24 EST
Eric,
On Tue, 28 Sep 2010, Eric W. Biederman wrote:
> Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:
> >> My gut feel says that the real answer is to delay assigning a vector
> >> to an irq until request_irq(). At which point we will know that someone
> >> at least wants to use the irq.
Looked a bit deeper into the users. Quite a bunch do
pci_enable_msix();
request_irqs();
in their probe function. There is no sign of making them per cpu or
node. mlx4 is one of them. No sign of anything related to nodes or cpus in
the whole driver.
Even if the driver does not request the irqs from the probe function,
why does it need to do the msi/msix setup in the probe function at
all?
Wouldn't it be sufficient to do that at open() right before the
interrupts are requested.
> > Right. So the solution would be:
> >
> > create_irq allocates an irq number + irq descriptor, nothing else
> >
> > chip->startup() will setup the vector and chip->shutdown releases
> > it. That requires to change the return value of chip->startup to int,
> > so we can return an error code, but that can be done in course of the
> > overhaul I'm working on.
> >
> > Right now I prefer not to add more crap to io_apic.c, it's horrible
> > enough already. I'll fix that with the cleanup.
>
> Understood. It has taken a couple of years before this bug finally
> bit anyone waiting a release or two to get it fixed properly seems
> reasonable.
That needs some real work on drivers :)
> pci_enable_msix all in it's own way is fixable, but it has
> few enough callers < 80 that it is also fixable.
The fundamental flaw of arch_setup_msi_irqs() is
node = dev_to_node(&dev->dev);
That's the only node information we get. So we put everything on a
single node.
What we really need is a node entry in struct msi_desc, which gets
assigned from struct msix_entry in msix_setup_entries().
Thoughts ?
tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/