Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

From: Thomas Gleixner
Date: Tue May 27 2014 - 15:59:00 EST

Next message: Kent Overstreet: "Re: [dm-devel] [PATCH] block: flush queued bios when the process blocks"
Previous message: Vince Weaver: "Re: perf: fuzzer getting stuck with slow memory leak"
In reply to: Jiang Liu: "[Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC"
Next in thread: Jiang Liu: "Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Jiang,

On Tue, 27 May 2014, Jiang Liu wrote:

> +static int alloc_irq_from_domain(struct irq_domain *domain, u32 gsi, int pin)
> {
> + int irq = -1;
> +
> + if (gsi >= arch_dynirq_lower_bound(0)) {
> + irq = irq_create_mapping(domain, pin);
> + } else if (gsi < NR_IRQS_LEGACY) {
> + if (!ioapic_identity_map)
> + irq = irq_create_mapping(domain, pin);
> + else if (irq_domain_associate(domain, gsi, pin) == 0)
> + irq = gsi;
> + } else if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0) {
> + irq = gsi;
> + }

So you have these cases covered here:

1) The ACPI case of secondary ioapics. You only have the strict 1:1
mapping for the first ioapic

2) The gsi < NR_IRQS_LEGACY case where you have two options:

a) Let the core create a random virq number if ioapic_identity_map
is 0

ioapic_identity_map is only set by SFI and devicetree

So in all other cases we fall into that code path for all
legacy interrupts. So how is that supposed to work lets say for
i8042 which has hardcoded irq 1 and 12?

irq_create_mapping(1)

hint = 1 % nr_irqs; --> 1
virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));

This returns something >= 16, because the irq descriptors
for 0-15 (LEGACY) are allocated already.

The pin association works, but how is the i8042 driver supposed
to figure out that it should request the virq >=16 which was
created instead of the hardcoded 1 ?

b) Associate the gsi and the pin

This only works because the virqs are already allocated at boot
time unconditionally due to arch_probe_nr_irqs() returning
NR_IRQS_LEGACY. So irq_domain_associate() works.
Undocumented works by chance behaviour.

3) The case where gsi < arch_dynirq_lower_bound()

You create a strict mapping here, fine.

This is confusing at best.

First of all, we should use legacy_pic->nr_legacy_irqs instead of
NR_IRQS_LEGACY all over the place.

mshyperv, ce4100 and intel-mid use the null_legacy_pic which has
nr_legacy_irqs = 0 and everything else uses the real pic which has
nr_legacy_irqs = NR_IRQS_LEGACY. So why do we even bother to allocate
and deal with NR_IRQS_LEGACY in the cases where we have no legacy?

ce4100 is an oddball though. The ioapic is registered way before the
interrupt subsystem is initialized and I have a hard time to
understand that comment:

/* We can't set this earlier, because we need to calibrate the timer */
legacy_pic = &null_legacy_pic;

The timer calibration happens after the interrupts are set up. I
assume it's check_timer() which wants that, but we know exactly how
the ce4100 works, so we might be able to avoid that whole "testing"
stuff. Sebastian, any input on this?

If it turns out that ce4100 needs the inital real legacy pic for some
magic reason we still can be clever by extending the legacy pic data
structure to tell us about that change, i.e. instead of using
legacy_pic->nr_legacy_irqs having a field "nr_allocated_irqs", which
is set to NR_IRQS_LEGACY for the real pic and to 0 for the null_pic
and let ce4100 set that field to NR_IRQS_LEGACY before switching the
legacy_pic over to the null implementation.

But what's really disgusting is the magic ioapic_identity_map and the
extra ACPI specific ioapic_dynirq_base hackery.

Why do we need strict mappings in the non ACPI case for all ioapic
pins? What's so different about ACPI? Or is this just to avoid
breaking the existing SFI/devicetree stuff. If that's the reason I'm
fine with it, but ...

independent of this question we want to be more clever about the
handling of strict, legacy and freely associated linux irq numbers.

So you have this weird ioapic_create_domain_fn callback in
mp_register_ioapic, which is solely there so the different callers can
hand in their domain ops and eventually set the magic
ioapic_identity_map flag.

+void __init mp_register_ioapic(int id, u32 address, u32 gsi_base,
+ ioapic_create_domain_fn cb, void *arg)

What about having

struct ioapic_domain {
struct irqdomain *domain;
const struct irq_domain_ops *ops;
void *arg;
enum domain_type type;
};

and add this struct to the ioapic struct. type is:

enum domain_type {
IOAPIC_STRICT,
IOAPIC_LEGACY,
IOAPIC_DYNAMIC,
};

Now the register function changes to:

void mp_register_ioapic(int id, u32 address, u32 gsi_base,
const struct irq_domain_ops *ops,
int type)
{
...

ioapics[idx].irqdomain.ops = ops;
ioapics[idx].irqdomain.arg = arg;
ioapics[idx].irqdomain.type = type;
ioapics[idx].irqdomain.domain = NULL;

...
}

and you can use mp_irqdomain_create() unconditionally for creating all
domains. And there you do:

static int dynirq_lower_bound;

mp_irqdomain_create()
{
ioapic->irqdomain.domain = irq_domain_add_linear(...);

switch (ioapic->irqdomain.type) {
case IOAPIC_LEGACY:
/*
* Associate the legacy interrupts which have been
* already allocated.
*/
for (i = 0; i < legacy_pic->nr_allocated_irqs; i++)
irq_domain_associate(domain, i, i);

case IOAPIC_STRICT:
dynirq_lower_bound += ioapic->nr_gsis;

case IOAPIC_DYNAMIC:
break;
}
}

So arch_dynirq_lower_bound() gets simplified to:
{
return dynirq_lower_bound;
}

And alloc_irq_from_domain() becomes:

int alloc_irq_from_domain(struct ioapic_domain *domain, int gsi, int pin)
{
switch (domain->type) {
case IOAPIC_DYNAMIC:
return irq_create_mapping(domain->domain, pin);
case IOAPIC_LEGACY:
case IOAPIC_STRICT:
return irq_create_strict_mappings(domain, gsi, pin, 1);
}
}

At the call site of alloc_irq_from_domain() you have already:

irq = irq_find_mapping(domain, pin);
if (irq <=0 ....)
alloc_irq_from_domain(domain, gsi, pin);

So because we associated the legacy_pic->nr_allocated_irqs in
mp_irqdomain_create() already, you'll never call into
alloc_irq_from_domain() for those and the remaining ones for that
first ioapic are handled by the IOAPIC_STRICT fall through.

For simplicity you can let SFI and devicetree register the ioapics
with their specific domain ops plus IOAPIC_LEGACY for the first ioapic
and IOAPIC_STRICT for all others. That also covers the case where the
null_legacy_pic with legacy_pic->nr_allocated_irqs == 0 is used.

In the ACPI case you register with the acpi domain ops and
IOAPIC_LEGACY for the first and IOAPIC_DYNAMIC for the extra ioapics.

Should be way cleaner and understandable, at least to me :)

Now there is that last oddity which bugs me in mp_map_pin_to_irq()

/*
* Don't use irqdomain to manage ISA IRQs because there may be
* multiple IOAPIC pins sharing the same ISA IRQ number and
* irqdomain only supports 1:1 mapping between IOAPIC pin and
* IRQ number.
*/
if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci)) {
irq = mp_irqs[idx].srcbusirq;
if ((flags & IOAPIC_MAP_ALLOC) && info->count == 0 &&
mp_irqdomain_map(domain, irq, pin) != 0)
irq = -1;

That really looks like a hack. I'm aware that the current irqdomain
code cannot deal with that oddball case.

So what you are saying is that there are devices which have a separate
physical wire to different ioapic pins, but the ioapic is supposed to
bundle them to a shared interrupt.

I agree that this is odd enough to handle at the ioapic level, but
it'd be nice to have a more elaborative comment on this.

Aside of the above I'm pretty happy about the progress of this patch
set. One thing, which needs to be looked at are the usage sites of
irq_data->irq, whether they are safe. I did not spot any unsafe ones,
but a few functions which are called with irq_data->irq and make no
use of it.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Kent Overstreet: "Re: [dm-devel] [PATCH] block: flush queued bios when the process blocks"
Previous message: Vince Weaver: "Re: perf: fuzzer getting stuck with slow memory leak"
In reply to: Jiang Liu: "[Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC"
Next in thread: Jiang Liu: "Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]