Re: [RFC PATCH v2] irqchip: add support for SMP irq router

From: Thomas Gleixner
Date: Wed Jul 20 2016 - 10:40:28 EST


On Wed, 20 Jul 2016, Sebastian Frias wrote:
> ----
> 2) In order to get a virq mapping for the domains associated with the outputs
> (the outputs connected to the GIC) I request a sort of "Fake HW IRQ"
> See irq-tango_v4.c:1400
>
> /* To request a virq we need a HW IRQ, use a "Fake HW IRQ" */
> hwirq = index + irqrouter->input_count + irqrouter->swirq_count;
>
> This feels a bit like a hack, so suggestions are welcome.

Well, look at it from the hardware level

|----------------|
| 0|---- Device irqs
| 1|----
|----------| | 2|----
| 0 |-----| 3|----
| 1 |-----| Router .|----
| GIC ...|-----| .|----
| N |-----| .|----
|----------| | .|----
| .|----
| X|----
|----------------|

So you have X hardware irqs in your router chip and the GIC has N. When an
interrupt is created then we

- allocate the virq number
- associate the hwirq number of the router (input side)
- route it to a hwirq number of the GIC (output side)

So what kind of magic numbers do you need for that? I don't see any.

The only information which is interesting is whether the router output is
shared between several router inputs or not because that might short cut
interrupt handling. Though you can start with the simple assumption that every
output is shared. That's a few cpu cycles overhead in the irq delivery path,
but a great simplification of the code in the first place.

Your device tree describes

1) the association of the device irq to the router irq
2) the routing between router input and output

#1 is standard device tree magic

#2 is a property of your router irq chip

There is no requirement to describe all routes, you can limit it to those
routes which you want to be exclusive. The rest can be assigned
automatically.

Pseuso code:

#define ROUTE_NONE 0xFFFFFFFF

struct router {
u32 routes[NR_INPUTS];
bool exclusive[NR_OUTPUTS];
u32 nextout;
};

init()
struct router *router = kzalloc(sizeof(*router));

/* Clear all routes */
memset(router->routes, 0xFF, sizeof(router->routes);

/* Scan device tree entries */
for (info = node_get_next_route(node); info; ) {
/* TODO: Sanity check info->in and info->out */
router->routes[info->in] = info->out;
router->exclusive[info->out] = true;
info = node_get_next_route(node);
}

interrupt setup()

/*
* We get the hwirq of the router input in the DT/FW descriptor.
* Check whether the interrupt is routed by DT or should be
* assigned a route dynamically.
*/
if (router->routes[hwirq] != ROUTE_NONE) {
outirq = router->routes[hwirq];
} else {
/*
* Try to find the next non exclusive output
*/
outirq = router->nextout;
while (router->exclusive[outirq]) {
outirq++;
if (outirq == NR_OUTPUTS)
outirq = 0;
if (outirq == router->nextout)
return -ENOSPC;
}
router->nextout = outirq + 1;
if (router->nextout == NR_OUTPUTS)
router->nextout = 0;
}

/* Do the magic hardware initialization .... */

Hmm?

> 3) The file is called 'irq-tango_v4.c' but I think it should match the
> compatible string, so I was thinking of renaming:
> irq-tango_v4.c => irq-sigma_smp_irqrouter.c
> irq-tango_v4.h => irq-sigma_smp_irqrouter.h
>
> What do you think?

That's the least of my worries as long as the filename is unique and can be
associated to the hardware in a sensible way.

> 4) Do I have to do something more to handle the affinity stuff?

Well, I have no idea how your hardware works, but I assume that the affinity
is solely handled by the GIC. So delegating the affinity setting to the GIC is
all you can do.

> 6) More of a theoretical question:
> I have to #include <dt-bindings/interrupt-controller/irq-tango_v4.h> from
> code that is not in the Linux tree. That's fine for now, but if in the future
> there are other irq controllers and another header is required, how can the
> code know which header to #include ?
> Are we supposed to #include all of them, and then somehow detect which module
> is actually active and act accordingly? If so, can we detect which module
> matched and probed correctly to know which controller version we need to
> talk to?

-ENOPARSE

> What the driver does is:
> - creates a domain hierarchy attached to its parent domain (the GIC)
> - the callbacks for this domain hierarchy will be used to deal with IRQs
> "directly" routed to the GIC (i.e.: exclusive IRQ lines routed to the GIC
> and not shared).
> In this case the GIC dispatches the interrupts.
> - create linear domains attached to this driver's node.
> - the callbacks for these linear domains will be used to deal with shared
> IRQs (they share the routing to one of the GIC inputs).
> In this case this driver dispatches the interrupts after reading the flags.
>
> It is such concept, and then its implementation, that I would need some more
> advise with, because, as stated in the questions above, it still does not
> feels right, especially question 2.

The non shared case is merily an optimization of the shared case, but I
recommend to omit it for now and just deal with a very straight forward
implementation of the recommendation I outlined above. Making the non-shared
case special is simple once you got the above right.

> >> +#if 0
> >
> > Don't even think about posting code with "#if 0" or such in it.
> >
>
> The idea behind these macros was:
> - to easily enable/disable them.
> - to be more consistent with other code written at Sigma: this is a
> Sigma-only driver.

I don't care about Sigmas coding habits. This is kernel code and not
Sigma-only playground.

> > pr_debug() is there for a reason.
>
> Yes, but doesn't pr_debug() requires changes to the Makefile?

# git grep pr_debug Documentation/

> >
> >> +#if 0
> >> +#define SHORT_OR_FULL_NAME full_name
> >> +#else
> >> +#define SHORT_OR_FULL_NAME name
> >> +#endif
> >
> > Use one and be done with it. Really.
>
> By making the choice explicit, the person reading/debugging is aware of the possibilities.
> Picking "full_name" or "name" would likely prevent the person from knowing it has other debug options.
>

This does prevent absolutely nothing. Random #if 0 all over the place is a
nightmare and nothing else. And what do you win by printing one or the other
name? Absolutely nothing, really. Again, it might be Sigmas way to obfuscate
code and at the same time claiming that it makes it better to read/debug, but
that choice of coding style/habits have absolutely no place in proper kernel
code.

> >> +
> >> +#define NODE_NAME(__node__) (__node__ ? __node__->SHORT_OR_FULL_NAME : \
> >> + "<no-node>")
> >
> > Sigh. pr_xxx("%s", NULL) prints <null>. That's really sufficient for your
> > debug/error handling.
>
> The code is checking for __node__ to be non-NULL before dereferencing it.

pr_xxx("%s", node->name ? node->name: NULL)

What's wrong with explicitely writing stuff into the code where people can see
it instead of obfuscating everything with gazillion of macros.

> >> +static inline u32 tango_readl(struct tango_irqrouter *irqrouter,
> >> + int reg)
> >> +{
> >> + u32 val = readl_relaxed(irqrouter->base + reg);
> >> + /*DBGLOG("r[0x%08x + 0x%08x = 0x%08x] = 0x%08x\n",
> >> + irqrouter->base, reg, irqrouter->base + reg, val);*/
> >> + return val;
> >
> > Please get rid of this debug nonsense.
>
> Ok, although I can guarantee it was really useful to determine and prove
> that the HW documentation of the module was wrong...

That's fine. But that commented out stuff has no place in proper code. Use
pr_debug() if you really think that this information is valuable for debugging
beyond the initial experimentation. If it's of no use for the daily trouble
shooting with users, get rid of it.

> >> +static inline void tango_set_swirq_enable(struct tango_irqrouter *irqrouter,
> >> + int swirq,
> >> + bool enable)
> >> +{
> >> + u32 offset = SWIRQ_ENABLE;
> >
> > If you name that varible 'reg' then it becomes obvious what it does.
>
> Well, it was to add a sort of "consistency" between the code for SW IRQs and
> HW IRQs.

You try to force different things into the same implementation, which is
always a bad sign.

> >
>> + u32 value = tango_readl(irqrouter, offset);
> >> + u32 swirq_bit_index = swirq % SWIRQ_COUNT;
> >> +
> >> +#if 1
> >> + DBGLOG("%smask swirq(in) %d : current regvalue 0x%x\n",
> >> + enable ? "un":"",
> >> + swirq, value);
> >> +#endif
> >> +
> >> + if (enable) {
> >> + /* unmask swirq */
> >
> > The whole code lacks comments, but here you document the obvious ....
>
> I will remove the comments.

Instead of just removing the comments, please add comments where the stuff is
non obvious.

> >> +static inline void tango_set_swirq_inversion(struct tango_irqrouter *irqrouter,
> >> + int swirq,
> >> + bool invert)
> >> +{
> >> +
> >> + DBGLOG("swirq(in) %d %s inverted\n", swirq, invert ? "":"not");
> >> +
> >> + if (invert)
> >> + DBGERR("SW IRQs cannot be inverted!\n");
> >
> > Groan.
>
> I rather keep this to catch typos in code requesting SW IRQs.

Come on. Is that coding kindergarden or what?

> >> + if (irq_out < 0) {
> >
> > This logic is utterly confusing, really. What's a negative interrupt number?
>
> It is a sort of "magic value" to disable the IRQ and the routing.

And of course completely undocumented. Again you try to implement stuff which
should be seperate into a single implementation and end up with an unreadable
and therefor unmaintainable mess.

> >> + } else
> >
> > } else {
> >
> > please
>
> I think I'm confused, we need to use {} for single statements on "else" blocks,
> but not on "for" blocks?

for (i = 0; i < MAX; i++)
do_stuff(i);

for (i = 0; i < MAX; i++) {
do_stuff(i);
do_more_stuff(i);
}

if (bla < MAX)
do_stuff(bla);
else
do_something_else(bla);

if (bla < MAX) {
do_stuff(bla);
do_more_stuff(bla);
} else {
do_something_else(bla);
do_more_stuff(bla);
}

if (bla < MAX) {
do_stuff(bla);
do_more_stuff(bla);
} else {
do_something_else(bla);
}

if (bla < MAX) {
do_stuff(bla);
} else {
do_something_else(bla);
do_more_stuff(bla);
}

Now compare that to:

if (bla < MAX)
do_stuff(bla);
else {
do_something_else(bla);
do_more_stuff(bla);
}

It's harder to parse simply because the 'else {' is assymetric. Ditto for

if (bla < MAX) {
do_stuff(bla);
do_more_stuff(bla);
} else
do_something_else(bla);

Ok?

> >> + if (irq_out < 0) {
> >> + tango_set_irq_enable(irqrouter,
> >> + irq_in,
> >> + 0);
> >
> > This fits in a single line.
>
> Ok.
> Actually, I was wondering if for consistency it wouldn't be better if all
> function calls had one argument per line?

Err no. You waste precious vertical space and therefor context.

> >> +static inline int tango_set_irq_enable(struct tango_irqrouter *irqrouter,
> >> + int irq,
> >> + bool enable)
> >> +{
> >> + if (irq >= (irqrouter->input_count + irqrouter->swirq_count))
> >> + return -EINVAL;
> >
> > How can that happen? Paranoia programming?
>
> No, actually it can happen, that's the reason for question 2 of my RFC:
>
> ----
> 2) In order to get a virq mapping for the domains associated with the outputs
> (the outputs connected to the GIC) I request a sort of "Fake HW IRQ"
> See irq-tango_v4.c:1400
>
> /* To request a virq we need a HW IRQ, use a "Fake HW IRQ" */
> hwirq = index + irqrouter->input_count + irqrouter->swirq_count;
>
> This feels a bit like a hack, so suggestions are welcome.

It is a hack. See above.

> > This can be written way simpler.
> >
> > if (irq >= irqrouter->input_count)
> > irq -= irqrouter->input_count;
> > tango_set_hwirq_enable(irqrouter, irq, enable);
> >
> > Hmm?
>
> Ok, but is it ok to modify function parameters?
> I thought it was not recommended.
> Indeed, some people may assume they are constants.

They are only constant when they have a const qualifier in the function
prototype. Random assumptions of random people are irrelevant. You can use an
intermediate local variable and do the above if that makes you feel
better. That still will be way better readable and understandable code.

> >> +
> >> + if (err < 0) {
> >> + DBGWARN("Failed to setup IRQ %d polarity\n", hwirq_in);
> >> + return err;
> >> + }
> >> +
> >> + switch (type & IRQ_TYPE_SENSE_MASK) {
> >> + case IRQ_TYPE_EDGE_RISING:
> >> + case IRQ_TYPE_EDGE_FALLING:
> >> + DBGERR("Does not support edge triggers\n");
> >
> > This really sucks. In fact you panic here while the code suggests that you
> > print a DEBUG error and continue .....
> >
>
> If I use panic() here, do I still have to use 'break' as well?

It does not matter much.

> >> + default:
> >> + DBGWARN("Invalid trigger mode 0x%x for hwirq(in) %d\n",
> >> + type, hwirq_in);
> >
> > So here you continue with an error code, but above for EDGE you panic. I
> > really don't get that logic.
>
> Ok.
> Do you suggest I just make the code panic for any error? I'm fine with that.

I ask for consistent handling. Either you always panic or you try to handle
all errors gracefully. I prefer the latter because if it's not a essential
interrupt then you still have the chance to retrieve debug info.

panic or BUG() are the last resort when there is no way to keep the machine
alive and any action would perhaps cause even more fatal problems. That's
hardly the case when an interrupt cannot be set up.

> >> + if (!out_val)
> >> + return -EINVAL;
> >
> > This is utter crap. Look at the call site:
>
> I think I'm confused, so it is not recommended to check for NULL pointers
> before dereferencing them? Regardless of how the current code calls the
> function?

There is nothing wrong with defensive programming, but just adding checks to
every function bloats the code for no value. If you have a helper with a
single callsite, then such a check is just pointless and only there to make
you feel good or whatever. Global functions which can be called by random code
certainly want to have such checks, but then the callers are also required to
check the return values.

> >
> >> + struct tango_irqrouter_output *irqrouter_output = NULL;
> >> +
> >> + tango_get_output_for_hwirq(irqrouter, hwirq_in, &irqrouter_output);
> >> + if (!irqrouter_output)
> >> + goto handle_parent;
> >
> > So out_val CANNOT be NULL.
>
> You are right, with current code it can't.
> But I thought that it was a good practice to:
> "be lenient with your input and strict with your output"

See above.

> >What's the purpose of your return values if you
> > ignore them?
>
> I can make the function 'void'.

I made you a proposal for a useful replacement of that thing. See below.

> > This whole thing can be condensed to:
> >
> > static struct tango_irqrouter_output *
> > tango_get_output_for_hwirq(struct tango_irqrouter *irqr, int hwirq_in)
> > {
> > struct tango_irqrouter_output *rout;
> > int i, j;
> >
> > /* Scan the outputs for a matching hwirq number */
> > for (i = 0, rout = irqr->output; i < irqr->output_count; i++, rout++) {
> > for (j = 0; j < rout->shared_count; j++) {
> > if (rout->shared_irqs[j] == hwirq_in)
> > return rout;
> > }
> > }
> > return NULL;
> > }
> >
> > And the call site would become:
> >
> > struct tango_irqrouter_output *rout;
> >
> > rout = tango_get_output_for_hwirq(irqrouter, hwirq_in);
> > if (!rout)
> > goto handle_parent;
> >
> > Hmm?
>
> Ok.

> >> +static void tango_irqchip_mask_irq(struct irq_data *data)
> >> +{
> >> + struct irq_domain *domain = irq_data_get_irq_chip_data(data);
> >> + struct tango_irqrouter_output *irqrouter_output = domain->host_data;
> >> + struct tango_irqrouter *irqrouter = irqrouter_output->context;
> >> + int hwirq_in = (int)data->hwirq;
> >
> > Huch? What's that silly typecast for? Why can't you use u32 for hwirq_in?
>
> data->hwirq is of type irq_hw_number_t (unsigned long), I don't know if that
> can change in the future.

And because something might change in the future you slap a type cast on
it. Which will nicely prevent a compiler warning when the conversion after the
change is crap.

No. We only do type casts when there is a real reason and not something like
'might change in the future'.

Guess what happens if we change data->hwirq to be a pointer? Your typecast
still works ....

> So do you think it is safe to use u32 for data->hwirq (and all other "irq"
> occurrences)?

If your hwirq numbers are not going to exceed 2^32 ....

> >
> > Aside of that mask_val is a clear misnomer.
>
> Actually, this code comes from exynos-combiner.c and is part of the reason for

Copy and paste is not an excuse for bad code.

> >> +
> >> + irq_status[i] = HANDLE_EN_AND_INV_MASKS(irq_status[i], i);
> >> + status = tango_dispatch_irqs(domain, desc, irq_status[i], i*32);
> >> + if (status & irq_status[i])
> >

> > Can you pretty please explain how this can happen? It can't. Because you
> > clear every single bit of status in tango_dispatch_irqs(). See above.
>
> You are right that with the current code of tango_dispatch_irqs() it cannot
> happen, but that's more of a typo. I wanted to have a way to print something
> from this driver in the event of unhandled IRQs.

That's the fricking wrong place. If the dispatcher cannot handle an irq then
the dispatcher should issue the printout, not some random call site.

> >> +/**
> >> + * tango_irqdomain_map - route a hwirq(in) to a hwirq(out).
> >> + * NOTE: The hwirq(out) must have been already allocated and enabled on
> >> + * the parent controller.
> >
> > Please put notes AFTER the argument descriptions
>
> The note refers to the title:
> "tango_irqdomain_map - route a hwirq(in) to a hwirq(out)."

That does not make the comment a valid kernel doc comment. Format is:

/**
* function_name - Short function description
* @arg1: Description of arg1
* @argument2: Description of argument2
*
* Long description of the function including return values.
*/

Hint. You can build the kerneldoc stuff. It will tell you what's wrong.

> >> + * @domain: a registered domain
> >> + * @fwspec: an IRQ specification. This callback is used to translate the
> >> + * parameters given as irq_fwspec into a HW IRQ and Type values.
> >
> > fwspec is a callback?
>
> No, the sentence refers to the callback below (tango_irqdomain_translate) that is
> being described by this comment.
> I can reformulate or remove the comment.

Please do not remove useful comments. Make them proper.

> >> + err = tango_parse_fwspec(domain,
> >> + fwspec,
> >> + &domain_id,
> >> + &irq,
> >> + &type);
> >
> > Please use the full 80 characters. Your patch does not become more valuable
> > when you inflate the line count. It's more valuable when it is easy to read
> > and understand.
>
> Ok.
>
> As a side note, does the 80 characters limit still makes sense? I mean, I would
> imagine that anybody now is using wide screens capable of at least double that
> amount.

That's horrible. Did you ever think about why stuff is printed in columns or
the width is restricted?

There is a very simple reason: Your eyes cannot parse wide printed lines fast
without moving the focus while you can scan a limited width in one go.

> This is more of a relic when I did not had access to the 'select' callback.
> (I was working on 4.4)

Please remove relics. They are just a burden for review and they will confuse
the hell out of you 3 month from now.

Thanks,

tglx