Re: [PATCH v3 05/13] genirq: Let purely flow-masked ONESHOT irqs through unmask_threaded_irq()

From: Valentin Schneider
Date: Thu Aug 12 2021 - 17:38:24 EST


On 12/08/21 15:45, Marc Zyngier wrote:
> On Thu, 12 Aug 2021 14:36:35 +0100,
> Valentin Schneider <valentin.schneider@xxxxxxx> wrote:
>>
>> On 12/08/21 08:26, Marc Zyngier wrote:
>> > On Tue, 29 Jun 2021 13:50:02 +0100,
>> > Valentin Schneider <valentin.schneider@xxxxxxx> wrote:
>> >> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
>> >> index ef30b4762947..e6d6d32ddcbc 100644
>> >> --- a/kernel/irq/manage.c
>> >> +++ b/kernel/irq/manage.c
>> >> @@ -1107,7 +1107,7 @@ static void irq_finalize_oneshot(struct irq_desc *desc,
>> >>  	desc->threads_oneshot &= ~action->thread_mask;
>> >>
>> >>  	if (!desc->threads_oneshot && !irqd_irq_disabled(&desc->irq_data) &&
>> >> -	    irqd_irq_masked(&desc->irq_data))
>> >> +	    (irqd_irq_masked(&desc->irq_data) | irqd_irq_flow_masked(&desc->irq_data)))
>> >>  		unmask_threaded_irq(desc);
>> >
>> > The bitwise OR looks pretty odd. It is probably fine given that both
>> > sides of the expression are bool, but still. I can fix this locally.
>> >
>>
>> Thomas suggested that back in v1:
>>
>> https://lore.kernel.org/lkml/87v98v4lan.ffs@xxxxxxxxxxxxxxxxxxxxxxx/
>>
>> I did look at the (arm64) disassembly diff back then and was convinced by
>> what I saw, though I'd have to go do that again as I can't remember much
>> else.
>
> Ah, fair enough.
>

Either I didn't have my glasses on or had a different output back then, but
I'm not so convinced anymore... (same result on both Ubuntu GCC 9.3.0 and
10.2 GCC release from Arm):


Logical OR:

 8f8:   b9400020    ldr   w0, [x1]
 8fc:   3787fea0    tbnz  w0, #16, 8d0 <irq_finalize_oneshot.part.0+0x60>
 900:   37880040    tbnz  w0, #17, 908 <irq_finalize_oneshot.part.0+0x98>
 904:   36fffe60    tbz   w0, #31, 8d0 <irq_finalize_oneshot.part.0+0x60>
 908:   aa1303e0    mov   x0, x19
 90c:   94000000    bl    0 <unmask_threaded_irq>

Bitwise OR (aka the patch):

 8f8:   b9400020    ldr   w0, [x1]
 8fc:   3787fea0    tbnz  w0, #16, 8d0 <irq_finalize_oneshot.part.0+0x60>
 900:   f26f001f    tst   x0, #0x20000
 904:   7a400801    ccmp  w0, #0x0, #0x1, eq  // eq = none
 908:   54fffe4a    b.ge  8d0 <irq_finalize_oneshot.part.0+0x60>  // b.tcont
 90c:   aa1303e0    mov   x0, x19
 910:   94000000    bl    0 <unmask_threaded_irq>

If I get this right...

- TST sets the Z condition flag if bit 17 (masked) isn't set
- CCMP sets the condition flags to
  - the same as SUBS(flags, 0) if bit 17 wasn't set
  - NZCV=0001 otherwise
- B.GE branches if N==V

Soooo

- if we have bit 17 set, NZCV=0001, B.GE doesn't branch
- if we don't have bit 17 but bit 31 (flow-masked), NZCV=1010 because
  this is signed 32-bit, so having bit 31 set makes the result of
  SUBS(flags, 0) negative (and subtracting 0 never borrows, so C is
  set), B.GE doesn't branch
- if we have neither, NZCV=0X10, B.GE branches
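
FWIW, the walkthrough above can be sanity-checked with a small userspace
model of the TST/CCMP/B.GE sequence. This is only a sketch: the names
(MASKED_BIT, FLOW_MASKED_BIT, struct nzcv, calls_unmask(), ...) are made
up for illustration, and the bit positions are simply those read off the
disassembly (17 for masked, 31 for flow-masked).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, taken from the disassembly in this mail. */
#define MASKED_BIT      (UINT32_C(1) << 17)
#define FLOW_MASKED_BIT (UINT32_C(1) << 31)

/* Hypothetical container for the four condition flags. */
struct nzcv { bool n, z, c, v; };

/* Model of "cmp w0, #0": N = sign bit, Z = (w0 == 0), C = 1, V = 0. */
static struct nzcv cmp_zero(uint32_t w0)
{
	struct nzcv f = {
		.n = (w0 >> 31) & 1,
		.z = (w0 == 0),
		.c = true,	/* subtracting 0 never borrows */
		.v = false,	/* ... and never overflows */
	};
	return f;
}

/* True if the sequence falls through to "bl unmask_threaded_irq". */
static bool calls_unmask(uint32_t w0)
{
	/* tst x0, #0x20000: Z is set iff bit 17 is clear */
	bool z = (w0 & MASKED_BIT) == 0;
	struct nzcv f;

	/* ccmp w0, #0x0, #0x1, eq: compare if Z, else NZCV = 0001 */
	if (z)
		f = cmp_zero(w0);
	else
		f = (struct nzcv){ .n = false, .z = false, .c = false, .v = true };

	/* b.ge branches away when N == V, so we fall through when N != V */
	return f.n != f.v;
}

/* The condition as written in the patch, on the same assumed bits. */
static bool c_predicate(uint32_t w0)
{
	return ((w0 & MASKED_BIT) != 0) | ((w0 & FLOW_MASKED_BIT) != 0);
}

/* Check all four combinations, with some unrelated low bits set too. */
static void check_all(void)
{
	for (unsigned int i = 0; i < 4; i++) {
		uint32_t w0 = 0x1234;

		if (i & 1)
			w0 |= MASKED_BIT;
		if (i & 2)
			w0 |= FLOW_MASKED_BIT;
		assert(calls_unmask(w0) == c_predicate(w0));
	}
}
```

Calling check_all() from a main() should pass silently if the reading of
the flags above is right; only N and V matter to B.GE here.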

So this does appear to do the right thing, at the cost of an extra
instruction and a profound sense of dread to whoever stares at the
disassembly. I guess it does save us a branch which could be
mispredicted...
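
For reference, here is a minimal sketch of why the two spellings agree at
the C level when both operands are bool. The demo_*() helpers and DEMO_*
masks are hypothetical stand-ins, not the real irqd_irq_masked() /
irqd_irq_flow_masked() accessors; the only difference between the two
forms is that || may skip evaluating its right-hand side, which is
harmless for side-effect-free accessors like these.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_MASKED      (UINT32_C(1) << 17)
#define DEMO_FLOW_MASKED (UINT32_C(1) << 31)

/* Hypothetical accessors; converting to bool normalises to 0 or 1. */
static bool demo_masked(uint32_t state)
{
	return state & DEMO_MASKED;
}

static bool demo_flow_masked(uint32_t state)
{
	return state & DEMO_FLOW_MASKED;
}

/* Logical OR: short-circuits, typically compiled to two test-and-branch. */
static bool needs_unmask_logical(uint32_t state)
{
	return demo_masked(state) || demo_flow_masked(state);
}

/* Bitwise OR: both sides are always evaluated, no short-circuit. */
static bool needs_unmask_bitwise(uint32_t state)
{
	return demo_masked(state) | demo_flow_masked(state);
}

/* Both forms must agree on every combination of the two bits. */
static void demo_check(void)
{
	const uint32_t states[] = {
		0,
		DEMO_MASKED,
		DEMO_FLOW_MASKED,
		DEMO_MASKED | DEMO_FLOW_MASKED,
	};

	for (unsigned int i = 0; i < sizeof(states) / sizeof(states[0]); i++)
		assert(needs_unmask_logical(states[i]) ==
		       needs_unmask_bitwise(states[i]));
}
```

Which instruction sequence the compiler actually emits for each form will
of course depend on the compiler version and flags.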

> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.