Re: [patch 27/30] xen/events: Only force affinity mask for percpu interrupts

From: Jürgen Groß
Date: Fri Dec 11 2020 - 05:25:47 EST


On 11.12.20 11:13, Thomas Gleixner wrote:
On Fri, Dec 11 2020 at 07:17, Jürgen Groß wrote:
On 11.12.20 00:20, boris.ostrovsky@xxxxxxxxxx wrote:

On 12/10/20 2:26 PM, Thomas Gleixner wrote:
All event channel setups bind the interrupt on CPU0 or the target CPU for
percpu interrupts and overwrite the affinity mask with the corresponding
cpumask. That does not make sense.

The XEN implementation of irqchip::irq_set_affinity() already picks a
single target CPU out of the affinity mask and the actual target is stored
in the effective CPU mask, so destroying the user chosen affinity mask
which might contain more than one CPU is wrong.

Change the implementation so that the channel is bound to CPU0 at the XEN
level and leave the affinity mask alone. At startup of the interrupt
affinity will be assigned out of the affinity mask and the XEN binding will
be updated.


If that's the case then I wonder whether we need this call at all and instead bind at startup time.

This binding to cpu0 was introduced with commit 97253eeeb792d61ed2
and I have no reason to believe the underlying problem has been
eliminated.

"The kernel-side VCPU binding was not being correctly set for newly
allocated or bound interdomain events. In ARM guests where 2-level
events were used, this would result in no interdomain events being
handled because the kernel-side VCPU masks would all be clear.

x86 guests would work because the irq affinity was set during irq
setup and this would set the correct kernel-side VCPU binding."

I'm not convinced that this is really correctly analyzed because affinity
setting is done at irq startup.

switch (__irq_startup_managed(desc, aff, force)) {
case IRQ_STARTUP_NORMAL:
ret = __irq_startup(desc);
irq_setup_affinity(desc);
break;

which is completely architecture agnostic. So why should this magically
work on x86 and not on ARM if both are using the same XEN irqchip with
the same irqchip callbacks.

I think this might be related to _initial_ cpu binding of events and
changing the binding later. This might be handled differently in the
hypervisor.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys

Attachment: OpenPGP_signature
Description: OpenPGP digital signature