Re: [PATCH v1 1/3] x86/tdx: Add TDX Guest event notify interrupt support

From: Sathyanarayanan Kuppuswamy
Date: Mon Mar 27 2023 - 22:50:41 EST


Hi Kai,

On 3/27/23 7:38 PM, Huang, Kai wrote:
>> +/* Reserve an IRQ from x86_vector_domain for TD event notification */
>> +static int __init tdx_event_irq_init(void)
>> +{
>> + struct irq_alloc_info info;
>> + cpumask_t saved_cpumask;
>> + struct irq_cfg *cfg;
>> + int cpu, irq;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
>> + return 0;
>> +
>> + init_irq_alloc_info(&info, NULL);
>> +
>> + /*
>> + * Event notification vector will be delivered to the CPU
>> + * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
>> + * So set the IRQ affinity to the current CPU.
>> + */
>> + cpu = get_cpu();
>> + cpumask_copy(&saved_cpumask, current->cpus_ptr);
>> + info.mask = cpumask_of(cpu);
>> + put_cpu();
> The 'saved_cpumask' related code is ugly. If you move put_cpu() to the end of
> this function, I think you can remove all related code:
>
> cpu = get_cpu();
>
> /*
> * Set @info->mask to local cpu to make sure a valid vector is
> * pre-allocated when TDX event notification IRQ is allocated
> * from x86_vector_domain.
> */
> init_irq_alloc_info(&info, cpumask_of(cpu));
>
> // rest staff: request_irq(), hypercall ...
>
> put_cpu();
>

init_irq_alloc_info() is a sleeping function. Since get_cpu() disables
preemption, we cannot call sleeping function after it. Initially, I
have implemented it like you have mentioned. However, I discovered the
following error.

[    2.400755] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
[    2.404664] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
[    2.408671] preempt_count: 1, expected: 0
[    2.412650] RCU nest depth: 0, expected: 0
[    2.412666] no locks held by swapper/0/1.
[    2.416650] Preemption disabled at:
[    2.416650] [<ffffffff83b8089f>] tdx_arch_init+0x38/0x117
[    2.420670] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00117-g672ca073d9f9-dirty #2527
[    2.424650] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[    2.424650] Call Trace:
[    2.424650]  <TASK>
[    2.424650]  dump_stack_lvl+0x6a/0x86
[    2.424650]  __might_resched.cold+0xf4/0x12f
[    2.424650]  __mutex_lock+0x50/0x810
[    2.424650]  ? lock_is_held_type+0xd8/0x130
[    2.424650]  ? __irq_alloc_descs+0xcf/0x310
[    2.424650]  ? find_held_lock+0x2b/0x80
[    2.424650]  ? __irq_alloc_descs+0xcf/0x310
[    2.424650]  __irq_alloc_descs+0xcf/0x310
[    2.424650]  irq_domain_alloc_descs.part.0+0x49/0xa0
[    2.424650]  __irq_domain_alloc_irqs+0x2a0/0x4f0
[    2.424650]  ? next_arg+0x129/0x1f0
[    2.424650]  ? tdx_guest_init+0x5b/0x5b
[    2.424650]  tdx_arch_init+0x8e/0x117
[    2.424650]  do_one_initcall+0x137/0x2ec
[    2.424650]  ? rcu_read_lock_sched_held+0x36/0x60
[    2.424650]  kernel_init_freeable+0x1e3/0x241
[    2.424650]  ? rest_init+0x1a0/0x1a0
[    2.424650]  kernel_init+0x17/0x170
[    2.424650]  ? rest_init+0x1a0/0x1a0
[    2.424650]  ret_from_fork+0x1f/0x30
[    2.424650]  </TASK>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer