Re: [PATCH -next v4 06/19] arm64: entry: Move arm64_preempt_schedule_irq() into exit_to_kernel_mode()

From: Jinjie Ruan
Date: Thu Oct 31 2024 - 00:02:26 EST




On 2024/10/29 22:52, Mark Rutland wrote:
> On Fri, Oct 25, 2024 at 06:06:47PM +0800, Jinjie Ruan wrote:
>> Move arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so that
>> not only __el1_irq() but every IRQ return to kernel mode gets a chance
>> to reschedule.
>
> We use exit_to_kernel_mode() for every non-NMI exception return to the
> kernel, not just IRQ returns.

Yes, it is not only IRQs but also other non-NMI exceptions; I will update it.

>
>> As Mark pointed out, this change will have the following key impact:
>>
>> "We'll preempt even without taking a "real" interrupt. That
>> shouldn't result in preemption that wasn't possible before,
>> but it does change the probability of preempting at certain points,
>> and might have a performance impact, so probably warrants a
>> benchmark."
>
> For anyone following along at home, I said that at:
>
> https://lore.kernel.org/linux-arm-kernel/ZxejvAmccYMTa4P1@J2N7QTR9R3/
>
> ... and there I specifically said:

Thank you!

This one and the next patch will be merged as you suggested.

I think it would have been clearer to put it in __exit_to_kernel_mode() and
then move it into the interrupts-enabled block, in two steps.

>
>> I'd suggest you first write a patch to align arm64's entry code with the
>> generic code, by removing the call to arm64_preempt_schedule_irq() from
>> __el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
>> __exit_to_kernel_mode(), e.g.
>>
>> | static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
>> | {
>> | 	...
>> | 	if (interrupts_enabled(regs)) {
>> | 		...
>> | 		if (regs->exit_rcu) {
>> | 			...
>> | 		}
>> | 		...
>> | 		arm64_preempt_schedule_irq();
>> | 		...
>> | 	} else {
>> | 		...
>> | 	}
>> | }
>
> [...]
>
>> +#ifdef CONFIG_PREEMPT_DYNAMIC
>> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
>> +#define need_irq_preemption() \
>> +	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
>> +#else
>> +#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
>> +#endif
>> +
>> +static void __sched arm64_preempt_schedule_irq(void)
>> +{
>> +	if (!need_irq_preemption())
>> +		return;
>> +
>> +	/*
>> +	 * Note: thread_info::preempt_count includes both thread_info::count
>> +	 * and thread_info::need_resched, and is not equivalent to
>> +	 * preempt_count().
>> +	 */
>> +	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
>> +		return;
>> +
>> +	/*
>> +	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
>> +	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
>> +	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
>> +	 * DAIF we must have handled an NMI, so skip preemption.
>> +	 */
>> +	if (system_uses_irq_prio_masking() && read_sysreg(daif))
>> +		return;
>> +
>> +	/*
>> +	 * Preempting a task from an IRQ means we leave copies of PSTATE
>> +	 * on the stack. cpufeature's enable calls may modify PSTATE, but
>> +	 * resuming one of these preempted tasks would undo those changes.
>> +	 *
>> +	 * Only allow a task to be preempted once cpufeatures have been
>> +	 * enabled.
>> +	 */
>> +	if (system_capabilities_finalized())
>> +		preempt_schedule_irq();
>> +}
>> +
>>  /*
>>   * Handle IRQ/context state management when exiting to kernel mode.
>>   * After this function returns it is not safe to call regular kernel code,
>> @@ -72,6 +114,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>>  static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
>>  					irqentry_state_t state)
>>  {
>> +	arm64_preempt_schedule_irq();
>
> This is broken; exit_to_kernel_mode() is called for any non-NMI
> exception return to the kernel, and this doesn't check that interrupts
> were enabled in the context the exception was taken from.
>
> This will preempt in cases where we should not, e.g. if we WARN() in a section with
> IRQs disabled.
>
> Mark.
>
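
Right, the call needs to be guarded by interrupts_enabled(regs) so that we
never preempt when the exception was taken from a context with IRQs
disabled. For concreteness, a rough sketch of the placement I have in mind,
following your earlier outline; the surrounding bodies are abridged from the
current entry-common.c and the irqentry_state_t plumbing from this series is
left out, so the final patch will differ in detail:

static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
{
	lockdep_assert_irqs_disabled();

	if (interrupts_enabled(regs)) {
		if (regs->exit_rcu) {
			trace_hardirqs_on_prepare();
			lockdep_hardirqs_on_prepare();
			ct_irq_exit();
			lockdep_hardirqs_on(CALLER_ADDR0);
			return;
		}

		/*
		 * Only reached when the interrupted context had IRQs
		 * enabled, so e.g. a WARN() taken in an IRQs-off
		 * section can never lead to preemption here.
		 */
		arm64_preempt_schedule_irq();

		trace_hardirqs_on();
	} else {
		if (regs->exit_rcu)
			ct_irq_exit();
	}
}

Since an IRQ can only be taken while the interrupted context had IRQs
enabled, this should match what __el1_irq() effectively does today, while
also covering the other non-NMI exception returns.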