Re: [PATCH v2 1/2] powerpc: Add preempt lazy support

From: Shrikanth Hegde
Date: Sun Dec 01 2024 - 14:30:14 EST




On 11/26/24 16:23, Christophe Leroy wrote:


On 16/11/2024 at 20:23, Shrikanth Hegde wrote:
Define preempt lazy bit for Powerpc. Use bit 9 which is free and within
16 bit range of NEED_RESCHED, so compiler can issue single andi.

Since Powerpc doesn't use the generic entry/exit, add lazy check at exit
to user. CONFIG_PREEMPTION is defined for lazy/full/rt so use it for
return to kernel.

FWIW, there is work in progress on using generic entry/exit for powerpc. If you can help test it, that would be useful; see https://patchwork.ozlabs.org/project/linuxppc-dev/patch/F0AE0A4571CE3126+20241111031934.1579-2-luming.yu@xxxxxxxxxxxx/


I gave that series a try. After a lot of manual patching on the tip tree and removing a duplicate definition of regs_irqs_disabled, I was able to compile and boot.

I am skimming through the series, and as far as I understand from the review comments, it needs to be redesigned. I can probably see why. I will go through it in more detail to figure out how to do it better; I believe the changes have to stem from interrupt_64.S.

As far as the overlap between this patch and that series is concerned, it is in the NEED_RESCHED bits. There are at least a couple of major issues in that series:

1. It only moves exit-to-user to the generic code; exit-to-kernel is not moved. The generic irqentry_exit() handles that for generic-entry architectures, but the powerpc exit-to-kernel path is still in place.

2. Even for exit to user, it ends up calling exit_to_user_mode_loop() twice for the same syscall, which seems wrong: once in interrupt_exit_user_prepare_main() and once through syscall_exit_to_user_mode() in syscall_exit_prepare().
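To make issue 2 concrete, the syscall return path in that series looks roughly like this (a sketch from my reading of the patches, not verbatim code):

```
syscall_exit_prepare()
    interrupt_exit_user_prepare_main()
        exit_to_user_mode_loop()        <- first pass over the TIF work bits
    syscall_exit_to_user_mode()         <- generic entry code
        exit_to_user_mode_loop()        <- second pass, same syscall
```

Only one of these should run per return to userspace; the loop is designed to drain the work bits once.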


Christophe


So I guess, when we do eventually move to generic entry/exit, these checks would be removed at that point along with the rest of the code.



Ran a few benchmarks and a db workload on Power10. Performance is close to
preempt=none/voluntary.
Since powerpc systems can have large core counts and large memory,
preempt lazy is going to be helpful in avoiding soft lockup issues.

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Reviewed-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
---
  arch/powerpc/Kconfig                   | 1 +
  arch/powerpc/include/asm/thread_info.h | 9 ++++++---
  arch/powerpc/kernel/interrupt.c        | 4 ++--
  3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8094a01974cc..2f625aecf94b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -145,6 +145,7 @@ config PPC
      select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
      select ARCH_HAS_PHYS_TO_DMA
      select ARCH_HAS_PMEM_API
+    select ARCH_HAS_PREEMPT_LAZY
      select ARCH_HAS_PTE_DEVMAP        if PPC_BOOK3S_64
      select ARCH_HAS_PTE_SPECIAL
      select ARCH_HAS_SCALED_CPUTIME        if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 6ebca2996f18..2785c7462ebf 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -103,6 +103,7 @@ void arch_setup_new_exec(void);
  #define TIF_PATCH_PENDING    6    /* pending live patching update */
  #define TIF_SYSCALL_AUDIT    7    /* syscall auditing active */
  #define TIF_SINGLESTEP        8    /* singlestepping active */
+#define TIF_NEED_RESCHED_LAZY    9       /* Scheduler driven lazy preemption */
  #define TIF_SECCOMP        10    /* secure computing */
  #define TIF_RESTOREALL        11    /* Restore all regs (implies NOERROR) */
  #define TIF_NOERROR        12    /* Force successful syscall return */
@@ -122,6 +123,7 @@ void arch_setup_new_exec(void);
  #define _TIF_SYSCALL_TRACE    (1<<TIF_SYSCALL_TRACE)
  #define _TIF_SIGPENDING        (1<<TIF_SIGPENDING)
  #define _TIF_NEED_RESCHED    (1<<TIF_NEED_RESCHED)
+#define _TIF_NEED_RESCHED_LAZY    (1<<TIF_NEED_RESCHED_LAZY)
  #define _TIF_NOTIFY_SIGNAL    (1<<TIF_NOTIFY_SIGNAL)
  #define _TIF_POLLING_NRFLAG    (1<<TIF_POLLING_NRFLAG)
  #define _TIF_32BIT        (1<<TIF_32BIT)
@@ -142,9 +144,10 @@ void arch_setup_new_exec(void);
                   _TIF_SYSCALL_EMU)
  #define _TIF_USER_WORK_MASK    (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
-                 _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
-                 _TIF_RESTORE_TM | _TIF_PATCH_PENDING | \
-                 _TIF_NOTIFY_SIGNAL)
+                 _TIF_NEED_RESCHED_LAZY | _TIF_NOTIFY_RESUME | \
+                 _TIF_UPROBE | _TIF_RESTORE_TM | \
+                 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL)
+
  #define _TIF_PERSYSCALL_MASK    (_TIF_RESTOREALL|_TIF_NOERROR)
  /* Bits in local_flags */
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index af62ec974b97..8f4acc55407b 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -185,7 +185,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
      ti_flags = read_thread_flags();
      while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
          local_irq_enable();
-        if (ti_flags & _TIF_NEED_RESCHED) {
+        if (ti_flags & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
              schedule();
          } else {
              /*
@@ -396,7 +396,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
          /* Returning to a kernel context with local irqs enabled. */
          WARN_ON_ONCE(!(regs->msr & MSR_EE));
  again:
-        if (IS_ENABLED(CONFIG_PREEMPT)) {
+        if (IS_ENABLED(CONFIG_PREEMPTION)) {
              /* Return to preemptible kernel context */
              if (unlikely(read_thread_flags() & _TIF_NEED_RESCHED)) {
                  if (preempt_count() == 0)