Re: [PATCH] mm: remove unintentional voluntary preemption in get_mmap_lock_carefully

From: Mateusz Guzik
Date: Sun Aug 20 2023 - 21:13:13 EST


On Sun, Aug 20, 2023 at 07:12:16PM +0100, Matthew Wilcox wrote:
> On Sun, Aug 20, 2023 at 12:43:03PM +0200, Mateusz Guzik wrote:
> > Found by checking off-CPU time during kernel build (like so:
> > "offcputime-bpfcc -Ku"), sample backtrace:
> > finish_task_switch.isra.0
> > __schedule
> > __cond_resched
> > lock_mm_and_find_vma
> > do_user_addr_fault
> > exc_page_fault
> > asm_exc_page_fault
> > - sh (4502)
>
> Now I'm awake, this backtrace really surprises me. Do we not check
> need_resched on entry? It seems terribly unlikely that need_resched
> gets set between entry and getting to this point, so I guess we must
> not.
>
> I suggest the version of the patch which puts might_sleep() before the
> mmap_read_trylock() is the right one to apply. It's basically what
> we've done forever, except that now we'll be rescheduling without the
> mmap lock held, which just seems like an overall win.
>

I can't sleep and your response made me curious: is that really safe
here?

As I wrote in another email, the routine is concerned with a case of the
kernel faulting on something it should not have. For a case like that I
find rescheduling to another thread to be most concerning.

That said I think I found a winner -- add need_resched() prior to
trylock.

This adds less work than you would have added with might_sleep (a func
call), still respects the preemption point, dodges exception table
checks in the common case and does not switch away if there is
anything fishy going on.
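
To illustrate the ordering only, here is a minimal userspace sketch, not
kernel code. Every name ending in _sketch is a hypothetical stand-in (an
atomic flag for need_resched(), a toy reader count for the mmap lock); the
real change is the diff at the end of this mail.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task "reschedule pending" flag. */
static atomic_bool resched_pending_sketch;

/* Toy lock: readers >= 0 counts read holders, -1 means a writer holds it. */
struct lock_sketch {
	atomic_int readers;
};

static bool need_resched_sketch(void)
{
	return atomic_load(&resched_pending_sketch);
}

/* Stand-in for mmap_read_trylock(): never blocks, true on success. */
static bool read_trylock_sketch(struct lock_sketch *l)
{
	int cur = atomic_load(&l->readers);

	while (cur >= 0) {
		/* On failure, cur is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&l->readers, &cur, cur + 1))
			return true;
	}
	return false;
}

static void read_unlock_sketch(struct lock_sketch *l)
{
	atomic_fetch_sub(&l->readers, 1);
}

/*
 * Fast path: check for a pending reschedule first and only then try the
 * lock, so the lock is never acquired right before switching away.
 * Returns true with the read lock held; false means the caller has to
 * fall back to the slow path.
 */
static bool get_lock_fast_sketch(struct lock_sketch *l)
{
	if (!need_resched_sketch() && read_trylock_sketch(l))
		return true;
	return false;
}
```

With the flag set, the fast path bails out before touching the lock; with
it clear, the behavior matches a plain trylock.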

Or just do that might_sleep.

I'm really buggering off the subject now.

====

mm: remove unintentional voluntary preemption in get_mmap_lock_carefully

Should the trylock succeed (and thus blocking was avoided), the routine
wants to ensure blocking was still legal to do. However, the might_sleep()
used for this ends up calling __cond_resched(), injecting a voluntary
preemption point with the freshly acquired lock held.

Call __might_sleep() instead while holding the lock, but check
need_resched() prior to taking it.

Found by checking off-CPU time during kernel build (like so:
"offcputime-bpfcc -Ku"), sample backtrace:
finish_task_switch.isra.0
__schedule
__cond_resched
lock_mm_and_find_vma
do_user_addr_fault
exc_page_fault
asm_exc_page_fault
- sh (4502)
10

Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
---
mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1ec1ef3418bf..6dac9dbb7b59 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5258,8 +5258,8 @@ EXPORT_SYMBOL_GPL(handle_mm_fault);
 static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs *regs)
 {
 	/* Even if this succeeds, make it clear we *might* have slept */
-	if (likely(mmap_read_trylock(mm))) {
-		might_sleep();
+	if (likely(!need_resched() && mmap_read_trylock(mm))) {
+		__might_sleep(__FILE__, __LINE__);
 		return true;
 	}

--
2.39.2