Re: [patch v2 1/2] mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks

From: Michal Hocko
Date: Sat Dec 16 2017 - 06:33:44 EST


On Sat 16-12-17 15:21:51, Tetsuo Handa wrote:
> On 2017/12/16 1:25, Michal Hocko wrote:
> >> struct mmu_notifier_ops {
> >> + /*
> >> + * Flags to specify behavior of callbacks for this MMU notifier.
> >> + * Used to determine which context an operation may be called.
> >> + *
> >> + * MMU_INVALIDATE_DOES_NOT_BLOCK: invalidate_{start,end} does not
> >> + * block
> >> + */
> >> + int flags;
> >
> > This should be more specific IMHO. What do you think about the following
> > wording?
> >
> > invalidate_{start,end,range} doesn't block on any locks which depend
> > directly or indirectly (via lock chain or resources e.g. worker context)
> > on a memory allocation.
>
> I disagree. It needlessly complicates validating the correctness.

But it makes clear what the actual semantics are.
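
To illustrate what the flag means for a caller that must not block (the
OOM reaper, for instance), here is a rough user-space sketch. The flag name
comes from the patch quoted above. Its value, the stripped-down struct and
the helper has_blockable_invalidate() are assumptions made up for
illustration only, not kernel API.

```c
/* User-space model only; these are not the kernel's definitions. */
#include <stdbool.h>
#include <stddef.h>

/* Flag name is taken from the patch; the value 0x01 is an assumption. */
#define MMU_INVALIDATE_DOES_NOT_BLOCK 0x01

/* Simplified stand-in for struct mmu_notifier_ops (callbacks omitted). */
struct mmu_notifier_ops {
	int flags;
};

/*
 * Hypothetical helper: returns true if at least one of the n notifiers
 * has NOT declared that its invalidate callbacks avoid blocking. A
 * caller in a restricted context would then skip the address space
 * rather than risk sleeping in a callback.
 */
static bool has_blockable_invalidate(const struct mmu_notifier_ops *ops,
				     size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(ops[i].flags & MMU_INVALIDATE_DOES_NOT_BLOCK))
			return true;
	}
	return false;
}
```

Note that a single unannotated notifier is enough to make the whole
address space off limits to such a caller, which is why the wording of the
flag's contract matters.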

> What if the invalidate_{start,end} calls schedule_timeout_idle(10 * HZ) ?

Let's talk seriously about real code. Any mmu notifier doing this is
just crazy and should be fixed.

> schedule_timeout_idle() will not block on any locks which depend directly or
> indirectly on a memory allocation, but we are already blocking other memory
> allocating threads at mutex_trylock(&oom_lock) in __alloc_pages_may_oom().

Then the reaper would block and progress would be slower.

> This is essentially same with "sleeping forever due to schedule_timeout_killable(1) by
> SCHED_IDLE thread with oom_lock held" versus "looping due to mutex_trylock(&oom_lock)
> by all other allocating threads" lockup problem. The OOM reaper does not want to get
> blocked for so long.

Yes, it absolutely doesn't want to do that. MMU notifiers should be
reasonable because they are called from performance-sensitive call
paths.

--
Michal Hocko
SUSE Labs