Re: [PATCH 1/5] mm: Check if mmu notifier callbacks are allowed to fail

From: Ralph Campbell
Date: Wed Aug 14 2019 - 19:35:01 EST



On 8/14/19 3:14 PM, Andrew Morton wrote:
On Wed, 14 Aug 2019 22:20:23 +0200 Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:

Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.

Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.

An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.

Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.

...

--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -179,6 +179,8 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
!mmu_notifier_range_blockable(range) ? "non-" : "");
+ WARN_ON(mmu_notifier_range_blockable(range) ||
+ ret != -EAGAIN);
ret = _ret;
}
}

A problem with WARN_ON(a || b) is that if it triggers, we don't know
whether it was because of a or because of b. Or both. So I'd suggest

WARN_ON(a);
WARN_ON(b);


This won't quite work. It is OK to have mmu_notifier_range_blockable(range) be true or false.
sync_cpu_device_pagetables() shouldn't return
-EAGAIN unless blockable is true.