Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
From: peterz
Date: Thu Sep 24 2020 - 04:28:24 EST
On Wed, Sep 23, 2020 at 11:52:51AM -0400, Steven Rostedt wrote:
> On Wed, 23 Sep 2020 10:40:32 +0200
> peterz@xxxxxxxxxxxxx wrote:
>
> > However, with migrate_disable() we can have each task preempted in a
> > migrate_disable() region, worse we can stack them all on the _same_ CPU
> > (super ridiculous odds, sure). And then we end up only able to run one
> > task, with the rest of the CPUs picking their nose.
>
> What if we just made migrate_disable() a local_lock() available for !RT?
Can't, neither migrate_disable() nor migrate_enable() are allowed to
block -- which is what makes their implementation so 'interesting'.
> This should lower the SHC in theory, if you can't have stacked migrate
> disables on the same CPU.
See this email in that other thread (you're on Cc too IIRC):
https://lkml.kernel.org/r/20200923170809.GY1362448@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I think that if we 'frob' the balance PULL, we'll end up with something
similar.
Whichever way around we turn this thing, the migrate_disable() runtime
(we'll have to add a tracer for that), will be an interference term on
the lower priority task, exactly like preempt_disable() would be. We'll
just not exclude a higher priority task from running.
AFAICT; the best case is a single migrate_disable() nesting, where a
higher priority task preempts in a migrate_disable() section -- this is
per design.
When this preempted task becomes eligible to run under the ideal model
(IOW it becomes one of the M highest priority tasks), it might still
have to wait for the preemptee's migrate_disable() section to complete,
thereby suffering interference for exactly the duration of that
migrate_disable() section.
Per this argument, the change from preempt_disable() to
migrate_disable() gets us:
- higher priority tasks gain reduced wake-up latency
- lower priority tasks are unchanged and are subject to the exact same
  interference term as if the higher priority task were using
  preempt_disable().
Since we've already established this term is unbounded, any task but the
highest priority task is basically buggered.
TL;DR, if we get balancing fixed and achieve (near) the optimal case
above, migrate_disable() is an over-all win. But it's provably
non-deterministic as long as the migrate_disable() sections are
non-deterministic.
The reason this all mostly works in practice is (I think) because:
- People care most about the higher prio RT tasks and craft them to
  mostly avoid the migrate_disable() infected code.
- The preemption scenario is most pronounced at higher utilization
  scenarios, and I suspect this is fairly rare to begin with.
- And while many of these migrate_disable() regions are unbounded in
  theory, in practice they're often fairly reasonable.
So my current todo list is:
- Change RT PULL
- Change DL PULL
- Add migrate_disable() tracer; exactly like preempt/irqoff, except
  measuring task-runtime instead of cpu-time.
- Add a mode that measures actual interference.
- Add a traceevent to detect preemption in migrate_disable().
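
To make the "task-runtime instead of cpu-time" item concrete, here is a minimal user-space sketch in C of the accounting such a tracer might do. It is only an illustration under my own assumptions: struct md_task and the md_*() helpers are hypothetical names, not existing kernel interfaces, and the real tracer would hook the scheduler rather than be driven by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task state; none of these names exist in the kernel. */
struct md_task {
	uint64_t runtime_ns;	/* on-CPU time accumulated by this task */
	uint64_t md_start_ns;	/* runtime_ns at the outermost section entry */
	uint64_t md_max_ns;	/* longest section so far, in task runtime */
	unsigned int md_depth;	/* nesting depth of the section */
};

/* Called whenever the task has run for delta_ns; time spent preempted is
 * never passed in, so it does not count towards the section length. */
void md_account_run(struct md_task *t, uint64_t delta_ns)
{
	t->runtime_ns += delta_ns;
}

/* Stand-in for migrate_disable(): snapshot runtime on the outermost entry. */
void md_enter(struct md_task *t)
{
	if (t->md_depth++ == 0)
		t->md_start_ns = t->runtime_ns;
}

/* Stand-in for migrate_enable(): on the outermost exit, record the
 * section length measured in task runtime and keep the maximum. */
void md_exit(struct md_task *t)
{
	uint64_t len;

	assert(t->md_depth > 0);
	if (--t->md_depth)
		return;

	len = t->runtime_ns - t->md_start_ns;
	if (len > t->md_max_ns)
		t->md_max_ns = len;
}
```

If a task runs 5ns inside a section, gets preempted for any length of time, then runs another 7ns (with a nested enter/exit pair in between) before leaving the section, md_max_ns ends up at 12. A cpu-time based tracer in the preempt/irqoff style would have included the time spent preempted, which is what the separate interference mode is meant to measure.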
And then I suppose I should twist Daniel's arm to update his model to
include these scenarios and numbers.