Re: [PATCH] fixup: [PATCH v12 2/3] sched: Move task_mm_cid_work to mm work_struct

From: Gabriele Monaco
Date: Thu Apr 10 2025 - 10:38:07 EST

On Thu, 2025-04-10 at 10:04 -0400, Mathieu Desnoyers wrote:
> On 2025-04-10 08:50, Gabriele Monaco wrote:
> > Thanks both for the comments, I tried to implement what Mathieu
> > suggested. This patch applies directly on 2/3 but I'm sending it
> > here
> > first to get feedback.
> >
> > Essentially, I refactored a bit to avoid the need to add more
> > dependencies to rseq, the rseq_tick is now called task_tick_mm_cid
> > (as
> > before the series) and it does the two things you mentioned:
> >   * A) trigger the mm_cid recompaction
> >   * B) trigger an update of the task's rseq->mm_cid field at some
> > point
> >     after recompaction, so it can get a mm_cid value closer to 0.
> >
> > Now, A occurs only after the scan time elapsed, which means it
> > could
> > potentially run multiple times in case the work is not scheduled
> > before
> > the next tick, I'm not sure adding more checks to make sure it
> > happens once and only once really makes sense here.
>
> The scan is gated by two checks now:
>
> > + if (t->mm && rtime > RSEQ_UNPREEMPTED_THRESHOLD) {
> > + if (mm_cid_needs_scan(t->mm))
>
> And likewise for the periodic check for preemption:
>

Alright, I could add another flag here to prevent re-scheduling the work.

> > + if (t->mm && rtime > RSEQ_UNPREEMPTED_THRESHOLD) {
> [...]
> > +	else if (time_after(jiffies, t->last_rseq_preempt +
> > +			     msecs_to_jiffies(MM_CID_SCAN_DELAY))) {
>
> Those second levels of time checks would prevent adding significant
> overhead on every tick after the threshold is reached.
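To make sure I read the gating correctly: modeled in plain userspace C
(with jiffies, the constants, and the task fields all mocked here, so
the names are illustrative, not the patch's actual code), the two-level
check would look roughly like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Mocked stand-ins for kernel state -- illustrative values only. */
static unsigned long jiffies;
#define RSEQ_UNPREEMPTED_THRESHOLD	100UL
#define MM_CID_SCAN_DELAY		50UL	/* in "jiffies" for this model */

struct task {
	unsigned long last_rseq_preempt;
	bool has_mm;
};

/* Wraparound-safe "a is after b", like the kernel's time_after(). */
static bool time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True when the tick should act on this task. */
static bool tick_should_act(struct task *t, unsigned long rtime)
{
	/* First gate: only long-running tasks qualify at all. */
	if (!t->has_mm || rtime <= RSEQ_UNPREEMPTED_THRESHOLD)
		return false;
	/* Second gate: rate-limit to once per MM_CID_SCAN_DELAY. */
	return time_after(jiffies, t->last_rseq_preempt + MM_CID_SCAN_DELAY);
}
```

So after the threshold is first crossed, the second gate keeps every
subsequent tick cheap until the delay elapses again.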
>

But I'm not sure what to do here: in my understanding, this second
branch can only run once for task t, and it may run even though the
previous path was skipped (assume two long-running threads share the
mm: the first thread schedules the work, and it completes before the
second thread meets a qualifying tick).
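For the "schedule the work once and only once" part, what I had in mind
is a test-and-set flag that the work clears when the scan completes.
This is purely a userspace sketch (the flag name and helpers are my
assumptions, not from the patch; the real thing would be a per-mm bit):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-mm flag: set when the compaction work is queued,
 * cleared by the work function once the scan has completed. */
static atomic_flag work_pending = ATOMIC_FLAG_INIT;

/* Only the first caller between two scans actually queues the work. */
static bool try_schedule_scan(void)
{
	return !atomic_flag_test_and_set(&work_pending);
}

/* Called at the end of the work function. */
static void scan_work_done(void)
{
	atomic_flag_clear(&work_pending);
}
```

Concurrent ticks from threads sharing the mm would then race on the
flag, and exactly one of them wins and queues the work.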

> >
> > B is occurring after the work updates the last scan time, so we are
> > in a
> > condition where the runtime is above threshold but the (next) scan
> > time
> > did not expire yet.
> > I tried to account for multiple threads updating the mm_cid (not
> > necessarily the long running one, or in case more are long
> > running), for
> > this I'm tracking the last time we updated the mm_cid, if that
> > occurred
> > before the last mm_cid scan, we need to update (and preempt).
> >
> > Does this make sense to you?
>
> It makes sense. Note that it adds overhead to rseq_preempt() (a store
> to t->last_rseq_preempt), which is a fast path. I don't know if we
> should care.

Well, I'm trying to track the last time a reset occurred, and that
happens in mm_cid_get, which doesn't look like a fast path to me.
I could then rename the field to last_rseq_reset, since it won't be
related to preemption.

>
> Also part of task_tick_mm_cid could be moved to a helper, e.g.:
>
> static
> void task_reset_mm_cid(struct rq *rq, struct task_struct *t)
> {
>          int old_cid = t->mm_cid;
>
>          if (!t->mm_cid_active)
>                  return;
>          mm_cid_snapshot_time(rq, t->mm);
>          mm_cid_put_lazy(t);
>          t->last_mm_cid = t->mm_cid = mm_cid_get(rq, t, t->mm);
>          if (old_cid == t->mm_cid)
>                  return;
>          rseq_preempt(t);
> }
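The nice property of that helper is the early return: rseq_preempt() is
only raised when the cid actually changed. A quick userspace model (all
names mocked; fake_mm_cid_get just injects the "new" cid, and a counter
stands in for rseq_preempt) to illustrate:

```c
#include <assert.h>
#include <stdbool.h>

struct tsk {
	int mm_cid;
	int last_mm_cid;
	bool active;		/* stands in for t->mm_cid_active */
	int preempt_count;	/* counts rseq_preempt() calls */
};

/* Hypothetical allocator stand-in: returns whatever cid we inject. */
static int fake_mm_cid_get(int next)
{
	return next;
}

static void reset_mm_cid(struct tsk *t, int next_cid)
{
	int old_cid = t->mm_cid;

	if (!t->active)
		return;
	t->last_mm_cid = t->mm_cid = fake_mm_cid_get(next_cid);
	if (old_cid == t->mm_cid)
		return;		/* cid unchanged: skip the preempt */
	t->preempt_count++;	/* stands in for rseq_preempt(t) */
}
```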

Yeah, good point.

Thanks,
Gabriele