Re: Re: [PATCH 05/15] sched/fair: Implement an EEVDF like policy

From: Abel Wu
Date: Wed Oct 11 2023 - 00:15:12 EST

Next message: Abel Wu: "Re: [PATCH 01/15] sched/fair: Add avg_vruntime"
Previous message: Wenkai: "Re: [PATCH 0/5] watchdog: eiois200_wdt: Add EIO-IS200 Watchdog Driver"
Next in thread: Peter Zijlstra: "Re: Re: [PATCH 05/15] sched/fair: Implement an EEVDF like policy"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

在 9/30/23 5:40 AM, Benjamin Segall Wrote:

Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

+
+/*
+ * Earliest Eligible Virtual Deadline First
+ *
+ * In order to provide latency guarantees for different request sizes
+ * EEVDF selects the best runnable task from two criteria:
+ *
+ * 1) the task must be eligible (must be owed service)
+ *
+ * 2) from those tasks that meet 1), we select the one
+ * with the earliest virtual deadline.
+ *
+ * We can do this in O(log n) time due to an augmented RB-tree. The
+ * tree keeps the entries sorted on service, but also functions as a
+ * heap based on the deadline by keeping:
+ *
+ * se->min_deadline = min(se->deadline, se->{left,right}->min_deadline)
+ *
+ * Which allows an EDF like search on (sub)trees.
+ */
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+{
+ struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+ struct sched_entity *curr = cfs_rq->curr;
+ struct sched_entity *best = NULL;
+
+ if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
+ curr = NULL;
+
+ while (node) {
+ struct sched_entity *se = __node_2_se(node);
+
+ /*
+ * If this entity is not eligible, try the left subtree.
+ */
+ if (!entity_eligible(cfs_rq, se)) {
+ node = node->rb_left;
+ continue;
+ }
+
+ /*
+ * If this entity has an earlier deadline than the previous
+ * best, take this one. If it also has the earliest deadline
+ * of its subtree, we're done.
+ */
+ if (!best || deadline_gt(deadline, best, se)) {
+ best = se;
+ if (best->deadline == best->min_deadline)
+ break;
+ }
+
+ /*
+ * If the earlest deadline in this subtree is in the fully
+ * eligible left half of our space, go there.
+ */
+ if (node->rb_left &&
+ __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
+ node = node->rb_left;
+ continue;
+ }
+
+ node = node->rb_right;
+ }

I believe that this can fail to actually find the earliest eligible
deadline, because the earliest deadline (min_deadline) can be in the
right branch, but that se isn't eligible, and the actual target se is in
the left branch. A trivial 3-se example with the nodes represented by
(vruntime, deadline, min_deadline):

(5,9,7)
/ \
(4,8,8) (6,7,7)

AIUI, here the EEVDF pick should be (4,8,8), but pick_eevdf() will
instead pick (5,9,7), because it goes into the right branch and then
fails eligibility.

I'm not sure how much of a problem this is in practice, either in
frequency or severity, but it probably should be mentioned if it's
an intentional tradeoff.

Assume entity i satisfies (d_i == min_deadline) && (v_i > V), there
must be an eligible entity j with (d_j >= d_i) && (v_j < V). Given
that how deadline is calculated, it can be inferred that:

vslice_i < vslice_j

IOW a more batch-like entity with looser deadline will beat entities
that is more interactive-like even with tighter deadline, only because
the former is eligible while the latter isn't.

With Benjamin's fix, the semantics of 'Earliest Eligible' preserved.
But since all this is about latency rather than fairness, I wonder if
there are cases worthy of breaking the 'eligible' rule.

Thanks & Best,
Abel

Thinking out loud, I think that it would be sufficient to recheck via something like

for_each_sched_entity(best) {
check __node_2_se(best->rb_left)->min_deadline, store in actual_best
}

for the best min_deadline, and then go do a heap lookup in actual_best
to find the se matching that min_deadline.

I think this pass could then be combined with our initial descent for
better cache behavior by keeping track of the best rb_left->min_deadline
each time we take a right branch. We still have to look at up to ~2x the
nodes, but I don't think that's avoidable? I'll expand my quick hack I
used to test my simple case into a something of a stress tester and try
some implementations.

Next message: Abel Wu: "Re: [PATCH 01/15] sched/fair: Add avg_vruntime"
Previous message: Wenkai: "Re: [PATCH 0/5] watchdog: eiois200_wdt: Add EIO-IS200 Watchdog Driver"
Next in thread: Peter Zijlstra: "Re: Re: [PATCH 05/15] sched/fair: Implement an EEVDF like policy"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]