Re: [RFC PATCH] sched/fair: update the vruntime to be max vruntime when yield

From: Vincent Guittot
Date: Wed Mar 01 2023 - 08:30:16 EST


On Wed, 1 Mar 2023 at 12:23, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> Hi Xuewen,
>
> On 01/03/2023 09:20, Xuewen Yan wrote:
> > On Wed, Mar 1, 2023 at 4:09 PM Vincent Guittot
> > <vincent.guittot@xxxxxxxxxx> wrote:
> >>
> >> On Wed, 1 Mar 2023 at 08:30, Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
> >>>
> >>> Hi Vincent
> >>>
> >>> I noticed the following patch:
> >>> https://lore.kernel.org/lkml/20230209193107.1432770-1-rkagan@xxxxxxxxx/
> >>> And I noticed that the V2 has been merged into mainline:
> >>> https://lore.kernel.org/all/20230130122216.3555094-1-rkagan@xxxxxxxxx/T/#u
> >>>
> >>> The patch fixed the inversion of the vruntime comparison, and I can
> >>> see that in my case some vruntimes are inverted as well.
> >>> Which patch do you think would work for our scenario? I would be very
> >>> grateful if you could give us some advice.
> >>> I will try this patch in our tree.
> >>
> >> By default, use the one that is merged; the difference is mainly a
> >> matter of time range. Also be aware that the case of a newly migrated
> >> task is not fully covered by either patch.
> >
> > Okay, thank you very much!
> >
> >>
> >> This patch fixes a problem with a long-sleeping entity in the presence
> >> of low-weight, always-running entities. That doesn't seem to match the
> >> description of your use case.
> >
> > Thanks for the clarification! We will try it first to see whether it
> > resolves our problem.
>
> Could you run Vincent's rt-app example on your device and then report
> `cat /sys/kernel/debug/sched/debug` for the CPU?
>
> # rt-app /root/rt-app/cfs_yield.json
>
> # cat /sys/kernel/debug/sched/debug
> ...
> cpu#2
> .nr_running : 4
> ...
> .curr->pid : 2121
> ...
>
> cfs_rq[2]:/autogroup-15
> .exec_clock : 0.000000
> .MIN_vruntime : 32428.281204
> .min_vruntime : 32428.281204
> .max_vruntime : 32434.997784
> ...
> .nr_running : 4
> .h_nr_running : 4
>
> ...
>
>  S            task   PID          tree-key  switches  prio     wait-time        sum-exec       sum-sleep
> -------------------------------------------------------------------------------------------------------------
>  S         cpuhp/2    22       1304.405864        13   120      0.000000        0.270000        0.000000 0.000000 0 0 /
>  S     migration/2    23          0.000000         8     0      0.000000        7.460940        0.000000 0.000000 0 0 /
>  S     ksoftirqd/2    24     137721.092326        46   120      0.000000        1.821880        0.000000 0.000000 0 0 /
>  I    kworker/2:0H    26       2116.827393         4   100      0.000000        0.057220        0.000000 0.000000 0 0 /
>  I     kworker/2:1    45     204539.183593       322   120      0.000000      447.975440        0.000000 0.000000 0 0 /
>  I     kworker/2:3    80       1778.668364        33   120      0.000000       16.237320        0.000000 0.000000 0 0 /
>  I    kworker/2:1H   239     199388.093936        74   100      0.000000        1.892300        0.000000 0.000000 0 0 /
>  R         taskA-0  2120      32428.281204       582   120      0.000000     1109.911280        0.000000 0.000000 0 0 /autogroup-15
> >R         taskB-1  2121      32430.693304       265   120      0.000000     1103.527660        0.000000 0.000000 0 0 /autogroup-15
>  R         taskB-2  2122      32432.137084       264   120      0.000000     1105.006760        0.000000 0.000000 0 0 /autogroup-15
>  R         taskB-3  2123      32434.997784       282   120      0.000000     1115.965120        0.000000 0.000000 0 0 /autogroup-15
>
> ...
>
> Not sure what Vincent's rt-app file looks like exactly, but I crafted
> something quick here:

It was quite similar to yours below. I just stopped calling yield after
a few seconds to see if the behavior changed.
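
In case it helps with reproducing this, below is a rough standalone C
equivalent of what my run did. This is only a sketch under my own
assumptions, not rt-app internals: the calibrated "run" event is
approximated with a busy loop on CLOCK_THREAD_CPUTIME_ID, and the ~5s
cut-off for yielding is an arbitrary choice:

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <time.h>

static int64_t now_ns(clockid_t clk)
{
        struct timespec ts;

        clock_gettime(clk, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Burn roughly 'ns' nanoseconds of CPU time. */
static void burn_ns(int64_t ns)
{
        int64_t start = now_ns(CLOCK_THREAD_CPUTIME_ID);

        while (now_ns(CLOCK_THREAD_CPUTIME_ID) - start < ns)
                ;
}

int main(void)
{
        cpu_set_t set;
        int64_t start = now_ns(CLOCK_MONOTONIC);

        /* Pin to CPU2, like "cpus" : [2] in the json below. */
        CPU_ZERO(&set);
        CPU_SET(2, &set);
        sched_setaffinity(0, sizeof(set), &set);

        for (;;) {
                burn_ns(1000LL * 1000);         /* "run" : 1000 (us) */
                /* Stop yielding after ~5s to see if behavior changes. */
                if (now_ns(CLOCK_MONOTONIC) - start < 5000000000LL)
                        sched_yield();
        }
}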

>
> {
>     "tasks" : {
>         "taskA" : {
>             "cpus" : [2],
>             "yield" : "taskA",
>             "run" : 1000
>         },
>         "taskB" : {
>             "instance" : 3,
>             "cpus" : [2],
>             "run" : 1000000
>         }
>     },
>     "global" : {
>         "calibration" : 156,
>         "default_policy" : "SCHED_OTHER",
>         "duration" : 20
>     }
> }
>
> [...]
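
FWIW, the "inverted vruntime" mentioned at the top of the thread can be
illustrated outside the kernel. CFS compares vruntimes via a signed
difference (cf. max_vruntime() in kernel/sched/fair.c), which is only
correct while the two values stay within 2^63 of each other; and since
vruntime advances much faster than wall time for low-weight entities, a
long sleeper can end up on the wrong side of that window. A simplified
userspace sketch of the effect (illustration only, not the kernel code
itself):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Same comparison scheme as max_vruntime() in kernel/sched/fair.c. */
static inline u64 max_vruntime(u64 max_vruntime, u64 vruntime)
{
        s64 delta = (s64)(vruntime - max_vruntime);

        if (delta > 0)
                max_vruntime = vruntime;

        return max_vruntime;
}

int main(void)
{
        u64 sleeper = 1000;                     /* vruntime frozen during a long sleep */
        u64 base = sleeper + (1ULL << 63) + 1;  /* min_vruntime-based target, far ahead */

        /*
         * (s64)(base - sleeper) has wrapped negative, so the comparison
         * inverts: the stale vruntime is kept instead of being pulled up
         * to the base, as placing the entity intends.
         */
        printf("placed at %llu, expected %llu\n",
               (unsigned long long)max_vruntime(sleeper, base),
               (unsigned long long)base);
        return 0;
}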