Re: [PATCH v5 2/2] sched/fair: Introduce SIS_SHORT to wake up short task on current CPU
From: Chen Yu
Date: Sun Feb 19 2023 - 23:55:45 EST
On 2023-02-17 at 10:44:51 +0800, Abel Wu wrote:
> On 2/16/23 11:24 PM, Chen Yu wrote:
> > > The following change greatly reduced the p99lat of Redis service
> > > from 150ms to 0.9ms, at exactly the same throughput (QPS).
> > >
> > > @@ -5763,6 +5787,9 @@ wake_affine_weight(struct sched_domain *sd, struct
> > > task_struct *p,
> > > s64 this_eff_load, prev_eff_load;
> > > unsigned long task_load;
> > >
> > > + if (is_short_task(p))
> > > + return nr_cpumask_bits;
> > > +
> > So above change wants to wake up the short task on its previous
> > CPU if I understand correctly.
>
> Yes.
>
> > > this_eff_load = cpu_load(cpu_rq(this_cpu));
> > >
> > > if (sync) {
> > >
> > > I know that 'short' tasks are not necessarily 'small' tasks, e.g.
> > > sleeping duration is small or have large weights, but this works
> > > really well for this case. This is partly because delivering data
> > > is memory bandwidth intensive hence prefer cache hot cpus. And I
> > > think this is also applicable to the general purposes: do NOT let
> > > the short running tasks suffering from cache misses caused by
> > > migration.
> > >
> > I see. My original thought was to mitigate short task migration
> > as much as possible. Either waking up the task on current CPU or previous
> > CPU should both achieve the goal in theory. Could you please describe
> > a little more about how Redis proxy server was tested? Was it tested
> > locally or using multiple machines? I asked this because for network
> > benchmarks, it might be better to wake the task close to the waker(maybe
> > the NIC interrupt) due to hot network buffer. Anyway I will test
> > your change slightly changed to see the impact, and also Redis. But it
> > would be even better if you could provide some simple test steps I can
> > try locally : )
>
> Sorry for missing the info. The test was done in production environment,
> and what I have done is only updating the kernel in several machines
> which are highly loaded, that is over 85% cpu util observed by mpstat.
> Please let me know if you want any specific info.
>
I've modified the code a little bit, which was inpired by your statement
"'small' tasks can be easily stacked on this cpu when wake up several tasks
at one time if this cpu is unloaded". I added extra check on wakee_flips, so
that tasks waking up too many different tasks will not be treated as 'small'
ones. That is to say, only tasks waking up each other exclusively will be
put together. So far this change has verified to keep the improvement for
netperf/will-it-scale, and I just launched the redis test to see what will
happen.
thanks,
Chenyu