Re: [PATCH RESEND] sched/nohz: Add HRTICK_BW for using cfs bandwidth with nohz_full

From: Phil Auld
Date: Thu May 18 2023 - 10:38:33 EST


On Thu, May 18, 2023 at 03:47:46PM +0200 Peter Zijlstra wrote:
> On Thu, May 18, 2023 at 09:20:38AM -0400, Phil Auld wrote:
> > CFS bandwidth limits and NOHZ full don't play well together. Tasks
> > can easily run well past their quotas before a remote tick does
> > accounting. This leads to long, multi-period stalls before such
> > tasks can run again. Use the hrtick mechanism to set a sched
> > tick to fire at remaining_runtime in the future if we are on
> > a nohz full cpu, if the task has quota and if we are likely to
> > disable the tick (nr_running == 1). This allows for bandwidth
> > accounting before tasks go too far over quota.
> >
> > A number of container workloads use a dynamic number of real
> > nohz tasks but also have other, bandwidth-limited work that
> > ends up running on the "spare" nohz_full CPUs. This is an
> > artifact of having to specify nohz_full CPUs at boot. Adding
> > this hrtick resolves the issue of long stalls on these tasks.
> >
> > Add the sched_feat HRTICK_BW, off by default, so that users can
> > enable it only when needed.
>
> OMG; so because NOHZ_FULL configuration sucks, we add hacks on?
>

I suppose one could make that argument. The HRTICK mechanism is already
in place and used similarly for DL (SCHED_DEADLINE), and that also
benefits nohz workloads.

I don't see NOHZ_FULL configuration getting better anytime soon, although
I think efforts are being made in that direction.

This seemed to be a sane way to handle what are effectively conflicting
requirements. Stalling a task to the point that the host gets rebooted is
pretty painful. An alternative would be to fail the tick_stop test in
this case, but that would keep the periodic tick running the whole time,
whereas this approach tries to honor the request for nohz as much as
possible.

Thanks for taking a look :)

Cheers,
Phil
--