Re: [RFC][PATCH 0/3] DynamicHZ: Configuring the timer tick rate at boot time
From: Peter Zijlstra
Date: Mon Feb 03 2025 - 06:14:54 EST
On Tue, Jan 28, 2025 at 05:46:10PM +0100, Thomas Gleixner wrote:
> 4) Scheduler
>
> I leave that part to Peter as he definitely has a better overview
> of what needs to be done than me.
Ponies, scheduler wants ponies :-)
So the scheduler tick does waaay too much:
- timekeeping / accounting:
. internally
. psi
. cgroup.cpuacct
. posix timers
. a million other things
- periodic update/aging of things like:
. global load avg
. hw pressure
. freq scale
- tied into perf
(which I've briefly touched upon earlier)
- drives load balance
- drives mm scanning for NUMA balancing crud
- drives tick based preemption
The load-balance and global-load-avg updates are basically internal
tick-based timers. I'm not sure replacing them with timer wheel timers
makes sense, given how the wheel buckets timers, but it might also not
be the worst.
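To make "internal tick-based timers" concrete, here is a minimal user-space sketch, assuming a plain tick counter stands in for jiffies. The names tick_timer, tick_timer_check() and simulate() are hypothetical, not kernel API; only the wraparound-safe signed-difference comparison follows the same idiom as the kernel's time_after_eq(). The point is that such work is only ever checked from the tick, so it inherits both the tick's rate and its cost.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is at or after b" on tick counts, using the same
 * signed-difference idiom as the kernel's time_after_eq(). */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Hypothetical "internal tick-based timer": work that piggybacks on
 * the tick and runs once its deadline (in ticks) has been reached. */
struct tick_timer {
	unsigned long next;   /* tick count at which the work is next due */
	unsigned long period; /* period between runs, in ticks */
	unsigned long runs;   /* how many times the work has run */
};

/* Called from the (simulated) tick handler with the current tick. */
static void tick_timer_check(struct tick_timer *t, unsigned long now)
{
	assert(t->period > 0);
	if (tick_after_eq(now, t->next)) {
		t->runs++;
		t->next = now + t->period; /* may wrap; comparison copes */
	}
}

/* Drive the timer for nr_ticks consecutive ticks starting at 'start'
 * and return the total number of runs so far. */
static unsigned long simulate(struct tick_timer *t, unsigned long start,
			      unsigned long nr_ticks)
{
	for (unsigned long i = 0; i < nr_ticks; i++)
		tick_timer_check(t, start + i);
	return t->runs;
}
```

With a period of 5 ticks, 20 ticks starting at 0 run the work at ticks 0, 5, 10 and 15; the signed comparison keeps this correct across counter wraparound as well.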
The whole preemption thing could probably be replaced with HRTICK (which
might be suffering from bitrot), but the problem has always been that
hrtimers are too expensive (on x86). Ideally, though, we'd move away
from tick-based preemption altogether.
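The granularity cost of tick-based preemption can be sketched with a deliberately simplified model, assuming the task starts exactly on a tick boundary and its slice-expiry check runs only from the tick handler. tick_preempt_at() and tick_overrun_ns() are hypothetical helpers, not scheduler code; the "precise timer" comparison only stands in for what HRTICK aims to do.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: the task starts on a tick boundary and the slice
 * check only runs from the tick. Returns the time (ns since the task
 * started) at which the tick first notices the slice has expired,
 * i.e. the slice rounded up to a whole number of ticks. */
static uint64_t tick_preempt_at(uint64_t slice_ns, uint64_t tick_ns)
{
	assert(tick_ns > 0);
	return (slice_ns + tick_ns - 1) / tick_ns * tick_ns;
}

/* A precise one-shot timer armed for the slice expiry (the idea behind
 * HRTICK) would preempt at slice_ns itself; this returns how much
 * longer than its slice the task runs under tick-only preemption. */
static uint64_t tick_overrun_ns(uint64_t slice_ns, uint64_t tick_ns)
{
	return tick_preempt_at(slice_ns, tick_ns) - slice_ns;
}
```

At HZ=250 (4 ms ticks) a 3 ms slice is only noticed at 4 ms, a 1 ms overrun; at HZ=1000 the same slice lands exactly on a tick. The overrun is bounded by one tick period, which is why tick-based preemption keeps working with a boot-time HZ, just at a different granularity.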
That said, driving preemption with dynamic HZ should work just fine.
Most of the time accounting is TSC (or sched_clock()) based and derives
its measure of time from that. But things like perf use TICK_NSEC to
tell them how much time passes between ticks -- so if you go and make
that dynamic, you really do have to fix that.
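To illustrate the TICK_NSEC problem, a minimal sketch, assuming a kernel built for one HZ but booted at another. TICK_NSEC_CONST mirrors the rounding of the kernel's TICK_NSEC definition, ((NSEC_PER_SEC+HZ/2)/HZ); COMPILE_HZ, boot_tick_ns() and the ticks_to_ns_*() helpers are hypothetical stand-ins for a consumer that converts tick counts into time.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define COMPILE_HZ 1000 /* hypothetical build-time HZ */

/* Same rounding as the kernel's TICK_NSEC, fixed at compile time. */
#define TICK_NSEC_CONST ((NSEC_PER_SEC + COMPILE_HZ / 2) / COMPILE_HZ)

/* Hypothetical runtime equivalent for an HZ chosen at boot. */
static uint64_t boot_tick_ns(unsigned int boot_hz)
{
	assert(boot_hz > 0);
	return (NSEC_PER_SEC + boot_hz / 2) / boot_hz;
}

/* Elapsed time as estimated by a consumer that assumes the
 * compile-time tick length. */
static uint64_t ticks_to_ns_const(uint64_t ticks)
{
	return ticks * TICK_NSEC_CONST;
}

/* Elapsed time using the tick length actually in effect. */
static uint64_t ticks_to_ns_runtime(uint64_t ticks, unsigned int boot_hz)
{
	return ticks * boot_tick_ns(boot_hz);
}
```

Booted at HZ=250, 250 ticks are one second, but the compile-time constant reports 250 ms, a 4x error. Any TICK_NSEC user has to switch to a runtime value for this reason.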
Anyway, I would really like to understand what exactly is driving the
cost in your case. It should be possible to move things out of the tick,
or to run them at a lower rate, without lowering the rate of the whole
tick.
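One way to read "run them at a lower rate without running all of it lower" is the minimal sketch below. It assumes each piece of tick work carries its own wall-clock period, which is converted into a tick divisor once the tick rate is known at boot. struct tick_work, tick_work_init() and tick_work_run() are hypothetical, and the 5 s period is purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical piece of tick work with its own desired period. */
struct tick_work {
	const char *name;   /* for documentation only */
	uint64_t period_ns; /* desired wall-clock period; 0 = every tick */
	uint64_t every;     /* run every 'every' ticks, derived at boot */
	uint64_t runs;      /* number of times the work has run */
};

/* Convert each item's period into a tick divisor for the tick rate
 * chosen at boot, never less than one (i.e. every tick). */
static void tick_work_init(struct tick_work *w, unsigned int n,
			   unsigned int hz)
{
	assert(hz > 0);
	uint64_t tick_ns = NSEC_PER_SEC / hz;

	for (unsigned int i = 0; i < n; i++) {
		uint64_t every = w[i].period_ns / tick_ns;

		w[i].every = every ? every : 1;
		w[i].runs = 0;
	}
}

/* Simulated tick handler: run only the items due on this tick. */
static void tick_work_run(struct tick_work *w, unsigned int n, uint64_t tick)
{
	for (unsigned int i = 0; i < n; i++)
		if (tick % w[i].every == 0)
			w[i].runs++;
}
```

With these divisors, going from HZ=1000 to HZ=250 cuts the every-tick item's runs by 4x over the same 10 s, while the 5 s item still runs twice in a 10 s window starting at tick 0.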