Re: [PATCH -next] sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime

From: Peter Zijlstra
Date: Thu Jul 25 2024 - 11:15:19 EST


On Thu, Jul 25, 2024 at 10:49:46PM +0800, zhengzucheng wrote:
> Sorry, I made a mistake here. CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set.
>
> On 2024/7/25 22:05, Peter Zijlstra wrote:
> > On Thu, Jul 25, 2024 at 12:03:15PM +0000, Zheng Zucheng wrote:
> > > In extreme test scenarios:
> > > the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime,
> > > utime = 18446744073709518790 ns, rtime = 135989749728000 ns
> > >
> > > In cputime_adjust() process, stime is greater than rtime due to
> > > mul_u64_u64_div_u64() precision problem.
> > > before call mul_u64_u64_div_u64(),
> > > stime = 175136586720000, rtime = 135989749728000, utime = 1416780000.
> > > after call mul_u64_u64_div_u64(),
> > > stime = 135989949653530
> > >
> > > unsigned underflow occurs because rtime is less than stime.
> > > utime = rtime - stime = 135989749728000 - 135989949653530
> > > = -199925530
> > > = (u64)18446744073709518790
> > >
> > > Trigger scenario:
> > > 1. The user task runs in kernel mode most of the time.
> > > 2. The ARM64 architecture && CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y &&
> > > TICK_CPU_ACCOUNTING=y
> > >
> > > Fix mul_u64_u64_div_u64() conversion precision by resetting stime to rtime.
> > >
> > > Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()")
> > > Signed-off-by: Zheng Zucheng <zhengzucheng@xxxxxxxxxx>
> > > ---
> > > kernel/sched/cputime.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> > > index aa48b2ec879d..365c74e95537 100644
> > > --- a/kernel/sched/cputime.c
> > > +++ b/kernel/sched/cputime.c
> > > @@ -582,6 +582,8 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
> > > }
> > > stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> > > + if (unlikely(stime > rtime))
> > > + stime = rtime;

Ooh,.. I see, this is because the generic fallback for
mul_u64_u64_div_u64() is yuck :/

On x86_64 this is just two instructions and it does a native:

u64*u64->u128
u128/u64->u64

And this should never happen. But in the generic case, we approximate and
urgh.
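
For illustration, the exact two-step computation can be sketched in plain
C, assuming a compiler that provides unsigned __int128 (GCC and Clang do
on 64-bit targets). The names exact_mul_div() and
scaled_stime_from_report() are made up for this example and are not the
kernel's implementation; the inputs are the values from the report above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a * b / c with a full 128-bit intermediate product. */
static uint64_t exact_mul_div(uint64_t a, uint64_t b, uint64_t c)
{
	unsigned __int128 prod = (unsigned __int128)a * b;	/* u64*u64->u128 */

	/* The caller must ensure the quotient fits in 64 bits. */
	return (uint64_t)(prod / c);				/* u128/u64->u64 */
}

/* Scale stime exactly, using the values quoted in the report. */
static uint64_t scaled_stime_from_report(void)
{
	uint64_t stime = 175136586720000ULL;
	uint64_t rtime = 135989749728000ULL;
	uint64_t utime = 1416780000ULL;
	uint64_t scaled = exact_mul_div(stime, rtime, stime + utime);

	/* stime / (stime + utime) <= 1, so the exact result cannot exceed rtime. */
	assert(scaled <= rtime);
	return scaled;
}
```

With an exact intermediate product the scaled stime stays at or below
rtime, so the later rtime - stime cannot wrap.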

So yeah, but then perhaps add a comment like:

/*
 * Because mul_u64_u64_div_u64() can approximate on some
 * architectures; enforce the constraint that: a*b/(b+c) <= a.
 */
if (unlikely(stime > rtime))
	stime = rtime;
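
As a stand-alone sketch of what the clamp buys (user-space C, not the
kernel code; split_rtime() is a hypothetical name for this example), the
subtraction that derives utime only stays in range once stime has been
capped at rtime:

```c
#include <stdint.h>

/*
 * Hypothetical helper: given an already scaled stime, split rtime into
 * stime and utime without letting the unsigned subtraction wrap.
 */
static void split_rtime(uint64_t rtime, uint64_t scaled_stime,
			uint64_t *stime, uint64_t *utime)
{
	/* An approximating mul/div may overshoot; cap the result at rtime. */
	if (scaled_stime > rtime)
		scaled_stime = rtime;

	*stime = scaled_stime;
	*utime = rtime - scaled_stime;	/* cannot wrap after the cap */
}
```

Fed the values from the report (rtime = 135989749728000, scaled stime =
135989949653530), this yields stime == rtime and utime == 0 instead of a
wrapped u64.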

Also, I would look into doing a native arm64 version, I'd be surprised
if it could not do better than the generic variant.