RE: Linux 6.10-rc2 - massive performance regression

From: David Laight
Date: Sun Jun 09 2024 - 10:25:45 EST


From: Linus Torvalds
> Sent: 08 June 2024 23:01
>
> On Sat, 8 Jun 2024 at 14:36, David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > I'll try to remember how to bisect through the merge :-)
>
> git bisect should just do all the work for you. All you need to do is
> give a known good and bad point, and keep testing what git bisect asks
> you to do.

That would all be easier if the kernel version didn't keep changing
and/or after 'make install' grub defaulted to booting the last
built kernel.
I may already have 'fixed' this system so it doesn't default
to booting the 'last booted' kernel - a real PITA when you are
trying to fix non-booting kernels.

Anyway I completely failed to build a 'good' kernel.
Even 6.9-rc5 failed, and bisecting between 6.9-rc4 and 6.9-rc5
ended up building a 6.9-rc3+ kernel, and 'git diff v6.9-rc4'
showed massive changes even though 'git bisect view' only
listed a few changes that couldn't be relevant.
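
For reference, the bisect being attempted here boils down to roughly
the following. This is only a sketch: start_bisect is a made-up helper
name, and the tags in the usage comment are the ones mentioned above.
A commit that git picks may sit on a side branch based on an older -rc,
which is presumably why the build reported 6.9-rc3+.

```shell
# Start a bisect between a known-good and a known-bad ref in the repo at $1.
# The argument order passed to git is "git bisect start <bad> <good>".
# (start_bisect is a hypothetical helper, not part of git.)
start_bisect() {
    repo=$1 good=$2 bad=$3
    git -C "$repo" bisect start "$bad" "$good"
}

# Example (path is an assumption):
#   start_bisect ~/linux v6.9-rc4 v6.9-rc5
# Then build and boot the commit git checked out and record the verdict:
#   git bisect good      # or: git bisect bad
# Repeat until git names the first bad commit, then clean up with:
#   git bisect reset
```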

I finally realised what the difference between 'good' and 'bad'
kernels was.
It is all down to CONFIG_SPECULATION_MITIGATIONS being renamed
CONFIG_CPU_MITIGATIONS and getting enabled 'by mistake'.
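
A quick way to spot this in a given config file is something like the
sketch below. mitigations_in_config is a made-up helper name, and the
path in the usage comment is only an example.

```shell
# Print whichever mitigation symbol (old or new name) the config enables,
# or a short note if neither is set.
# (mitigations_in_config is a hypothetical helper name.)
mitigations_in_config() {
    grep -E '^CONFIG_(SPECULATION|CPU)_MITIGATIONS=' "$1" ||
        echo "no mitigation symbol enabled in $1"
}

# Example (path is an assumption):
#   mitigations_in_config .config
```

On a running system the files under
/sys/devices/system/cpu/vulnerabilities/ show which mitigations are
actually active.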

If I build a 6.10-rc2 kernel without the mitigations I get
the 'fast' behaviour.
So there must actually be something quite subtle in the
timings.

So there is still a problem that if a cpu-intensive process
gets moved to a different cpu on a 'mostly idle' system then the
new cpu is likely to be running at a low frequency and will
take a while to speed up.
Move it often enough and it will run very slowly.
I suspect that something like (untested):

num_cpu=$(nproc)	# cpus available to this shell
cpu=0; while [ $cpu -lt $num_cpu ]; do
	taskset --cpu-list $cpu sh -c 'while sleep 0.01; do :; done' &
	cpu=$((cpu + 1))
done

will cause a cpu-bound process to run very slowly.

I think that ought to be considered a bug.
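
For anyone wanting to look at the frequency side of this, a rough
sketch for dumping the current clock of each cpu is below. It assumes
the cpufreq driver exposes scaling_cur_freq (in kHz) in sysfs;
show_cpu_freqs is a made-up name, and the optional argument exists only
so the function can be pointed at a different directory.

```shell
# Print "cpuN <kHz>" for every cpu that exposes cpufreq under the given root.
# The root defaults to the usual sysfs location.
# (show_cpu_freqs is a hypothetical helper name.)
show_cpu_freqs() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue     # skip cpus without cpufreq (or no match)
        name=${f#"$root"/}          # strip the root prefix
        name=${name%%/*}            # keep only "cpuN"
        printf '%s %s\n' "$name" "$(cat "$f")"
    done
}

# Example: sample once a second while the cpu-bound test is running
#   while sleep 1; do show_cpu_freqs; echo; done
```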

David
