Re: [PATCH 0/4] Reinstate and improve MIPS `do_div' implementation
From: Maciej W. Rozycki
Date: Thu Apr 22 2021 - 12:55:10 EST
On Thu, 22 Apr 2021, H. Nikolaus Schaller wrote:
> > This has passed correctness verification with test_div64 and reduced the
> > module's average execution time down to 1.0445s and 0.2619s from 1.0668s
> > and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.
>
> test only [PATCH 1/4 and 2/4]:
>
> [ 256.301140] test_div64: Completed 64bit/32bit division and modulo test, 0.291154944s elapsed
>
> + [PATCH 3/4]
>
> [ 1698.698920] test_div64: Completed 64bit/32bit division and modulo test, 0.132142865s elapsed
>
> + [PATCH 4/4]
>
> [ 466.818349] test_div64: Completed 64bit/32bit division and modulo test, 0.134429075s elapsed
>
> So the new code is indeed faster than the default implementation.
> [PATCH 4/4] has no significant influence (wouldn't say it is slower because timer resolution
> isn't very high on this machine and the kernel has some scheduling issue [1]).
Have you used it as a module or at bootstrap? I have noticed that at
bootstrap the initialisation of the random number generator sometimes
interferes with the benchmark, which happens when there's an intervening
message produced, e.g.:
test_div64: Starting 64bit/32bit division and modulo test
random: fast init done
test_div64: Completed 64bit/32bit division and modulo test, 1.069906272s elapsed
I think it can be worked around by configuration changes so that more
stuff is run between the RNG and the test module, but instead I have
simply inserted:
mdelay(5000);
at the beginning of `test_div64_init', as for historical reasons I
haven't got the systems involved set up for modules (beyond Linux 2.4) at
this time.
NB I have run the benchmark five times with each change and system, and
with the RNG taken out of the picture results were very stable, as any
fluctuation only started at the fifth decimal digit. Both the DECstation
(the model I used anyway) and the Malta have a high-resolution clock
source though, the I/O ASIC free-running counter register at 25MHz (used
by David L. Mills, the original author of the NTP suite, for his reference
implementation) and the CP0 Count register at 80MHz respectively.
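To put those clock rates next to the observed fluctuation, here is a minimal
userspace sketch of the arithmetic. It is not kernel code, and the helper
names `tick_period_ps' and `ticks_to_ns' are made up for illustration. It
only assumes a free-running counter ticking at a fixed rate:

```c
#include <stdint.h>

/* Period of one counter tick in picoseconds, rounded down.
 * Hypothetical helper; hz is the counter frequency in Hz. */
static uint64_t tick_period_ps(uint64_t hz)
{
	return 1000000000000ULL / hz;
}

/* Convert a tick count to nanoseconds, rounded down.
 * Whole seconds and the remainder are handled separately so that the
 * intermediate product cannot overflow for the rates discussed here. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t hz)
{
	uint64_t secs = ticks / hz;
	uint64_t rem = ticks % hz;

	return secs * 1000000000ULL + rem * 1000000000ULL / hz;
}
```

With these, a 25MHz counter comes out at 40ns per tick and an 80MHz one at
12.5ns per tick, so a change in the fifth decimal digit of a second (10us)
corresponds to 250 and 800 ticks respectively, well above the resolution of
either clock source.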
I would expect your JZ4730 device to have the CP0 Count register as well,
as it has been architectural ever since MIPS III really, or is your system
SMP with CP0 Count registers out of sync across CPUs due to sleep modes or
whatever?
Thanks for sharing your figures.
> [1] we are preparing full support for the JZ4730 based Skytone Alpha machine. Most features
> are working except sound/I2S. I2C is a little unreliable and Ethernet has hiccups. And scheduling
> which indicates some fundamental IRQ or timer issue we could not yet identify.
Good luck with that!
Maciej