Re: clock_gettime64 vdso bug on 32-bit arm, rpi-4
From: Arnd Bergmann
Date: Tue May 19 2020 - 16:31:45 EST
On Tue, May 19, 2020 at 10:24 PM Adhemerval Zanella
<adhemerval.zanella@xxxxxxxxxx> wrote:
> On 19/05/2020 16:54, Arnd Bergmann wrote:
> > Jack Schmidt reported a bug for the arm32 clock_gettime64 vdso call last
> > month: https://github.com/richfelker/musl-cross-make/issues/96 and
> > https://github.com/raspberrypi/linux/issues/3579
> >
> > As Will Deacon pointed out, this was never reported on the mailing list,
> > so I'll try to summarize what we know, so this can hopefully be resolved soon.
> >
> > - This happened reproducibly on Linux-5.6 on a 32-bit Raspberry Pi patched
> > kernel running on a 64-bit Raspberry Pi 4b (bcm2711) when calling
> > clock_gettime64(CLOCK_REALTIME)
>
> Does it happen with other clocks as well?
Unclear.
> > - The kernel tree is at https://github.com/raspberrypi/linux/, but I could
> > see no relevant changes compared to a mainline kernel.
>
> Is this bug reproducible with a mainline kernel, or can a mainline kernel
> not be booted on bcm2711?
Mainline linux-5.6 should boot on that machine but might not have
all the other features, so I think users tend to use the raspberry pi
kernel sources for now.
> > - From the report, I see that the returned time value is larger than the
> > expected time, by 3.4 to 14.5 million seconds in four samples. My
> > guess is that a random number gets added in at some point.
>
> What kind of code are you using to reproduce it? Is it threaded, or does it
> issue clock_gettime from signal handlers?
The reproducer is very simple without threads or signals,
see the start of https://github.com/richfelker/musl-cross-make/issues/96
It does rely on calling into the musl wrapper, not the direct vdso
call.
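
For anyone who wants to check their own system, here is a minimal sketch
of such a test. It is not the reproducer from the issue linked above. It
assumes the bogus offset differs from one call to the next, as the four
reported samples suggest, so a bad reading shows up as a large step
between consecutive samples. It uses Python's time.clock_gettime(), which
goes through the C library's clock_gettime() wrapper rather than calling
the vdso directly; the helper names and the 1-second threshold are
arbitrary choices for illustration.

```python
import time


def sample_realtime(count):
    """Read CLOCK_REALTIME `count` times through the C library wrapper."""
    return [time.clock_gettime(time.CLOCK_REALTIME) for _ in range(count)]


def find_jumps(samples, max_step_s=1.0):
    """Return indices of readings that differ from the previous reading
    by more than `max_step_s` seconds, in either direction."""
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > max_step_s
    ]


if __name__ == "__main__":
    readings = sample_realtime(1000)
    jumps = find_jumps(readings)
    for i in jumps:
        # Print both neighbours so the size of the step is visible.
        print(f"sample {i}: {readings[i - 1]:.3f} -> {readings[i]:.3f}")
    print(f"{len(jumps)} suspicious step(s) in {len(readings)} samples")
```

A clean run does not prove the vdso path is correct: a constant offset
would go unnoticed, and whether the wrapper reaches the clock_gettime64
vdso entry depends on the libc version in use.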
> > - From other sources, I found that the Raspberry Pi clocksource runs
> > at 54 MHz, with a mask value of 0xffffffffffffff. From these numbers
> > I would expect that reading a completely random hardware register
> > value would result in an offset up to 1.33 billion seconds, which is
> > around a factor of 100 more than the error we see, though similar.
> >
> > - The test case calls the musl clock_gettime() function, which falls back to
> > the clock_gettime64() syscall on kernels prior to 5.5, or to the 32-bit
> > clock_gettime() prior to Linux-5.1. As reported in the bug, Linux-4.19 does
> > not show the bug.
> >
> > - The behavior was not reproduced on the same user space in qemu,
> > though I cannot tell whether the exact same kernel binary was used.
> >
> > - glibc-2.31 calls the same clock_gettime64() vdso function on arm to
> > implement clock_gettime(), but earlier versions did not. I have not
> > seen any reports of this bug, which could be explained by users
> > generally being on older versions.
> >
> > - As far as I can tell, there are no reports of this bug from other users,
> > and so far nobody could reproduce it.
> >
> > - The current musl git tree has been patched to not call clock_gettime64
> > on ARM because of this problem, so it cannot be used for reproducing it.
>
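
The estimate quoted above for the largest possible offset from a random
counter value can be checked with a few lines of arithmetic. This sketch
uses only the numbers given in the thread (54 MHz, a 56-bit mask, and
reported errors of 3.4 to 14.5 million seconds); the variable names are
made up for illustration, and it assumes the counter value is converted
to seconds simply by dividing by the clock rate.

```python
CLOCK_HZ = 54_000_000        # clocksource rate quoted in the thread
MASK = 0xffffffffffffff      # clocksource mask quoted in the thread (56 bits)

# Largest time span a full-range counter value can represent.
max_offset_s = (MASK + 1) / CLOCK_HZ

# Range of errors reported in the four samples.
observed_min_s = 3.4e6
observed_max_s = 14.5e6

print(f"mask width:          {MASK.bit_length()} bits")
print(f"max random offset:   {max_offset_s:.3e} s")
print(f"ratio to largest:    {max_offset_s / observed_max_s:.0f}x")
print(f"ratio to smallest:   {max_offset_s / observed_min_s:.0f}x")
```

The ratio to the largest observed error comes out a little under 100,
which matches the "factor 100" in the quoted text; against the smallest
observed error the gap is wider.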
> So should glibc follow musl and remove arm clock_gettime64 vDSO support,
> or is this bug localized to a specific kernel version running on
> specific hardware?
I hope we can figure out what is actually going on soon; there is probably
no need to change glibc before we have.
Arnd