RE: [PATCH v4 1/1] printk: fix zero-valued printk timestamps in early boot

From: Bird, Tim

Date: Mon Apr 20 2026 - 18:21:07 EST


> -----Original Message-----
> From: Petr Mladek <pmladek@xxxxxxxx>
>
> Adding Brian into Cc.
>
> On Wed 2026-04-15 00:38:29, Thomas Gleixner wrote:
> > On Fri, Apr 10 2026 at 14:37, Tim Bird wrote:
> > > +
> > > +#include <linux/timekeeping.h>
> > > +#ifdef CONFIG_ARM64
> > > +#include <asm/sysreg.h>
> > > +#endif
> > > +
> > > +#ifdef CONFIG_EARLY_CYCLES_KHZ
> > > +static inline u64 early_unsafe_cycles(void)
> > > +{
> > > +#if defined(CONFIG_X86_64)
> > > + /*
> > > + * This rdtsc may happen before secure TSC is initialized, and
> > > + * it is unordered. So please don't use this value for cryptography
> > > + * or after SMP is initialized.
> > > + */
> > > + return rdtsc();
> > > +#elif defined(CONFIG_ARM64)
> > > + return read_sysreg(cntvct_el0);
> > > +#elif defined(CONFIG_RISCV_TIMER)
> > > + u64 val;
> > > +
> > > + asm volatile("rdtime %0" : "=r"(val));
> > > + return val;
> > > +#else
> > > + return 0;
> > > +#endif
> > > +}
> >
> > No. Generic code and generic headers have no business to implement any
> > architecture specific code and there is zero justification for
> > architecture specific #ifdefs in generic code.
Got it. Thanks. If the patch doesn't get accepted upstream, I may keep all the
cycle-counting instructions in one place, to make it easier to apply the patch,
but I understand not wanting to have architecture #ifdefs sprinkled around
generic code in the upstream kernel.

>
> Yeah, this looks a bit wild.
>
> > > +/* returns a nanosecond value based on early cycles */
> > > +static inline u64 early_times_ns(void)
> > > +{
> > > + if (CONFIG_EARLY_CYCLES_KHZ)
> > > + /*
> > > + * Note: the multiply must precede the division to avoid
> > > + * truncation and loss of resolution
> > > + * Don't use fancier MULT/SHIFT math here. Since this is
> > > + * static, the compiler can optimize the math operations.
> > > + */
> > > + return (early_unsafe_cycles() * NS_PER_KHZ) / CONFIG_EARLY_CYCLES_KHZ;
> >
> > This code will result in a division by zero warning from any reasonable
> > compiler because this is evaluated _before_ it is eliminated.

Yep. I have several solutions to quiet the warning and I'm considering
which one is best. I apologize for missing this warning in my own testing.
It's a long story, but most of my work has been done in the Yocto Project
and the kernel build warnings were getting masked.
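
For illustration only, here is a minimal user-space sketch of one way such a
warning can be avoided: guard the division with the preprocessor, so that a
zero divisor is never presented to the compiler. EARLY_CYCLES_KHZ is a
hypothetical stand-in for CONFIG_EARLY_CYCLES_KHZ, its 54000 default is simply
the 54 MHz arch timer rate from the Raspberry Pi 4 log quoted later in this
mail, and early_cycles_to_ns() is a name invented for this example. It is not
necessarily the fix that a later version of the patch would use.

```c
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_EARLY_CYCLES_KHZ; 0 means "disabled". */
#ifndef EARLY_CYCLES_KHZ
#define EARLY_CYCLES_KHZ 54000	/* assumed: a 54 MHz counter */
#endif

#define NS_PER_KHZ 1000000ULL

/* Convert a raw cycle count to nanoseconds (name invented for this sketch). */
static inline uint64_t early_cycles_to_ns(uint64_t cycles)
{
#if EARLY_CYCLES_KHZ > 0
	/*
	 * Multiply before dividing so the division does not discard
	 * resolution. The product wraps once cycles exceeds roughly
	 * 1.8e13, which this sketch does not handle.
	 */
	return (cycles * NS_PER_KHZ) / EARLY_CYCLES_KHZ;
#else
	/* Disabled: the division is never compiled, so nothing to warn about. */
	(void)cycles;
	return 0;
#endif
}
```

With the assumed 54000 kHz value, 54000 cycles convert to 1000000 ns (1 ms),
and a single cycle truncates to 18 ns.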

> >
> > > @@ -2294,6 +2295,8 @@ int vprintk_store(int facility, int level,
> > > * timestamp with respect to the caller.
> > > */
> > > ts_nsec = local_clock();
> > > + if (unlikely(!ts_nsec))
> > > + ts_nsec = early_times_ns();
> >
> > I explained to you how this wants to be implemented to be useful and you
> > declared that you are unwilling to put the effort in.

You are correct that I said I didn't want to advance the PoC solution.
I'll explain more below why I prefer the solution in my V4 patch.

> >
> > My NAK for this disgusting and tasteless hack still applies.
> >
> > Either you are willing to work with the community and the relevant
> > maintainers or you can keep your hackery maintained out of tree for
> > those who care about it and are willing to ignore the fallout.

I believe I've shown a willingness to work with relevant maintainers.
I have technical reasons (which I explain below) for preferring my approach,
but I recognize that other perspectives and considerations may override
my tradeoff preferences.

>
> The discussion went wild and is full of emotions.
>
> Let me summarize my understanding:
>
> There are people who try to optimize boot times. Tim is one
> of them. He used an out-of-tree patch for many years. He decided
> to share it to make the life easier for others.
>
> Tim's original approach was trivial [Tim1]. IMHO, he used a cycle counter
> with a stable frequency and hardcoded the computation to timestamps.
> It opened discussion how to integrate it better:
>
> 1. Avoid hard coded value in Kconfig by some calibration [Tim2][Tim3]
> One hardcoded value is back in [Tim4] for simplicity.
>
> 2. Avoid jump in the timestamps when timekeeping is allowed.
> It was partly removed in [v2][v3] by already "calibrated"
> timestamps read by userspace (syslog, /dev/kmsg). Again,
> this approach was removed in v4 for simplicity.
>
> Pros of v4:
>
> + very simple
> + gives some reasonably looking timestamps
> + might be good enough for the purpose
>
> Cons of v4:
>
> + hacky, does not compile in some case, ...
I believe I can fix this one.

> + hardcoded value in config
> + jump in timestamps when timekeeping is initialized
>
>
> Now, we have alternative approach by Thomas [Thomas1] which allows
> to initialize time keeping on x86 ASAP:
>
> Pros:
>
> + clean and well integrated with timekeeping
> + no hard coded values in Kconfig
> + no jump in timestamps
>
> Cons:
>
> + need non-trivial changes for each supported architecture
> + no timestamps for the very early code (30ms on the measured x86_64)
Just a comment. I think the blind spot duration for Thomas' code is much
smaller on x86_64 than we've been discussing. I see 30ms and 18ms reports, but in
my experience this is the blind spot duration *before* applying Thomas' patch.
With his patch, I believe it gets down into the range of a few milliseconds (on x86_64),
which is obviously sufficient for researching most of the areas that are likely problematic.
I'm not sure how it fares on other arches, as those need different approaches.

I would add one more con here:
- requires new kernel command line parameters

>
> My view:
>
> Thomas' approach is great because it is clean integration, ...
> but:
>
> 1. I am not sure if the complexity is worth it. There are only few
> people (Tim's tip is 50) who are interested into the early
> boot times and all are developers.
>
> 2. It does not cover the very early boot. And Brian mentioned
> a real life problem found in this area, see
> https://lore.kernel.org/all/acxx9Bt0N3nhtLgN@xxxxxxxxxx/
>
> Steven mentioned getting the very early timestamps from
> the firmware, see
> https://lore.kernel.org/all/20260401111244.5057a89c@xxxxxxxxxxxxxxxxxx/
> But I am not sure how complicated it is. And if it
> does not need any special HW or so.

I'll just add a few pros and cons, if that's OK.
Tim's approach:
Pros:
- setting the calibration value can be automated, because it is a CONFIG option.
Almost all test frameworks I'm aware of support changing the kernel config,
rebuilding the kernel, installing the kernel, and rebooting the device under test.
Due to differences in bootloaders, I'm not aware of any test framework that
supports modifying the kernel command line parameters across different
platforms and bootloaders.
- complete coverage. That is, you can get non-zero timestamps from the very
first line in start_kernel, leaving the blind spot (in C code) virtually non-existent.
This is also useful for automation, since (timing-useful) printks can be added and removed anywhere
in the code, without exceptions or qualifications.
- (likely?) puts the early timestamps into the same timestamp space as the
bootloader. That is, the V4 patch is very likely to use timing hardware
that is the same as the one used by the bootloader (including the same timing base!).
At least, this is true for U-Boot and a few other embedded bootloaders I'm familiar with.
This is useful to see the timing relationship between the kernel and the bootloader,
and even to get estimates of the time taken for another blind spot - during the
handoff from bootloader to kernel. In this case the discontinuity between the
early kernel timestamps and the later kernel timestamps is actually a feature,
as you can use it to calculate timings for the whole boot process.
- when not in use, the new overhead to the kernel is exactly 0. This was a key
goal when making this patch. I wanted something that evaporated completely
(due to compiler optimizations) so that when turned off, this instrumentation
did not, itself, add any additional time to kernel boot. My V2 and V3
patches did not have that feature, and I was uncomfortable with that.
- when in use AND not in use, does not change the value of any existing
printk timestamps, relative to their historic values. That is, only the
zero timestamps are changed from zero to something, and all other timestamps
should be the same as before the patch was applied. This is useful at least
to me, because I have a large amount of data for a lot of different platforms,
where the timestamps are all relative to kernel timekeeping start. This means
that systems which monitor boot time data over time (which I have) don't
have to recalibrate the data to detect differences over different kernel versions.

Cons:
- due to hardcoded value, cannot be used in a distribution kernel
- could encourage the use of non-standard and possibly deceptive or
dangerous clocks.
- If the developer sets the wrong config value, they can get misleading timestamps.
However, they should be able to detect this. And often order-of-magnitude
values are sufficient to identify areas to work on for optimization.
- cannot be used on platforms that lack a stable clock or cycle counter
that works without initialization by the kernel. I think almost all modern,
64-bit processors have such hardware, and that most virtual machines and
emulators support such hardware, either through pass-through or emulation.
Although some don't provide hardware calibration info, I am not aware of
any arm, riscv or x86 64-bit platforms that don't support the instructions
that the V4 patch uses to read the cycle or time counter.
There might also be 32-bit or other low end embedded platforms that have
hardware sufficient for this (e.g. either initialized in hardware or initialized
in the firmware for a platform before kernel start).
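
To make two of the pros above concrete, here is a small user-space sketch.
The first helper models the "only zero timestamps change" rule from the
vprintk_store() hunk; the second estimates the bootloader-to-kernel handoff
gap, which only makes sense under the assumption that both timestamps come
from the same counter and time base. Both function names are invented for
this example and do not appear in the patch.

```c
#include <stdint.h>

/*
 * Keep any non-zero local_clock() value untouched; fall back to the
 * early, cycle-based value only when the regular clock still reads zero.
 */
static uint64_t select_ts_ns(uint64_t local_clock_ns, uint64_t early_ns)
{
	return local_clock_ns ? local_clock_ns : early_ns;
}

/*
 * Estimate the handoff gap between the last bootloader timestamp and the
 * first early kernel timestamp. Assumes a shared counter and time base;
 * returns 0 if the values are inconsistent with that assumption.
 */
static uint64_t handoff_gap_ns(uint64_t bootloader_last_ns,
			       uint64_t kernel_first_ns)
{
	if (kernel_first_ns < bootloader_last_ns)
		return 0;
	return kernel_first_ns - bootloader_last_ns;
}
```

Because select_ts_ns() returns its first argument whenever that argument is
non-zero, timestamps taken after timekeeping starts are the same with or
without the fallback.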

>
> Tim's approach is interesting because of the simplicity
> and might be good enough for the few (50) users.
>
> I think that there are basically two problems with Tim's approach:
>
> 1. It needs a reasonable API to get a cycle counter with
> a stable frequency.
>
> My understanding is that get_cycles() might be good enough.
> The only problem is that it the stability is not guaranteed
> and it is not calibrated.
>
> Would it help to rename it to get_bogo_cycles() ?

For the record, I would be fine with this. I would even support having
get_bogo_cycles() (or early_unsafe_cycles()) do something drastic like
taint the kernel, or emit noticeable garbage (e.g. -1) after kernel
timekeeping starts, to help prevent future abuse of the function.
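
As a rough user-space model of the "emit noticeable garbage" idea (a sketch
under stated assumptions, not proposed kernel code): the timekeeping_started
flag and the read_raw_counter() helper are hypothetical stand-ins, and a real
implementation would need an architecture-provided counter read and a proper
hook into timekeeping initialization.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag; a kernel would set this once timekeeping is running. */
static bool timekeeping_started;

/* Hypothetical raw counter; a real one would be an arch-specific read. */
static uint64_t fake_counter;

static uint64_t read_raw_counter(void)
{
	return ++fake_counter;
}

/*
 * Return raw cycles only during early boot. Once timekeeping has
 * started, return all-ones, i.e. (u64)-1, so that any late caller gets
 * an obviously bogus value instead of a plausible-looking one.
 */
static uint64_t get_bogo_cycles(void)
{
	if (timekeeping_started)
		return UINT64_MAX;
	return read_raw_counter();
}
```

A caller that keeps using the function after the flag is set would see the
same all-ones value on every call, which should be hard to mistake for a
real timestamp.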

>
>
> 2. The early timestamps provided by the bogo cycles
> are not synchronized with timestamps from
> the proper time keeping.
>
> Would it help to print a disclaimer, similar to,
> for example, trace_printk() first use?
> Something like:
>
> [ 0.002912] **********************************************************
> [ 0.002917] **** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE
> [ 0.002921] **
> [ 0.002935] ** Using BOGO early timestamps
> [ 0.002939] **
> [ 0.002943] ** They are not properly calibrated and might use a source
> [ 0.002949] ** with an unstable frequency.
> [ 0.002953] **
> [ 0.002957] ** They are not comparable with timestamps after
> [ 0.002961] ** the timekeeping is initialized.
> [ 0.002966] **
> [ 0.002968] **** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE
> [ 0.002971] *******************************************************

That's a bit long, but I think a warning like this is useful.

> [ 0.002975] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
> [ 0.002998] Linux version 7.0.0-rc6-v8+ (tbird@timdesk) (aarch64-linux-gnu-gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #20 SMP PREEMPT Fri Apr 10 11:57:48 MDT 2026
> [ 0.003002] KASLR enabled
> [ 0.003338] random: crng init done
> [ 0.003866] Machine model: Raspberry Pi 4 Model B Rev 1.5
> [ 0.004495] efi: UEFI not found.
> ...
> [ 0.183552] Root IRQ handler: gic_handle_irq
> [ 0.183561] GIC: Using split EOI/Deactivate mode
> [ 0.183699] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [ 0.183958] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
> [ 0.183952] arch_timer: cp15 timer running at 54.00MHz (phys).
> [ 0.183957] **********************************************************
> [ 0.183962] **** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE
> [ 0.183967] **
> [ 0.183971] ** End of BOGO early timestamps
> [ 0.183976] **
> [ 0.183982] **** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE
> [ 0.183989] **********************************************************

I actually worked on a notice like this, to bridge/explain the transition (for humans). But it turned
out that different platforms initialized their clocks differently enough that it would have
required putting the notice in different places, which would have required architecture-specific
#ifdefs in generic code (so I didn't do it).

> [ 0.000000] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102ns
> [ 0.000157] Console: colour dummy device 80x25
> [ 0.000165] printk: legacy console [tty1] enabled
>
> My view is that it would be nice to make the life easier
> for the 50 developers who do very useful work.
>
> But we do not need to create and maintain any complicated
> code for this. If the bogo cycles are good enough.
> If they already have some users and have to stay anyway.
> If we make it clear that the early timestamps are bogus...
>
> IMHO, the main risk is that it won't be used just by the 50 developers
> but it will get misused and open some can of worms. I think that
> the risk might be acceptable but...

That is indeed a risk that should be dealt with. I *do* want to avoid
introducing more technical debt into the kernel for such a minor feature.
Thomas has much more experience than me dealing with technical debt.
So his objections should be taken seriously and addressed.

> What do you think, please?
> Am I too naive in this case?
>
> [Tim1] https://lore.kernel.org/all/39b09edb-8998-4ebd-a564-7d594434a981@xxxxxxxx/
> [Tim2] https://lore.kernel.org/all/20260124194027.713991-1-tim.bird@xxxxxxxx/
> [Tim3] https://lore.kernel.org/all/20260210234741.3262320-1-tim.bird@xxxxxxxx/
> [Tim4] https://lore.kernel.org/all/20260410203741.997410-1-tim.bird@xxxxxxxx/
>
> [Thomas1] https://lore.kernel.org/all/87fr5ib6ks.ffs@tglx/

I understand Thomas' objections to my approach, and that this may not be suitable for
upstream. However, if the dangers can be mitigated, I would be happy to see this
upstream as another tool to help developers working on boot time issues.

Regards,
-- Tim