RE: [PATCH v5 3/6] timekeeping: Add clocksource to system_time_snapshot
From: Jianyong Wu (Arm Technology China)
Date: Wed Oct 16 2019 - 06:02:02 EST
Hi tglx,
> -----Original Message-----
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Sent: Wednesday, October 16, 2019 4:13 AM
> To: Jianyong Wu (Arm Technology China) <Jianyong.Wu@xxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx; yangbo.lu@xxxxxxx; john.stultz@xxxxxxxxxx;
> pbonzini@xxxxxxxxxx; sean.j.christopherson@xxxxxxxxx; maz@xxxxxxxxxx;
> richardcochran@xxxxxxxxx; Mark Rutland <Mark.Rutland@xxxxxxx>;
> will@xxxxxxxxxx; Suzuki Poulose <Suzuki.Poulose@xxxxxxx>; linux-
> kernel@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> kvmarm@xxxxxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Steve Capper
> <Steve.Capper@xxxxxxx>; Kaly Xin (Arm Technology China)
> <Kaly.Xin@xxxxxxx>; Justin He (Arm Technology China)
> <Justin.He@xxxxxxx>; nd <nd@xxxxxxx>
> Subject: Re: [PATCH v5 3/6] timekeeping: Add clocksource to
> system_time_snapshot
>
> On Tue, 15 Oct 2019, Jianyong Wu wrote:
>
> > Sometimes, we need check current clocksource outside of timekeeping
> > area. Add clocksource to system_time_snapshot then we can get
> > clocksource as well as system time.
>
> This changelog is telling absolutely nothing WHY anything outside of the
> timekeeping core code needs access to the current clocksource. Neither
> does it tell why it is safe to provide the pointer to random callers.
>
Really need more information.
> > +/*
> > + * struct system_time_snapshot - simultaneous raw/real time capture
> with
> > + * counter value
> > + * @sc: Contains clocksource and clocksource counter value
> to produce
> > + * the system times
> > + * @real: Realtime system time
> > + * @raw: Monotonic raw system time
> > + * @clock_was_set_seq: The sequence number of clock was set
> events
> > + * @cs_was_changed_seq: The sequence number of clocksource change
> events
> > + */
> > +struct system_time_snapshot {
> > + struct system_counterval_t sc;
> > + ktime_t real;
> > + ktime_t raw;
> > + unsigned int clock_was_set_seq;
> > + u8 cs_was_changed_seq;
> > +};
> > +
> > /*
> > * Get cross timestamp between system clock and device clock
> > */
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 44b726bab4bd..66ff089605b3 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -983,7 +983,8 @@ void ktime_get_snapshot(struct
> system_time_snapshot *systime_snapshot)
> > nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
> > } while (read_seqcount_retry(&tk_core.seq, seq));
> >
> > - systime_snapshot->cycles = now;
> > + systime_snapshot->sc.cycles = now;
> > + systime_snapshot->sc.cs = tk->tkr_mono.clock;
>
> The clock pointer can change right after the store, the underlying data can be
> freed .....
>
Yeah, need put it into seqcount region.
> Looking at the rest of the patch set the actual usage site is:
>
> > + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > + ktime_get_snapshot(&systime_snapshot);
> > + if (!is_arm_arch_counter(systime_snapshot.sc.cs))
> > + return kvm_psci_call(vcpu);
>
> and that function does:
>
> > +bool is_arm_arch_counter(void *cs)
>
> void *? Type safety is overrated, right? The type is well known....
>
> +{
> + return (struct clocksource *)cs == &clocksource_counter;
>
> That nonsensical typecast does not make up for that.
>
It's really bad code and need fix.
> +}
>
> So while the access to the pointer is actually safe, this is not going to happen
> simply because you modify a generic interface in a way which will lead the
> next developer to insane assumptions about the validity of that pointer.
>
> While the kernel is pretty lax in terms of isolation due to the nature of the
> programming language, this does not justify to expose critical internals of
> core code to random callers. Guess why most of the timekeeping internals
> are carefully shielded from external access.
>
> Something like the completely untested (not even compiled) patch below
> gives you access to the information you need and allows to reuse the
> mechanism for other purposes without adding is_$FOO_timer() all over the
> place.
>
> Thanks,
>
> tglx
>
> 8<--------------
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -9,6 +9,7 @@
> #ifndef _LINUX_CLOCKSOURCE_H
> #define _LINUX_CLOCKSOURCE_H
>
> +#include <linux/clocksource_ids.h>
> #include <linux/types.h>
> #include <linux/timex.h>
> #include <linux/time.h>
> @@ -49,6 +50,10 @@ struct module;
> * 400-499: Perfect
> * The ideal clocksource. A must-use where
> * available.
> + * @id: Defaults to CSID_GENERIC. The id value is
> captured
> + * in certain snapshot functions to allow callers to
> + * validate the clocksource from which the snapshot
> was
> + * taken.
> * @read: returns a cycle value, passes clocksource as argument
> * @enable: optional function to enable the clocksource
> * @disable: optional function to disable the clocksource
> @@ -91,6 +96,7 @@ struct clocksource {
> const char *name;
> struct list_head list;
> int rating;
> + enum clocksource_ids id;
> int (*enable)(struct clocksource *cs);
> void (*disable)(struct clocksource *cs);
> unsigned long flags;
> --- /dev/null
> +++ b/include/linux/clocksource_ids.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_CLOCKSOURCE_IDS_H
> +#define _LINUX_CLOCKSOURCE_IDS_H
> +
> +/* Enum to give clocksources a unique identifier */ enum
> +clocksource_ids {
> + CSID_GENERIC = 0,
> + CSID_ARM_ARCH_COUNTER,
> + CSID_MAX,
> +};
> +
Does this mean I must add clocksource id for all kinds of ARCHs and update all the code which have checked clocksource in the old way?
Thanks
Jianyong
> +#endif
> --- a/include/linux/timekeeping.h
> +++ b/include/linux/timekeeping.h
> @@ -2,6 +2,7 @@
> #ifndef _LINUX_TIMEKEEPING_H
> #define _LINUX_TIMEKEEPING_H
>
> +#include <linux/clocksource_ids.h>
> #include <linux/errno.h>
>
> /* Included from linux/ktime.h */
> @@ -228,15 +229,17 @@ extern void timekeeping_inject_sleeptime
> * @cycles: Clocksource counter value to produce the system times
> * @real: Realtime system time
> * @raw: Monotonic raw system time
> + * @cs_id: The id of the current clocksource which produced the
> snapshot
> * @clock_was_set_seq: The sequence number of clock was set
> events
> * @cs_was_changed_seq: The sequence number of clocksource change
> events
> */
> struct system_time_snapshot {
> - u64 cycles;
> - ktime_t real;
> - ktime_t raw;
> - unsigned int clock_was_set_seq;
> - u8 cs_was_changed_seq;
> + u64 cycles;
> + ktime_t real;
> + ktime_t raw;
> + enum clocksource_ids cs_id;
> + unsigned int clock_was_set_seq;
> + u8 cs_was_changed_seq;
> };
>
> /*
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -921,6 +921,9 @@ int __clocksource_register_scale(struct
>
> clocksource_arch_init(cs);
>
> + if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
> + cs->id = CSID_GENERIC;
> +
> /* Initialize mult/shift and max_idle_ns */
> __clocksource_update_freq_scale(cs, scale, freq);
>
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -979,6 +979,7 @@ void ktime_get_snapshot(struct system_ti
> do {
> seq = read_seqcount_begin(&tk_core.seq);
> now = tk_clock_read(&tk->tkr_mono);
> + systime_snapshot->cs_id = tk->tkr_mono.clock->id;
> systime_snapshot->cs_was_changed_seq = tk-
> >cs_was_changed_seq;
> systime_snapshot->clock_was_set_seq = tk-
> >clock_was_set_seq;
> base_real = ktime_add(tk->tkr_mono.base,
>
>