Re: [PATCH V2] rtc: mc146818: Detect and handle broken RTCs

From: Dirk Gouders
Date: Sun Jan 31 2021 - 08:40:04 EST


Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:

> The recent fix for handling the UIP bit unearthed another issue in the RTC
> code. If the RTC is advertised but the readout is straight 0xFF because
> it's not available, the old code just proceeded with crappy values, but the
> new code hangs because it waits for the UIP bit to become low.
>
> Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
> register (Register D) which should have bit 0-6 cleared. If that's not the
> case then fail to register the CMOS.
>
> Add the same check to mc146818_get_time(), warn once when the condition
> is true and invalidate the rtc_time data.
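
For readers following along, the check described above can be sketched in
userspace C. This is only an illustration under stated assumptions: the
struct and the function names are hypothetical stand-ins, and the register
value is passed in as a parameter rather than read with CMOS_READ().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rtc_time; only a few fields are modelled. */
struct sketch_time {
	int tm_sec;
	int tm_min;
	int tm_hour;
};

/* Mirrors the test in the patch: bits 0-6 of Register D must all be clear. */
static bool reg_d_looks_sane(uint8_t reg_d)
{
	return (reg_d & 0x7f) == 0;
}

/*
 * Hypothetical analogue of the early exit added to mc146818_get_time():
 * on a bad Register D value, fill the result with 0xff bytes and bail out.
 * Returns true when the caller may go on to read the clock registers.
 */
static bool sketch_get_time(uint8_t reg_d, struct sketch_time *t)
{
	if (!reg_d_looks_sane(reg_d)) {
		memset(t, 0xff, sizeof(*t));
		return false;
	}
	return true;
}

/* Exercises the check with a few sample register values. */
static void sketch_selftest(void)
{
	struct sketch_time t = { 0, 0, 0 };

	assert(reg_d_looks_sane(0x00));      /* all bits clear */
	assert(reg_d_looks_sane(0x80));      /* only bit 7 set: passes */
	assert(!reg_d_looks_sane(0xff));     /* the "straight 0xFF" readout */
	assert(!sketch_get_time(0xff, &t));
	assert(t.tm_sec == -1);              /* every byte was set to 0xff */
}
```

Note that the mask leaves bit 7 out, so a readout of 0x80 passes while the
all-ones readout of an absent RTC does not.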

In case it is helpful: on my hardware this patch triggers a warning
(attached below).

Without it the rtc messages look like this:

[ 2.783386] rtc_cmos 00:01: RTC can wake from S4
[ 2.784302] rtc_cmos 00:01: registered as rtc0
[ 2.785036] rtc_cmos 00:01: setting system clock to 2021-01-31T10:13:40 UTC (1612088020)
[ 2.785713] rtc_cmos 00:01: alarms up to one month, y3k, 114 bytes nvram, hpet irqs

Dirk

[ 7.258410] ------------[ cut here ]------------
[ 7.258414] WARNING: CPU: 2 PID: 0 at drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x2b/0x1e5
[ 7.258420] Modules linked in: iwlmvm(+) mac80211 iwlwifi sdhci_pci amdgpu(+) drm_ttm_helper cfg80211 ttm cqhci gpu_sched sdhci ccp thinkpad_acpi(+) rng_core nvram tpm_tis(+) tpm_tis_core wmi tpm pinctrl_amd
[ 7.258432] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 5.11.0-rc5-next-20210129-x86_64 #180
[ 7.258434] Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 ) 06/22/2020
[ 7.258435] RIP: 0010:mc146818_get_time+0x2b/0x1e5
[ 7.258437] Code: 56 41 55 45 31 ed 41 54 55 53 48 89 fb 48 c7 c7 bc d9 eb 82 e8 26 d8 36 00 bf 0d 00 00 00 48 89 c5 e8 6d d1 8f ff a8 7f 74 24 <0f> 0b 48 c7 c7 bc d9 eb 82 48 89 ee e8 bc d6 36 00 b0 ff b9 24 00
[ 7.258438] RSP: 0018:ffffc9000022cef0 EFLAGS: 00010002
[ 7.258440] RAX: 0000000000000031 RBX: ffffc9000022cf24 RCX: 0000000000000000
[ 7.258441] RDX: 0000000000000001 RSI: ffff888105607000 RDI: 000000000000000d
[ 7.258441] RBP: 0000000000000046 R08: ffffc9000022cf24 R09: 0000000000000000
[ 7.258442] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888105607000
[ 7.258443] R13: 0000000000000000 R14: ffffc9000022cfa4 R15: 0000000000000000
[ 7.258444] FS: 0000000000000000(0000) GS:ffff88840ec80000(0000) knlGS:0000000000000000
[ 7.258445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.258446] CR2: 00007f2ed26c4160 CR3: 000000000480a000 CR4: 0000000000350ee0
[ 7.258447] Call Trace:
[ 7.258449] <IRQ>
[ 7.258450] hpet_rtc_interrupt+0xd3/0x1a3
[ 7.258454] __handle_irq_event_percpu+0x6b/0x12e
[ 7.258457] handle_irq_event_percpu+0x2c/0x6f
[ 7.258459] handle_irq_event+0x23/0x43
[ 7.258461] handle_edge_irq+0x9e/0xbb
[ 7.258463] asm_call_irq_on_stack+0x12/0x20
[ 7.258467] </IRQ>
[ 7.258467] common_interrupt+0x9a/0x123
[ 7.258470] asm_common_interrupt+0x1e/0x40
[ 7.258472] RIP: 0010:cpuidle_enter_state+0x13e/0x1fe
[ 7.258475] Code: 49 89 c4 e8 bd fd ff ff 31 ff e8 3e 80 92 ff 45 84 ff 74 12 9c 58 0f ba e0 09 73 03 0f 0b fa 31 ff e8 13 16 96 ff fb 45 85 f6 <0f> 88 97 00 00 00 49 63 d6 4c 2b 24 24 48 6b ca 68 48 6b c2 30 4c
[ 7.258476] RSP: 0018:ffffc90000167eb0 EFLAGS: 00000206
[ 7.258477] RAX: ffff88840eca8240 RBX: ffff888101e0d400 RCX: 00000001b0a24b16
[ 7.258478] RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000
[ 7.258478] RBP: 0000000000000003 R08: 00000000ffffffff R09: 0000000000000000
[ 7.258479] R10: ffff88810083c4a8 R11: 0000000000000000 R12: 00000001b0a24b48
[ 7.258480] R13: ffffffff8299cc60 R14: 0000000000000003 R15: 0000000000000000
[ 7.258482] cpuidle_enter+0x2b/0x37
[ 7.258483] do_idle+0x126/0x184
[ 7.258485] cpu_startup_entry+0x18/0x1a
[ 7.258486] secondary_startup_64_no_verify+0xb0/0xbb
[ 7.258489] ---[ end trace 9da59c3696ed99d8 ]---
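
The register dump above hints at what was read. RDI holds 0xd, the index of
Register D, and the bytes just before the trap in the first Code: line
(a8 7f 74 24 <0f> 0b) decode to a test of AL against 0x7f followed by a
conditional jump. Assuming RAX still holds the value returned by the
register read at that point, the readout was 0x31. A minimal sketch (helper
names are hypothetical) that shows which bits trip the check:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bits 0-6 of Register D, the mask used by the patch. */
#define SKETCH_REG_D_MASK 0x7f

/* Returns the bits that make the check fail (0 means the check passes). */
static uint8_t offending_bits(uint8_t reg_d)
{
	return reg_d & SKETCH_REG_D_MASK;
}

/* Prints the position of every offending bit; returns how many were set. */
static int report_offending_bits(uint8_t reg_d)
{
	uint8_t bad = offending_bits(reg_d);
	int count = 0;

	for (int bit = 0; bit < 7; bit++) {
		if (bad & (1u << bit)) {
			printf("bit %d is set\n", bit);
			count++;
		}
	}
	return count;
}

/* Decodes the value taken from RAX in the trace (an assumption, see above). */
static void sketch_decode_selftest(void)
{
	assert(offending_bits(0x31) == 0x31);
	assert(report_offending_bits(0x31) == 3);
}
```

For 0x31 this reports bits 0, 4 and 5, so the readout is not the all-ones
pattern the patch was written to catch.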


> Reported-by: Mickaël Salaün <mic@xxxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Tested-by: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
> ---
> V2: Fixed the sizeof() as spotted by Mickaël
> ---
> drivers/rtc/rtc-cmos.c | 8 ++++++++
> drivers/rtc/rtc-mc146818-lib.c | 7 +++++++
> 2 files changed, 15 insertions(+)
>
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -805,6 +805,14 @@ cmos_do_probe(struct device *dev, struct
>
>  	spin_lock_irq(&rtc_lock);
>
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) {
> +		spin_unlock_irq(&rtc_lock);
> +		dev_warn(dev, "not accessible\n");
> +		retval = -ENXIO;
> +		goto cleanup1;
> +	}
> +
>  	if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
>  		/* force periodic irq to CMOS reset default of 1024Hz;
>  		 *
> --- a/drivers/rtc/rtc-mc146818-lib.c
> +++ b/drivers/rtc/rtc-mc146818-lib.c
> @@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
>
>  again:
>  	spin_lock_irqsave(&rtc_lock, flags);
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
> +		spin_unlock_irqrestore(&rtc_lock, flags);
> +		memset(time, 0xff, sizeof(*time));
> +		return 0;
> +	}
> +
>  	/*
>  	 * Check whether there is an update in progress during which the
>  	 * readout is unspecified. The maximum update time is ~2ms. Poll