Re: [PATCH RESEND v3 2/7] rtc-mc146818-lib: fix RTC presence check
From: Mateusz Jończyk
Date: Thu Nov 25 2021 - 17:15:02 EST
W dniu 24.11.2021 o 23:31, Alexandre Belloni pisze:
> Hello,
>
> By moving this patch later on in the series, you'd avoid the subsequent
> refactor. I think this would be better for bisection later on.
Hi,
There are three issues I'm trying to fix in this series:
1. (less important) Insufficient locking in cmos_set_alarm()
2. misdetection of the RTC CMOS as broken on some motherboards,
3. reading / writing of the RTC alarm time during RTC update-in-progress.
Do you mean I should drop the patch
nr 2. ("rtc-mc146818-lib: fix RTC presence check")
and instead straight away introduce mc146818_avoid_UIP() with the new approach (as in patch 3 in the series),
then modify mc146818_get_time() to use it (as in patch 4 - fixing issue nr 2),
then modify cmos_read_alarm / cmos_set_alarm to use mc146818_avoid_UIP() (patches 5-6, fixing issue no. 3)?
I was afraid this risks some confusion what is being fixed when.
On 19/11/2021 21:42:16+0100, Mateusz Jończyk wrote:
>> To prevent an infinite loop in mc146818_get_time(),
>> commit 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs")
>> added a check for RTC availability. Together with a later fix, it
>> checked if bit 6 in register 0x0d is cleared. This, however, caused a
>> false negative on a motherboard with an AMD SB710 southbridge; according
>> to the specification [1], bit 6 of register 0x0d of this chipset is a
>> scratchbit.
>>
>> This caused a regression in Linux 5.11 - the RTC was determined broken
>> by the kernel and not used by rtc-cmos.c [3].
>>
>> As a better alternative, check whether the UIP ("Update-in-progress")
>> bit is set for longer then 10ms. If that is the case, then apparently
>> the RTC is either absent (and all register reads return 0xff) or broken.
>> Also limit the number of loop iterations in mc146818_get_time() to 10 to
>> prevent an infinite loop there.
>>
>> In a previous approach to this problem, I implemented a check whether
>> the RTC_HOURS register contains a value <= 24. This, however, sometimes
>> did not work correctly on my Intel Kaby Lake laptop. According to
>> Intel's documentation [2], "the time and date RAM locations (0-9) are
>> disconnected from the external bus" during the update cycle so reading
>> this register without checking the UIP bit is incorrect.
>>
>> [1] AMD SB700/710/750 Register Reference Guide, page 308,
>> https://developer.amd.com/wordpress/media/2012/10/43009_sb7xx_rrg_pub_1.00.pdf
>>
>> [2] 7th Generation Intel ® Processor Family I/O for U/Y Platforms [...] Datasheet
>> Volume 1 of 2, page 209
>> Intel's Document Number: 334658-006,
>> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-and-8th-gen-core-family-mobile-u-y-processor-lines-i-o-datasheet-vol-1.pdf
>>
>> [3] Functions in arch/x86/kernel/rtc.c apparently were using it.
>>
>> Fixes: 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs")
>> Fixes: ebb22a059436 ("rtc: mc146818: Dont test for bit 0-5 in Register D")
>> Signed-off-by: Mateusz Jończyk <mat.jonczyk@xxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Alessandro Zummo <a.zummo@xxxxxxxxxxxx>
>> Cc: Alexandre Belloni <alexandre.belloni@xxxxxxxxxxx>
>>
>> v2: Tweak commit description, remove "Cc: stable" (I'll send it manually
>> after more regression testing).
>>
>> v3: - add "EXPORT_SYMBOL_GPL(mc146818_does_rtc_work)",
>> - change return type from mc146818_does_rtc_work to bool
> This changelog should come after the --- marker
OK
>
>> ---
>> drivers/rtc/rtc-cmos.c | 10 ++++------
>> drivers/rtc/rtc-mc146818-lib.c | 35 ++++++++++++++++++++++++++++++----
>> include/linux/mc146818rtc.h | 1 +
>> 3 files changed, 36 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
>> index dc3f8b0dde98..9404f58ee01d 100644
>> --- a/drivers/rtc/rtc-cmos.c
>> +++ b/drivers/rtc/rtc-cmos.c
>> @@ -793,16 +793,14 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
>> rename_region(ports, dev_name(&cmos_rtc.rtc->dev));
>> - spin_lock_irq(&rtc_lock);
>> -
>> - /* Ensure that the RTC is accessible. Bit 6 must be 0! */
>> - if ((CMOS_READ(RTC_VALID) & 0x40) != 0) {
>> - spin_unlock_irq(&rtc_lock);
>> - dev_warn(dev, "not accessible\n");
>> + if (!mc146818_does_rtc_work()) {
>> + dev_warn(dev, "broken or not accessible\n");
>> retval = -ENXIO;
>> goto cleanup1;
>> }
>> + spin_lock_irq(&rtc_lock);
>> +
>> if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
>> /* force periodic irq to CMOS reset default of 1024Hz;
>> *
>> diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c
>> index dcfaf09946ee..b50612ce1a6d 100644
>> --- a/drivers/rtc/rtc-mc146818-lib.c
>> +++ b/drivers/rtc/rtc-mc146818-lib.c
>> @@ -8,10 +8,36 @@
>> #include <linux/acpi.h>
>> #endif
>> +/*
>> + * If the UIP (Update-in-progress) bit of the RTC is set for more then
>> + * 10ms, the RTC is apparently broken or not present.
>> + */
>> +bool mc146818_does_rtc_work(void)
>> +{
>> + int i;
>> + unsigned char val;
>> + unsigned long flags;
>> +
>> + for (i = 0; i < 10; i++) {
>> + spin_lock_irqsave(&rtc_lock, flags);
>> + val = CMOS_READ(RTC_FREQ_SELECT);
>> + spin_unlock_irqrestore(&rtc_lock, flags);
>> +
>> + if ((val & RTC_UIP) == 0)
>> + return true;
>> +
>> + mdelay(1);
>> + }
>> +
>> + return false;
>> +}
>> +EXPORT_SYMBOL_GPL(mc146818_does_rtc_work);
>> +
>> unsigned int mc146818_get_time(struct rtc_time *time)
>> {
>> unsigned char ctrl;
>> unsigned long flags;
>> + unsigned int iter_count = 0;
>> unsigned char century = 0;
>> bool retry;
>> @@ -20,13 +46,14 @@ unsigned int mc146818_get_time(struct rtc_time *time)
>> #endif
>> again:
>> - spin_lock_irqsave(&rtc_lock, flags);
>> - /* Ensure that the RTC is accessible. Bit 6 must be 0! */
>> - if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) {
>> - spin_unlock_irqrestore(&rtc_lock, flags);
>> + if (iter_count > 10) {
>> + pr_err_ratelimited("Unable to read current time from RTC\n");
> I'd prefer if we could avoid adding strings in the lib.
The problem is that mc146818_get_time() is used 4 times in the kernel and the callers do not check
its return value. Before this patch, I was getting something like this in dmesg:
[ 0.352905] ------------[ cut here ]------------
[ 0.352905] WARNING: CPU: 0 PID: 1 at drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1d5/0x230
[ 0.352905] Modules linked in:
[ 0.352905] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0stacj-mj4 #3
[ 0.352905] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A780GXH/128M, BIOS P1.70 05/14/2010
[ 0.352905] RIP: 0010:mc146818_get_time+0x1d5/0x230
[ 0.352905] Code: 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 4c 89 e6 48 c7 c7 d0 a7 ba a3 e8 ca a8 2f 00 bf 58 89 41 00 e8 30 78 d0 ff e9 43 fe ff
ff <0f> 0b 48 c7 c7 d0 a7 ba a3 4c 89 e6 e8 aa a8 2f 00 48 c7 03 ff ff
[ 0.352905] RSP: 0018:ffffbf8d8001fda8 EFLAGS: 00010002
[...]
[ 0.352905] PM: RTC time: 18446744073709551615:18446744073709551615:18446744073709551615, date: 1899-00-18446744073709551615
which is not very helpful.
So I think I'll modify (in a separate patch) the callers to check the result and print a warning message (alspo in the callers).
For the record, the callers are:
drivers/rtc/rtc-cmos.c: mc146818_get_time(t);
drivers/base/power/trace.c: mc146818_get_time(&time);
arch/alpha/kernel/rtc.c: mc146818_get_time(tm);
arch/x86/kernel/hpet.c: mc146818_get_time(&curr_time);
Greetings,
Mateusz
>> memset(time, 0xff, sizeof(*time));
>> return 0;
>> }
>> + iter_count++;
>> +
>> + spin_lock_irqsave(&rtc_lock, flags);
>> /*
>> * Check whether there is an update in progress during which the
>> diff --git a/include/linux/mc146818rtc.h b/include/linux/mc146818rtc.h
>> index 0661af17a758..69c80c4325bf 100644
>> --- a/include/linux/mc146818rtc.h
>> +++ b/include/linux/mc146818rtc.h
>> @@ -123,6 +123,7 @@ struct cmos_rtc_board_info {
>> #define RTC_IO_EXTENT_USED RTC_IO_EXTENT
>> #endif /* ARCH_RTC_LOCATION */
>> +bool mc146818_does_rtc_work(void);
>> unsigned int mc146818_get_time(struct rtc_time *time);
>> int mc146818_set_time(struct rtc_time *time);
>> -- 2.25.1
>>