Re: [PATCH] rtc: report time-retrieval errors when updating alarm
From: Brian Norris
Date: Mon Oct 21 2019 - 14:08:24 EST
On Mon, Oct 21, 2019 at 10:48 AM Alexandre Belloni
<alexandre.belloni@xxxxxxxxxxx> wrote:
> On 21/10/2019 10:20:08-0700, Brian Norris wrote:
> > Hi Alexandre!
> >
> > On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni
> > <alexandre.belloni@xxxxxxxxxxx> wrote:
> > > On 21/05/2018 09:42:22-0700, Brian Norris wrote:
> > > > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium).
> > > > When it does, we don't report the error, but instead calculate a
> > > > 1-second alarm based on the potentially-garbage 'tm' (in practice,
> > > > __rtc_read_time() zeroes out the time first, so it's likely to still be
> > > > all 0).
> > > >
> > > > Let's propagate the error instead.
> > > >
> > >
> > > I submitted
> > > https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@xxxxxxxxxxx/T/#u
> > > to solve this after looking at all the implication checking the return
> > > value of __rtc_read_time had.
> >
> > Only about a year and a half late, nice!
>
> I know, right? :) The fact is that this is not a common issue or at
> least, I didn't have any report that this was causing real problems in
> the field. So it ended up being one of the longest standing patch in
> patchwork...
I suppose I could have put specific examples in here: the Rockchip
RK3399-based Gru family of Chromebooks
(arch/arm64/boot/dts/rockchip/rk3399-gru-{kevin,bob,scarlet}*.dts) use
the Chrome OS EC-based RTC (drivers/rtc/rtc-cros-ec.c), and the EC
protocol is prone to occasional errors. So we definitely have a
concrete case where this problem can be tickled. I guess I was too
terse in summarizing that as "if the RTC uses an unreliable medium"?
As for the actual symptoms: this was part of a variety of problems
that resulted in seeing interrupt storms from our EC/RTC when running
`hwclock -r`. I believe there were other patches that were more
critical to resolving the worst symptoms, but this error was noticed
along the way. If you care to read more, you can see our downstream
kernel patches here, when we first handled this problem:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1067442
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1066984
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1069546
Unfortunately, the bug links are private (they were dealing with
partner/factory issues), so you can only glean the implicit
information from the code. And since this was over a year ago, my
memory is a little fuzzy on what exactly the source of the interrupt
storm was...
> >Fortunately we have a decent
> > (albeit time-consuming) process for rebasing our downstream patches in
> > Chrome OS kernels...
> >
>
> Do you need that patch backported to LTS kernels?
Eh, I dunno. If anything that'll just cause us merge troubles (but not
too much) on our Chrome OS kernels, which already carry my patch. But
if there are any non-Chrome-OS users of these Chromebooks (there are)
that are seeing this problem (I'm not sure), they might appreciate it.
By the way, I wonder if your patch actually deserves a "Reported-by".
I suppose I also left off Jeffy as the reporter, but it would be:
Reported-by: Jeffy Chen <jeffy.chen@xxxxxxxxxxxxxx>
Reported-by: Brian Norris <briannorris@xxxxxxxxxxxx>
Brian