Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails

From: Mukesh Ojha
Date: Mon Jul 16 2018 - 12:18:03 EST



On 7/13/2018 10:50 PM, John Stultz wrote:
On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha <mojha@xxxxxxxxxxxxxx> wrote:
Hi John,

Thanks for your response
Please find my comments inline.


On 7/11/2018 1:43 AM, John Stultz wrote:
On Fri, Jul 6, 2018 at 6:17 AM, Mukesh Ojha <mojha@xxxxxxxxxxxxxx> wrote:
Currently, there is a corner case when there is only one
clocksource, e.g. RTC, and the system fails to enter suspend
mode. On resume, rtc_resume() injects the sleeptime because
timekeeping_rtc_skipresume() returns 'false' (the default value
of sleeptime_injected), which causes a mismatch in
timestamps.

This issue can also occur on a system where more than one
clocksource is present and the very first suspend fails.

Fix this by handling `sleeptime_injected` flag properly.

Success case:
------------
{sleeptime_injected=false}
rtc_suspend() => timekeeping_suspend() => timekeeping_resume() =>

(sleeptime injected)
rtc_resume()

Failure case:
------------
{failure in sleep path} {sleeptime_injected=false}
rtc_suspend() => rtc_resume()

(sleeptime injected again, which was not required as the suspend failed)
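
The two flows above can be reproduced with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct tk_model`, the `model_*` helpers and the `clocksource_stops` field are hypothetical names invented for illustration, and the model collapses "nonstop clocksource or persistent clock available" into a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the flag described above; NOT the kernel implementation. */
struct tk_model {
	bool sleeptime_injected;	/* the flag under discussion */
	bool clocksource_stops;		/* assumption: no other time source survives suspend */
	int injections;			/* how many times sleep time was added */
};

/* Models timekeeping_suspend(): the flag is cleared on the way down. */
static void model_timekeeping_suspend(struct tk_model *m)
{
	m->sleeptime_injected = false;
}

/* Models timekeeping_resume(): injects only if it can measure the sleep. */
static void model_timekeeping_resume(struct tk_model *m)
{
	if (!m->clocksource_stops) {
		m->injections++;
		m->sleeptime_injected = true;
	}
}

/* Models rtc_resume(): skipped when the flag says sleep time is handled. */
static void model_rtc_resume(struct tk_model *m)
{
	if (m->sleeptime_injected)	/* stands in for timekeeping_rtc_skipresume() */
		return;
	m->injections++;
	m->sleeptime_injected = true;	/* the injection path sets the flag */
}

/* Failure case: suspend aborts before timekeeping_suspend() is reached. */
int injections_after_failed_suspend(bool initial_flag)
{
	struct tk_model m = { initial_flag, true, 0 };

	model_rtc_resume(&m);		/* rtc_suspend() => rtc_resume() */
	assert(m.sleeptime_injected);	/* every resume path leaves the flag set */
	return m.injections;
}

/* Success case: the full suspend/resume sequence runs. */
int injections_after_successful_suspend(bool initial_flag, bool clocksource_stops)
{
	struct tk_model m = { initial_flag, clocksource_stops, 0 };

	model_timekeeping_suspend(&m);
	model_timekeeping_resume(&m);
	model_rtc_resume(&m);
	assert(m.sleeptime_injected);
	return m.injections;
}
```

In this model, a failed first suspend with the flag starting at false counts one spurious injection, while starting at true counts none; a successful suspend counts exactly one injection either way.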

Originally-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Mukesh Ojha <mojha@xxxxxxxxxxxxxx>
---
Changes in v3:
* Updated commit subject and description.
* Updated the patch as per the fix given by Thomas Gleixner.

Changes in v2:
* Updated the commit text.
* Removed extra variable and used the earlier static
variable 'sleeptime_injected'.

kernel/time/timekeeping.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4786df9..32ae9ae 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1510,8 +1510,20 @@ void __weak read_boot_clock64(struct timespec64 *ts)
        ts->tv_nsec = 0;
 }

-/* Flag for if timekeeping_resume() has injected sleeptime */
-static bool sleeptime_injected;
+/*
+ * Flag reflecting whether timekeeping_resume() has injected sleeptime.
+ *
+ * The flag starts off true and is only cleared when a suspend reaches
+ * timekeeping_suspend(), timekeeping_resume() sets it when the timekeeper
+ * clocksource is not stopping across suspend and has been used to update
+ * sleep time. If the timekeeper clocksource has stopped then the flag
+ * stays false and is used by the RTC resume code to decide whether sleep
+ * time must be injected and if so the flag gets set then.
+ *
+ * If a suspend fails before reaching timekeeping_resume() then the flag
+ * stays true and prevents erroneous sleeptime injection.
+ */
+static bool sleeptime_injected = true;
I worry this upside-down logic is too subtle to be easily reasoned
about, and will just lead to future mistakes.

Can we instead call this "suspend_timing_needed" and only set it to
true when we don't inject any sleep time on resume?

I did not get your point: "only set it to true when we don't inject any sleep
time on resume?"
How do we know this?
That question itself depends on "sleeptime_injected": if it is true, there is
no need to inject; otherwise we need to inject.

Also, this variable has to toggle between true and false; the suspend
path ensures that it is set to false.
So yea, I'm not saying logically the code is really any different,
this is more of a naming nit. So instead of having a variable that is
always on that we occasionally turn off, lets invert the naming and
have it be a flag that we occasionally turn on.

I understand your concern that the name of the variable is misleading.
But changing the Boolean state would not solve the actual issue.

If I understand you correctly, you meant the code below:

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 32ae9ae..becc5bd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts)
 * If a suspend fails before reaching timekeeping_resume() then the flag
 * stays true and prevents erroneous sleeptime injection.
 */
-static bool sleeptime_injected = true;
+static bool suspend_timing_needed;

 /* Flag for if there is a persistent clock on this platform */
 static bool persistent_clock_exists;
@@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64 *delta)
        raw_spin_lock_irqsave(&timekeeper_lock, flags);
        write_seqcount_begin(&tk_core.seq);

-       sleeptime_injected = true;
+       suspend_timing_needed = false;

        timekeeping_forward_now(tk);

@@ -1714,10 +1714,10 @@ void timekeeping_resume(void)
                                              tk->tkr_mono.mask);
                nsec = mul_u64_u32_shr(cyc_delta, clock->mult, clock->shift);
                ts_delta = ns_to_timespec64(nsec);
-               sleeptime_injected = true;
+               suspend_timing_needed = true;
        } else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
                ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
-               sleeptime_injected = true;
+               suspend_timing_needed = true;
        }

        if (sleeptime_injected)
@@ -1756,7 +1756,7 @@ int timekeeping_suspend(void)
        if (timekeeping_suspend_time.tv_sec || timekeeping_suspend_time.tv_nsec)
                persistent_clock_exists = true;

-       sleeptime_injected = false;
+       suspend_timing_needed = false;

        raw_spin_lock_irqsave(&timekeeper_lock, flags);


This has a problem..



Just that the name sleeptime_injected reads as a statement, which, if we say
it defaults to true, becomes confusing to think about when the
timekeeping_suspend/resume code hasn't yet run (which is the case
where your error cropped up) - and no sleeptime has actually been
injected.

Yes, when the very first suspend fails and timekeeping_suspend/resume did not run; that is the exact issue.
So the correct behaviour is that no sleeptime needs to be injected here.

If we set the default value to false, then rtc_resume() will inject sleeptime through the code below, which was not intended.

static int rtc_resume(struct device *dev)
{
        struct rtc_device       *rtc = to_rtc_device(dev);
        struct rtc_time         tm;
        struct timespec64       new_system, new_rtc;
        struct timespec64       sleep_time;
        int err;

        if (timekeeping_rtc_skipresume())  // it will return the value false as sleep failed and timekeeping_resume() did not get called.
                return 0;

 <sleeptime injection happens here>
....
..



So instead if we call it suspend_timing_needed and only set it on in
timekeeping_resume() after the timekeeping code has not injected any
sleep-time, then I think the code will make more sense to read. (And
yes, we still need to set suspend_timing_needed false on
timekeeping_suspend and in the inject_sleeptime call path - the logic
doesn't change, just the naming and boolean state).
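
The inverted naming described in the paragraph above could be modelled roughly as follows. This is a user-space sketch under stated assumptions, not the merged kernel code: `struct stn_model`, the `stn_*` helpers and the `clocksource_stops` parameter are hypothetical names used only for illustration, and the flag transitions follow the wording of this paragraph (off by default, switched on only when timekeeping_resume() injected nothing, switched off in the suspend and injection paths).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposed flag; NOT the kernel implementation. */
struct stn_model {
	bool suspend_timing_needed;	/* default false: nothing to inject yet */
	int injections;			/* how many times sleep time was added */
};

/* Models the inject_sleeptime call path: injecting clears the flag. */
static void stn_inject_sleeptime(struct stn_model *m)
{
	m->suspend_timing_needed = false;
	m->injections++;
}

/* Models timekeeping_suspend(): the flag is set false, as described above. */
static void stn_timekeeping_suspend(struct stn_model *m)
{
	m->suspend_timing_needed = false;
}

/* Models timekeeping_resume(): flag turned on only if nothing was injected. */
static void stn_timekeeping_resume(struct stn_model *m, bool clocksource_stops)
{
	if (clocksource_stops)
		m->suspend_timing_needed = true;	/* leave it to the RTC */
	else
		m->injections++;			/* timekeeping handled it */
}

/* Models rtc_resume(): injects only when the flag asks for it. */
static void stn_rtc_resume(struct stn_model *m)
{
	if (!m->suspend_timing_needed)
		return;
	stn_inject_sleeptime(m);
	assert(!m->suspend_timing_needed);
}

/*
 * Runs one suspend attempt from the initial state and returns the number
 * of injections. reached_timekeeping == false models a suspend that fails
 * before timekeeping_suspend() is called.
 */
int stn_run(bool reached_timekeeping, bool clocksource_stops)
{
	struct stn_model m = { false, 0 };

	if (reached_timekeeping) {
		stn_timekeeping_suspend(&m);
		stn_timekeeping_resume(&m, clocksource_stops);
	}
	stn_rtc_resume(&m);
	return m.injections;
}
```

With the flag off by default, a suspend that fails early results in no injection in this model, and a completed suspend results in exactly one, which is the same behaviour as the sleeptime_injected = true default with the polarity reversed.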

Thanks for your time and patience.

-Mukesh

thanks
-john