[PATCH RT 01/11] timers: do not raise softirq unconditionally

From: Steven Rostedt
Date: Fri Feb 28 2014 - 08:37:05 EST


3.10.32-rt31-rc1 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

Mike,

On Thu, 7 Nov 2013, Mike Galbraith wrote:

> On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote:
> > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote:
>
> > > I bet you are trying to work around some of the side effects of the
> > > occasional tick which is still necessary despite of full nohz, right?
> >
> > Nope, I wanted to check out cost of nohz_full for rt, and found that it
> > doesn't work at all instead, looked, and found that the sole running
> > task has just awakened ksoftirqd when it wants to shut the tick down, so
> > that shutdown never happens.
>
> Like so in virgin 3.10-rt. Box is x3550 M3 booted nowatchdog
> rcu_nocbs=1-3 nohz_full=1-3, and CPUs1-3 are completely isolated via
> cpusets as well.

well, that very same problem is in mainline if you add "threadirqs" to
the command line. But we can be smart about this. The untested patch
below should address that issue. If that works on mainline we can
adapt it for RT (needs a trylock(&base->lock) there).

Though it's not a full solution. It needs some thought versus the
softirq code of timers. Assume we have only one timer queued 1000
ticks into the future. So this change will cause the timer softirq not
to be called until that timer expires and then the timer softirq is
going to do 1000 loops until it catches up with jiffies. That's
anything but pretty ...

What worries me more is this one:

pert-5229 [003] d..h1.. 684.482618: softirq_raise: vec=9 [action=RCU]

The CPU has no callbacks as you shoved them over to cpu 0, so why is
the RCU softirq raised?

Thanks,

tglx
------------------
Message-id: <alpine.DEB.2.02.1311071158350.23353@xxxxxxxxxxxxxxxxxxxxxxx>
|CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo
Cc: stable-rt@xxxxxxxxxxxxxxx
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
---
include/linux/hrtimer.h | 3 +--
kernel/hrtimer.c | 31 +++++++------------------------
kernel/timer.c | 28 +++++++++++++++++++++++++---
3 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 79a7a35..bdbf77db 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -461,9 +461,8 @@ extern int schedule_hrtimeout_range_clock(ktime_t *expires,
unsigned long delta, const enum hrtimer_mode mode, int clock);
extern int schedule_hrtimeout(ktime_t *expires, const enum hrtimer_mode mode);

-/* Soft interrupt function to run the hrtimer queues: */
+/* Called from the periodic timer tick */
extern void hrtimer_run_queues(void);
-extern void hrtimer_run_pending(void);

/* Bootup initialization: */
extern void __init hrtimers_init(void);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index a63cfaf..a7e90b2 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1691,30 +1691,6 @@ static void run_hrtimer_softirq(struct softirq_action *h)
}

/*
- * Called from timer softirq every jiffy, expire hrtimers:
- *
- * For HRT its the fall back code to run the softirq in the timer
- * softirq context in case the hrtimer initialization failed or has
- * not been done yet.
- */
-void hrtimer_run_pending(void)
-{
- if (hrtimer_hres_active())
- return;
-
- /*
- * This _is_ ugly: We have to check in the softirq context,
- * whether we can switch to highres and / or nohz mode. The
- * clocksource switch happens in the timer interrupt with
- * xtime_lock held. Notification from there only sets the
- * check bit in the tick_oneshot code, otherwise we might
- * deadlock vs. xtime_lock.
- */
- if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
- hrtimer_switch_to_hres();
-}
-
-/*
* Called from hardirq context every jiffy
*/
void hrtimer_run_queues(void)
@@ -1727,6 +1703,13 @@ void hrtimer_run_queues(void)
if (hrtimer_hres_active())
return;

+ /*
+ * Check whether we can switch to highres mode.
+ */
+ if (tick_check_oneshot_change(!hrtimer_is_hres_enabled())
+ && hrtimer_switch_to_hres())
+ return;
+
for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
base = &cpu_base->clock_base[index];
if (!timerqueue_getnext(&base->active))
diff --git a/kernel/timer.c b/kernel/timer.c
index 48652cc..3d7313c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1443,8 +1443,6 @@ static void run_timer_softirq(struct softirq_action *h)
irq_work_run();
#endif

- hrtimer_run_pending();
-
if (time_after_eq(jiffies, base->timer_jiffies))
__run_timers(base);
}
@@ -1454,8 +1452,32 @@ static void run_timer_softirq(struct softirq_action *h)
*/
void run_local_timers(void)
{
+ struct tvec_base *base = __this_cpu_read(tvec_bases);
+
hrtimer_run_queues();
- raise_softirq(TIMER_SOFTIRQ);
+ /*
+ * We can access this lockless as we are in the timer
+ * interrupt. If there are no timers queued, nothing to do in
+ * the timer softirq.
+ */
+#ifdef CONFIG_PREEMPT_RT_FULL
+ if (!spin_do_trylock(&base->lock)) {
+ raise_softirq(TIMER_SOFTIRQ);
+ return;
+ }
+#endif
+ if (!base->active_timers)
+ goto out;
+
+ /* Check whether the next pending timer has expired */
+ if (time_before_eq(base->next_timer, jiffies))
+ raise_softirq(TIMER_SOFTIRQ);
+out:
+#ifdef CONFIG_PREEMPT_RT_FULL
+ rt_spin_unlock_after_trylock_in_irq(&base->lock);
+#endif
+ /* The ; ensures that gcc won't complain in the !RT case */
+ ;
}

#ifdef __ARCH_WANT_SYS_ALARM
--
1.8.5.3


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/