Re: [PATCH v2 RESEND/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu

From: Saravana Kannan
Date: Tue Mar 24 2015 - 15:37:08 EST


On 03/19/2015 05:10 PM, Saravana Kannan wrote:
On 09/23/2014 11:33 AM, Thomas Gleixner wrote:
On Mon, 15 Sep 2014, Joonwoo Park wrote:
+#ifdef CONFIG_SMP
+static struct tvec_base *tvec_base_deferral = &boot_tvec_bases;
+#endif

In principle I like the idea of a deferrable wheel, but this
implementation is going to go nowhere.

Hi Thomas,

To give some more context:

This bug is a serious pain in the a** for anyone using deferrable timers
or deferrable workqueues today for some periodic work and don't care for
which CPU the code runs in.

Couple of examples of such issues in existing code:

1) In a SMP system, CPUfreq governors (ondemand and conservative) end up
queueing a deferrable work on every single CPU and the first one to run
the deferrable workqueue cancels the work on all other CPUS, runs the
work and then sets up a workqueue on all the CPUs again for the next
sampling point.

2) Devfreq actually doesn't work well today because it doesn't do this
nasty hack like CPUfreq. So, if the devfreq work happens to run on a CPU
that's typically idle, then it doesn't run for a long time. I've
actually seen logs where the devfreq polling interval is set to 20ms,
but ends up running 800ms later.

Don't know how many other drivers are suffering from this bug. Yes,
IMHO, this is a bug. When the timer is CPU unbound, anyone who hasn't
looked at the actual timer code would assume that it'll run as long as
any CPU is busy.

First of all making it SMP only is silly. The deferrable stuff is a
pain in other places as well.

But whats way worse is:

+static inline void __run_timers(struct tvec_base *base, bool try)
{
struct timer_list *timer;

- spin_lock_irq(&base->lock);
+ if (!try)
+ spin_lock_irq(&base->lock);
+ else if (!spin_trylock_irq(&base->lock))
+ return;

Yuck. All cpus fighting about a single spinlock roughly at the same
time? You just created a proper thundering herd cacheline bouncing
issue.

I agree, This is not good. I think a simple way to fix this is to have
only the CPU that's the jiffy owner to run through this deferrable timer
list.

That should address any unnecessary cache bouncing issues. Would that be
an acceptable solution to this?


No way. We have already mechanisms in place to deal with such
problems, you just have to use them.

I'm not sure which problem you are referring to here. Or what the
already existing solutions are.

I don't think you were referring to the "deferrable timer getting
delayed for long periods despite CPUs being active" problem, because I
don't think we have a better solution than this patch (with the jiffy
owner CPU fix). Asking every driver that doesn't care which CPU the work
runs on to queue a work or set up a timer on every CPU is definitely not
a good solution -- that's spreading the complexity to every other driver
instead of fixing it in one place. And that causes unnecessary cache
pollution too.

Thoughts? I would really like to see a fix for this go in soon so that
we can simplify cpufreq and have devfreq and other drivers work correctly.

Thanks,
Saravana


Thomas,

Bump.

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/