Re: Re: [PATCH v2 RESEND/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu

From: Saravana Kannan
Date: Thu Mar 19 2015 - 20:10:53 EST

Next message: Dan Williams: "Re: [PATCH 1/8] pmem: Initial version of persistent memory driver"
Previous message: Michael Ellerman: "Re: [23/32] powerpc: copy_thread(): rename 'arg' argument to 'kthread_arg'"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 09/23/2014 11:33 AM, Thomas Gleixner wrote:

On Mon, 15 Sep 2014, Joonwoo Park wrote:

+#ifdef CONFIG_SMP
+static struct tvec_base *tvec_base_deferral = &boot_tvec_bases;
+#endif

In principle I like the idea of a deferrable wheel, but this
implementation is going to go nowhere.

Hi Thomas,

To give some more context:

This bug is a serious pain in the a** for anyone using deferrable timers or deferrable workqueues today for some periodic work and don't care for which CPU the code runs in.

Couple of examples of such issues in existing code:

1) In a SMP system, CPUfreq governors (ondemand and conservative) end up queueing a deferrable work on every single CPU and the first one to run the deferrable workqueue cancels the work on all other CPUS, runs the work and then sets up a workqueue on all the CPUs again for the next sampling point.

2) Devfreq actually doesn't work well today because it doesn't do this nasty hack like CPUfreq. So, if the devfreq work happens to run on a CPU that's typically idle, then it doesn't run for a long time. I've actually seen logs where the devfreq polling interval is set to 20ms, but ends up running 800ms later.

Don't know how many other drivers are suffering from this bug. Yes, IMHO, this is a bug. When the timer is CPU unbound, anyone who hasn't looked at the actual timer code would assume that it'll run as long as any CPU is busy.

First of all making it SMP only is silly. The deferrable stuff is a
pain in other places as well.

But whats way worse is:

+static inline void __run_timers(struct tvec_base *base, bool try)
{
struct timer_list *timer;

- spin_lock_irq(&base->lock);
+ if (!try)
+ spin_lock_irq(&base->lock);
+ else if (!spin_trylock_irq(&base->lock))
+ return;

Yuck. All cpus fighting about a single spinlock roughly at the same
time? You just created a proper thundering herd cacheline bouncing
issue.

I agree, This is not good. I think a simple way to fix this is to have only the CPU that's the jiffy owner to run through this deferrable timer list.

That should address any unnecessary cache bouncing issues. Would that be an acceptable solution to this?

No way. We have already mechanisms in place to deal with such
problems, you just have to use them.

I'm not sure which problem you are referring to here. Or what the already existing solutions are.

I don't think you were referring to the "deferrable timer getting delayed for long periods despite CPUs being active" problem, because I don't think we have a better solution than this patch (with the jiffy owner CPU fix). Asking every driver that doesn't care which CPU the work runs on to queue a work or set up a timer on every CPU is definitely not a good solution -- that's spreading the complexity to every other driver instead of fixing it in one place. And that causes unnecessary cache pollution too.

Thoughts? I would really like to see a fix for this go in soon so that we can simplify cpufreq and have devfreq and other drivers work correctly.

Thanks,
Saravana

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Dan Williams: "Re: [PATCH 1/8] pmem: Initial version of persistent memory driver"
Previous message: Michael Ellerman: "Re: [23/32] powerpc: copy_thread(): rename 'arg' argument to 'kthread_arg'"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]