Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one

From: Tejun Heo
Date: Tue Sep 25 2012 - 13:56:54 EST


Hello,

On Tue, Sep 25, 2012 at 04:06:08PM +0530, Viresh Kumar wrote:
> +config MIGRATE_WQ
> + bool "(EXPERIMENTAL) Migrate Workqueues to non-idle cpu"
> + depends on SMP && EXPERIMENTAL
> + help
> + Workqueues queue work on the current CPU if the caller hasn't passed
> + a preferred CPU. This may wake up an idle CPU, which is not actually
> + required, since the work can be processed by any CPU, so a non-idle
> + CPU should be selected here. This patch adds support in the
> + workqueue framework to get a preferred CPU from the scheduler
> + instead of using the current CPU.

I don't think it's a good idea to make behavior like this a config
option. The behavior difference is subtle and may induce incorrect
behavior.

> +/* This enables migration of a work to a non-IDLE cpu instead of current cpu */
> +#ifdef CONFIG_MIGRATE_WQ
> +static int wq_select_cpu(void)
> +{
> + return sched_select_cpu(SD_NUMA, -1);
> +}
> +#else
> +#define wq_select_cpu() smp_processor_id()
> +#endif
> +
> /* Serializes the accesses to the list of workqueues. */
> static DEFINE_SPINLOCK(workqueue_lock);
> static LIST_HEAD(workqueues);
> @@ -995,7 +1005,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
> struct global_cwq *last_gcwq;
>
> if (unlikely(cpu == WORK_CPU_UNBOUND))
> - cpu = raw_smp_processor_id();
> + cpu = wq_select_cpu();
>
> /*
> * It's multi cpu. If @wq is non-reentrant and @work
> @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
> {
> int ret;
>
> - ret = queue_work_on(get_cpu(), wq, work);
> - put_cpu();
> + preempt_disable();
> + ret = queue_work_on(wq_select_cpu(), wq, work);
> + preempt_enable();

First of all, I'm not entirely sure this is safe. queue_work() used
to *guarantee* that the work item would execute on the local CPU. I
don't think there are many callers which depend on that, but I'd be
surprised if this doesn't lead to some subtle problems somewhere. It
might not be realistic to audit all users and we might have to just
let it happen and watch for the fallout. Dunno, still wanna see some
level of auditing.
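
Something like the following (a completely made-up caller, not
anything in the tree) is the kind of subtle dependency I'm worried
about: a work function which assumes it runs on the CPU that queued
it and flushes that CPU's per-cpu state.

  #include <linux/percpu.h>
  #include <linux/printk.h>
  #include <linux/workqueue.h>

  static DEFINE_PER_CPU(unsigned long, pending_stats);

  static void flush_local_stats(struct work_struct *work)
  {
          /* assumes we are still on the CPU that queued the work */
          unsigned long *stats = this_cpu_ptr(&pending_stats);

          pr_info("flushing %lu events\n", *stats);
          *stats = 0;
  }

  static DECLARE_WORK(stats_work, flush_local_stats);

  static void hit_fast_path(void)
  {
          this_cpu_inc(pending_stats);
          /* historically guaranteed to run flush_local_stats() here */
          queue_work(system_wq, &stats_work);
  }

With wq_select_cpu() picking some other CPU, the wrong CPU's counter
gets reported and cleared.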

Also, I'm wondering why this is necessary at all for workqueues. For
schedule_work()/queue_work(), you pretty much know the current CPU is
not idle. For delayed work, sure, but for immediate scheduling, why?
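
To illustrate the distinction (again a made-up example): by the time
a delayed work item's timer fires, the queueing CPU may well be idle,
so re-targeting could save a wakeup there; at queue_work() time the
submitter is obviously running, so the local CPU is not idle.

  #include <linux/jiffies.h>
  #include <linux/workqueue.h>

  static void poll_fn(struct work_struct *work)
  {
          /* periodic housekeeping; can run on any CPU */
  }

  static DECLARE_DELAYED_WORK(poll_work, poll_fn);

  static void start_polling(void)
  {
          /* the timer fires in 5s; this CPU may be idle by then */
          queue_delayed_work(system_wq, &poll_work, 5 * HZ);
  }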

Thanks.

--
tejun