Re: linux-next: tracebacks in workqueue.c/__flush_work()

From: Tetsuo Handa
Date: Wed Feb 06 2019 - 11:39:19 EST


On 2019/02/07 1:23, Guenter Roeck wrote:
> On Wed, Feb 06, 2019 at 11:57:45PM +0900, Tetsuo Handa wrote:
>> On 2019/02/06 23:36, Guenter Roeck wrote:
>>> On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote:
>>>> (Adding linux-arch ML.)
>>>>
>>>> Rusty Russell wrote:
>>>>> Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> writes:
>>>>>> (Adding Chris Metcalf and Rusty Russell.)
>>>>>>
>>>>>> If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not
>>>>>> evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at
>>>>>> previous for_each_online_cpu() loop. Guenter Roeck found a problem among three
>>>>>> commits listed below.
>>>>>>
>>>>>> Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
>>>>>> expects that has_work is evaluated by for_each_cpu().
>>>>>>
>>>>>> Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
>>>>>> assumes that for_each_cpu() does not need to evaluate has_work.
>>>>>>
>>>>>> Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
>>>>>> expects that has_work is evaluated by for_each_cpu().
>>>>>>
>>>>>> What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ?
>>>>>
>>>>> No, fix the API to be least-surprise. Fix 2d3854a37e8b767a too.
>>>>>
>>>>> Doing anything else would be horrible, IMHO.
>>>>>
>>>>
>>>> Fixing 2d3854a37e8b767a might involve subtle changes. If we do
>>>>
>>>
>>> Why not fix the macros ?
>>>
>>> #define for_each_cpu(cpu, mask) \
>>> for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
>>>
>>> does not really make sense since it does not evaluate mask.
>>>
>>> #define for_each_cpu(cpu, mask) \
>>> for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++)
>>>
>>> or something similar might do it.
>>
>> Fixing the macros is fine. The problem is that "mask" then gets evaluated,
>> and it might currently be undefined or unassigned if CONFIG_SMP=n.
>> Evaluating "mask" generates the expected behavior for the
>> lru_add_drain_all() case. But there might be cases where evaluating
>> "mask" generates unexpected behavior/results.
>
> Interesting notion. I would have assumed that passing a parameter
> to a function or macro implies that this parameter may be used.
>
> This makes me wonder - what is the point of ", (mask)" in the current
> macros ? It doesn't make sense to me.

I guess it is there to avoid an "unused argument" warning. But because
"mask" is never actually evaluated, that optimization has accepted even
an "undefined argument" being passed.

>
> Anyway, I agree that fixing the macro might result in some failures.
> However, I would argue that those failures would actually be bugs,
> hidden by the buggy macros. But of course that is just my opinion.

Yes, they are bugs which should be fixed. But since suddenly changing
these macros might break something, I suggest working around this on the
lru_add_drain_all() side for now, and making sure that changes to these
macros get a long enough testing period in linux-next.git.