Re: [PATCH] cpu/hotplug: Cache number of online CPUs

From: Mathieu Desnoyers
Date: Sat Jul 06 2019 - 19:24:14 EST


----- On Jul 5, 2019, at 5:00 PM, Thomas Gleixner tglx@xxxxxxxxxxxxx wrote:

> On Fri, 5 Jul 2019, Thomas Gleixner wrote:
>> On Fri, 5 Jul 2019, Mathieu Desnoyers wrote:
>> > ----- On Jul 5, 2019, at 4:49 AM, Ingo Molnar mingo@xxxxxxxxxx wrote:
>> > > * Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> > >> The semantic I am looking for here is C11's relaxed atomics.
>> > >
>> > > What does this mean?
>> >
>> > C11 states:
>> >
>> > "Atomic operations specifying memory_order_relaxed are relaxed only
>> > with respect to memory ordering. Implementations must still guarantee
>> > that any given atomic access to a particular atomic object be
>> > indivisible with respect to all other atomic accesses to that object."
>> >
>> > So I am concerned that num_online_cpus() as proposed in this patch
>> > tries to access __num_online_cpus non-atomically, and without using
>> > READ_ONCE().
>> >
>> > Similarly, the update-side should use WRITE_ONCE(). Protecting with a
>> > mutex does not provide mutual exclusion against concurrent readers of
>> > that variable.
>>
>> Again. This is nothing new. The current implementation of num_online_cpus()
>> has no guarantees whatsoever.
>>
>> bitmap_hweight() can be hit by a concurrent update of the mask it is
>> looking at.
>>
>> num_online_cpus() only gives you the correct number if you invoke it inside
>> a cpuhp_lock held section. So why do we need that fuss about atomicity now?
>>
>> It's racy and was racy forever and even if we add that READ/WRITE_ONCE muck
>> then it still won't give you a reliable answer unless you hold cpuhp_lock at
>> least for read. So for me that READ/WRITE_ONCE is just a cosmetic and
>> misleading reality distortion.
>
> That said. If it makes everyone happy and feel better, I'm happy to add it
> along with a big fat comment which explains that it's just preventing a
> theoretical store/load tearing issue and does not provide any guarantees
> other than that.
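
For reference, the num_online_cpus() this would replace is (from memory,
so roughly) just a population count over the online mask:

	#define num_online_cpus()	cpumask_weight(cpu_online_mask)

which indeed boils down to a hweight over the cpu_online_mask bitmap and
can observe the mask in the middle of an update, as you point out.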

One example where the lack of READ_ONCE() makes me uneasy is found
in drivers/net/wireless/intel/iwlwifi/pcie/trans.c: iwl_pcie_set_interrupt_capa():

static void iwl_pcie_set_interrupt_capa(struct pci_dev *pdev,
					struct iwl_trans *trans)
{
	struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
	int max_irqs, num_irqs, i, ret;
	[...]
	max_irqs = min_t(u32, num_online_cpus() + 2, IWL_MAX_RX_HW_QUEUES);
	for (i = 0; i < max_irqs; i++)
		trans_pcie->msix_entries[i].entry = i;

msix_entries is an array of IWL_MAX_RX_HW_QUEUES entries. AFAIU, if the
compiler decides to re-fetch num_online_cpus() for the loop condition after
a few CPUs have been hot-plugged, we end up doing an out-of-bounds access
to the array.

The scenario would be:

- load __num_online_cpus into a register,
- compare register + 2 to IWL_MAX_RX_HW_QUEUES
(let's say the current number of cpus is lower than IWL_MAX_RX_HW_QUEUES - 2)
- a few other cpus are brought online,
- compiler decides to re-fetch __num_online_cpus for the loop bound, because
it is within its rights to assume the value has not changed in the meantime,
- corruption and sadness follows (see the sketch below).
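
Something along these lines would, AFAIU, be a legal transformation once
num_online_cpus() becomes a plain load of __num_online_cpus (hand-written
to illustrate the point, not something I have actually seen a compiler emit):

	/* What the driver writes: */
	max_irqs = min_t(u32, num_online_cpus() + 2, IWL_MAX_RX_HW_QUEUES);
	for (i = 0; i < max_irqs; i++)
		trans_pcie->msix_entries[i].entry = i;

	/*
	 * What the compiler is allowed to turn it into, since nothing in
	 * the loop writes __num_online_cpus: re-derive the bound from a
	 * second load instead of keeping max_irqs in a register.
	 */
	if (__num_online_cpus + 2 <= IWL_MAX_RX_HW_QUEUES)
		for (i = 0; i < __num_online_cpus + 2; i++)	/* re-fetched */
			trans_pcie->msix_entries[i].entry = i;
	else
		for (i = 0; i < IWL_MAX_RX_HW_QUEUES; i++)
			trans_pcie->msix_entries[i].entry = i;

A READ_ONCE() in num_online_cpus() rules out exactly this kind of re-fetch:
the compiler has to emit a single load and work with that value.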

I'm not saying that current compilers actually generate this for this
specific function, but I'm concerned by the fact that they are within
their rights to do so. It seems quite fragile to expose a kernel API
which can lead to this kind of subtle bug.

A quick kernel tree grep for both "num_online_cpus" and "min" on the same line
gives 64 results, so it's not an isolated case.
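
So yes, a big fat comment plus the _ONCE accessors would address my concern.
Roughly what I have in mind (a sketch only, against the __num_online_cpus
counter your patch introduces):

	static inline unsigned int num_online_cpus(void)
	{
		/*
		 * Not synchronized against CPU hotplug in any way (callers
		 * still need cpuhp_lock for that), but READ_ONCE()
		 * guarantees a single, non-torn load that the compiler
		 * cannot re-fetch behind the caller's back.
		 */
		return READ_ONCE(__num_online_cpus);
	}

with the update side, still serialized by the hotplug locking, doing the
matching WRITE_ONCE(__num_online_cpus, ...) to avoid store tearing.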

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com