[PATCH 5.15 160/530] bpf: Disable preemption when increasing per-cpu map_locked

From: Greg Kroah-Hartman
Date: Mon Oct 24 2022 - 14:00:29 EST


From: Hou Tao <houtao1@xxxxxxxxxx>

[ Upstream commit 2775da21628738ce073a3a6a806adcbaada0f091 ]

Per-cpu htab->map_locked is used to prohibit the concurrent accesses
from both NMI and non-NMI contexts. But since commit 74d862b682f5
("sched: Make migrate_disable/enable() independent of RT"),
migrate_disable() is also preemptible under CONFIG_PREEMPT case, so now
map_locked also disallows concurrent updates from normal contexts
(e.g. userspace processes) unexpectedly as shown below:

process A process B

htab_map_update_elem()
htab_lock_bucket()
migrate_disable()
/* return 1 */
__this_cpu_inc_return()
/* preempted by B */

htab_map_update_elem()
/* the same bucket as A */
htab_lock_bucket()
migrate_disable()
/* return 2, so lock fails */
__this_cpu_inc_return()
return -EBUSY

A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
only checking the value of map_locked for nmi context. But it will
re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
through non-tracing program (e.g. fentry program).

One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
being false causes the bucket lock to be a spin lock which can sleep and
does not work with preempt_disable().

Therefore, use migrate_disable() when using the spinlock instead of
preempt_disable() and defer fixing concurrent updates to when the kernel
has its own BPF memory allocator.

Fixes: 74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT")
Reviewed-by: Hao Luo <haoluo@xxxxxxxxxx>
Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20220831042629.130006-2-houtao@xxxxxxxxxxxxxxx
Signed-off-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 47eebb88695e..cae858985f0c 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -161,17 +161,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
unsigned long *pflags)
{
unsigned long flags;
+ bool use_raw_lock;

hash = hash & HASHTAB_MAP_LOCK_MASK;

- migrate_disable();
+ use_raw_lock = htab_use_raw_lock(htab);
+ if (use_raw_lock)
+ preempt_disable();
+ else
+ migrate_disable();
if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
__this_cpu_dec(*(htab->map_locked[hash]));
- migrate_enable();
+ if (use_raw_lock)
+ preempt_enable();
+ else
+ migrate_enable();
return -EBUSY;
}

- if (htab_use_raw_lock(htab))
+ if (use_raw_lock)
raw_spin_lock_irqsave(&b->raw_lock, flags);
else
spin_lock_irqsave(&b->lock, flags);
@@ -184,13 +192,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
struct bucket *b, u32 hash,
unsigned long flags)
{
+ bool use_raw_lock = htab_use_raw_lock(htab);
+
hash = hash & HASHTAB_MAP_LOCK_MASK;
- if (htab_use_raw_lock(htab))
+ if (use_raw_lock)
raw_spin_unlock_irqrestore(&b->raw_lock, flags);
else
spin_unlock_irqrestore(&b->lock, flags);
__this_cpu_dec(*(htab->map_locked[hash]));
- migrate_enable();
+ if (use_raw_lock)
+ preempt_enable();
+ else
+ migrate_enable();
}

static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
--
2.35.1