[RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ?

From: Oleg Nesterov

Date: Sun Sep 28 2025 - 12:22:26 EST

Another problem is that this API is error prone. Two years ago half of the
read_seqbegin_or_lock() users were buggy (they followed the wrong example
from Documentation/locking/seqlock.rst). And even if the code is mostly
correct it is very easy to add a hard-to-detect mistake, see for example

[PATCH][v3] afs: Remove erroneous seq |= 1 in volume lookup loop
https://lore.kernel.org/all/20250910084235.2630-1-lirongqing@xxxxxxxxx/

Can we improve this API?

-------------------------------------------------------------------------------
To simplify, suppose we add the new helper

static inline int need_seqretry_xxx(seqlock_t *lock, int *seq)
{
int ret = !(*seq & 1) && read_seqretry(lock, *seq);

if (ret)
++*seq; /* make this counter odd */

return ret;
}

which can be used instead of need_seqretry(). This way the users do not
need to modify "seq" in the main loop. For example, we can simplify
thread_group_cputime() as follows:

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -314,7 +314,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
struct signal_struct *sig = tsk->signal;
u64 utime, stime;
struct task_struct *t;
- unsigned int seq, nextseq;
+ unsigned int seq;
unsigned long flags;

/*
@@ -330,9 +330,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)

rcu_read_lock();
/* Attempt a lockless read on the first round. */
- nextseq = 0;
+ seq = 0;
do {
- seq = nextseq;
flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
times->utime = sig->utime;
times->stime = sig->stime;
@@ -344,9 +343,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
times->stime += stime;
times->sum_exec_runtime += read_sum_exec_runtime(t);
}
- /* If lockless access failed, take the lock. */
- nextseq = 1;
- } while (need_seqretry(&sig->stats_lock, seq));
+ } while (need_seqretry_xxx(&sig->stats_lock, &seq));
done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
rcu_read_unlock();
}

most (if not all) of other users can be changed the same way.

-------------------------------------------------------------------------------
Or perhaps we can even add a helper that hides all the details, something like

int xxx(seqlock_t *lock, int *seq, int lockless)
{
if (lockless) {
*seq = read_seqbegin(lock);
return 1;
}

if (*seq & 1) {
read_sequnlock_excl(lock);
return 0;
}

if (read_seqretry(lock, *seq)) {
read_seqlock_excl(lock);
*seq = 1;
return 1;
}

return 0;

}

#define __XXX(lock, seq, lockless) \
for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0)

#define XXX(lock) \
__XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless))

?

This way the users can simply do:

seqlock_t sl;

void func(void)
{
XXX(&sl) {
... read-side critical section ...
}
}

using only the new XXX() helper. No need to declare/initialize seq, no need
for need_seqretry/done_seqretry.

What do you think?

Oleg.