Re: [PATCH v6 3/3] pps: convert pps_device.lock to raw_spinlock_t
From: Calvin Owens
Date: Thu May 28 2026 - 12:14:10 EST
On Thursday 05/28 at 09:49 +0200, Sebastian Andrzej Siewior wrote:
> On 2026-05-26 10:50:28 [-0700], Calvin Owens wrote:
> > It does seem to be correct, with CONFIG_DEBUG_ATOMIC_SLEEP I get:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 31, name: ktimers/1
> > preempt_count: 1, expected: 0
> > RCU nest depth: 2, expected: 2
> > CPU: 1 UID: 0 PID: 31 Comm: ktimers/1 Not tainted 7.1.0-rc5-00004-g85f7c8ee247a-dirty #3 PREEMPT_{RT,LAZY}
> > Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
> > Call trace:
> > show_stack+0x1c/0x38 (C)
> > dump_stack_lvl+0x58/0x78
> > dump_stack+0x14/0x1c
> > __might_resched+0x128/0x168
> > rt_spin_lock+0x34/0x180
> > __wake_up+0x2c/0x70
> > pps_event+0xdc/0x2b0
> > pps_ktimer_event+0x44/0x80
> > call_timer_fn+0x34/0x2d0
>
> This is all threaded and the atomic contex gets here due to the lock
> swap.
Yes, I thought this was from a threaded context.
> > The lock isn't part of the lifetime of the pps_device object, so I think
> > fixing that really is as simple as not holding the raw lock across the
> > wakeup, something like below?
> >
> > diff --git a/drivers/pps/kapi.c b/drivers/pps/kapi.c
> > index dc7fac75ec27..f85929d01e3d 100644
> > --- a/drivers/pps/kapi.c
> > +++ b/drivers/pps/kapi.c
> > @@ -203,17 +203,16 @@ void pps_event(struct pps_device *pps, struct pps_event_time *ts, int event,
> >
> > captured = ~0;
> > }
> >
> > pps_kc_event(pps, ts, event);
> > + if (captured)
> > + pps->last_ev++;
> > + raw_spin_unlock_irqrestore(&pps->lock, flags);
> >
> > /* Wake up if captured something */
> > if (captured) {
> > - pps->last_ev++;
> > wake_up_interruptible_all(&pps->queue);
> > -
> > kill_fasync(&pps->async_queue, SIGIO, POLL_IN);
> > }
> > -
> > - raw_spin_unlock_irqrestore(&pps->lock, flags);
> > }
> > EXPORT_SYMBOL(pps_event);
>
> This does not work because the commit message claims to allow
> pps_event() to be called from hardirq context. So if this is called
> indeed from hardirq, context as claimed in the commit message, then
> dropping the lock early does not help here.
It isn't called from hardirq AFAICS.
> Does it need to call pps_event() in hardirq? Patch #1 takes the
> timestamp in hardirq deferring the remaining part to the thread. What is
> the crucial part here that needs to happen in hardirq context?
This is the artificial testcase which just uses a ktimer, it both
"samples" the time and calls pps_event() from the same timer callback.
But I understood you to be saying it's threaded above, which mirrors my
own understanding. If that's true, I don't see why dropping the lock
early wouldn't be sufficient here?
This is what lockdep says on v6:
=============================
[ BUG: Invalid wait context ]
7.1.0-rc5-x86-kvm-00005-ge4dc519be2c8 #7 Not tainted
-----------------------------
ktimers/0/15 is trying to lock:
ffff888107303118 (&pps->queue){....}-{3:3}, at: __wake_up+0x1f/0x50
other info that might help us debug this:
context-{5:5}
6 locks held by ktimers/0/15:
#0: ffffffff82c3f540 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x38/0x230
#1: ffffffff82fd7340 (rcu_read_lock){....}-{1:3}, at: __local_bh_disable_ip+0x126/0x230
#2: ffff88856bbd93e0 (&base->expiry_lock){+...}-{3:3}, at: timer_expire_remote+0x42/0x70
#3: ffffffff82fd7340 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0xd9/0x1a0
#4: ffffc90000083c60 ((&ktimer)){+...}-{0:0}, at: call_timer_fn+0x68/0x250
#5: ffff888107303540 (&pps->lock){....}-{2:2}, at: pps_event+0x3d/0x1e0
stack backtrace:
CPU: 0 UID: 0 PID: 15 Comm: ktimers/0 Not tainted 7.1.0-rc5-x86-kvm-00005-ge4dc519be2c8 #7 PREEMPT_RT
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x80
__lock_acquire+0x896/0xbc0
lock_acquire.part.0+0x58/0x180
? __wake_up+0x1f/0x50
rt_spin_lock+0x32/0x1a0
? __wake_up+0x1f/0x50
__wake_up+0x1f/0x50
? assert_show+0x40/0x40
pps_event+0xaf/0x1e0
? assert_show+0x40/0x40
? assert_show+0x40/0x40
pps_ktimer_event+0x48/0x70
? assert_show+0x40/0x40
call_timer_fn+0x91/0x250
If I drop the lock early, lockdep and atomic_sleep are both slient.
Thanks,
Calvin