Re: infinite loop in read_hpet from ktime_get_boot_fast_ns

From: Peter Zijlstra
Date: Wed Jun 12 2019 - 05:34:06 EST


On Tue, Jun 11, 2019 at 11:09:20PM +0200, Thomas Gleixner wrote:

> > CPU: 1 PID: 7927 Comm: kworker/1:3 Tainted: G OE 4.19.47-1-lts #1
> > Hardware name: Dell Inc. XPS 15 9570/02MJVY, BIOS 1.10.1 04/26/2019

That's a laptop, limited number of CPUs in those.

> > Workqueue: wg-crypt-interface wg_packet_tx_worker [wireguard]
> > RIP: 0010:read_hpet+0x67/0xc0
> > Code: c0 75 11 ba 01 00 00 00 f0 0f b1 15 a3 3d 1a 01 85 c0 74 37 48
> > 89 cf 57 9d 0f 1f 44 00 00 48 c1 ee 20 eb 04 85 c9 74 12 f3 90 <49> 8b
> > 08 48 89 ca 48 c1 ea 20 89 d0 39 f2 74 ea c3 48 8b 05 89 56
> > RSP: 0018:ffffb8d382533e18 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
> > RAX: 0000000018a4c89e RBX: 0000000000000000 RCX: 18a4c89e00000001
> > RDX: 0000000018a4c89e RSI: 0000000018a4c89e RDI: ffffffffb8227980
> > RBP: 000006c1c3f602a2 R08: ffffffffb8205040 R09: 0000000000000000
> > R10: 000001d58fd28efc R11: 0000000000000000 R12: ffffffffb8259a80
> > R13: 00000000ffffffff R14: 0000000518a0d8c4 R15: 000000000010fa5a
> > FS: 0000000000000000(0000) GS:ffff9b90ac240000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00003663b14d9ce8 CR3: 000000030f20a006 CR4: 00000000003606e0
> > Call Trace:
> > ktime_get_mono_fast_ns+0x53/0xa0
> > ktime_get_boot_fast_ns+0x5/0x10
> > wg_packet_tx_worker+0x183/0x220 [wireguard]
> > process_one_work+0x1f4/0x3e0
> > worker_thread+0x2d/0x3e0

This is process context, no IRQs or NMIs around.

> > ? process_one_work+0x3e0/0x3e0
> > kthread+0x112/0x130
> > ? kthread_park+0x80/0x80
> > ret_from_fork+0x35/0x40
> > watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/1:3:7927]
> >
> > It looks like RIP is spinning in this loop in read_hpet:
> >
> > do {
> > 	cpu_relax();
> > 	new.lockval = READ_ONCE(hpet.lockval);
> > } while ((new.value == old.value) && arch_spin_is_locked(&new.lock));

> I think your code is fine. Just 'fast' is relative with the HPET selected
> as clocksource (it's actually awfully slow).

Yeah, the _fast suffix is mostly to denote the NMI safeness of the
accessor, and the corners it cuts to be so, which violate strict
monotonicity.
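
For reference, the _fast accessors get their NMI safeness from a latch
scheme rather than a plain seqlock: a reader that interrupted the writer
on its own CPU can never spin forever, because it always finds one fully
written copy. A minimal userspace sketch of the idea follows; this is a
simplification with made-up names, not the kernel's actual seqcount_latch
code, and real kernel code needs proper barriers (smp_wmb()) where this
leans on C11 seq_cst atomics:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Two copies of the data; the writer bumps the sequence around updating
 * each copy, and a reader picks the copy indexed by the sequence's low
 * bit.  An NMI landing mid-update still finds one consistent copy, at
 * the cost of possibly observing a slightly stale value -- exactly the
 * strict-monotonicity corner being cut.
 */
struct latch {
	atomic_uint seq;
	uint64_t data[2];
};

static void latch_write(struct latch *l, uint64_t val)
{
	/* seq now odd: readers use data[1], so data[0] is safe to update */
	atomic_fetch_add(&l->seq, 1);
	l->data[0] = val;
	/* seq even again: readers switch to data[0], update data[1] */
	atomic_fetch_add(&l->seq, 1);
	l->data[1] = val;
}

static uint64_t latch_read(struct latch *l)
{
	unsigned int seq;
	uint64_t val;

	do {
		seq = atomic_load_explicit(&l->seq, memory_order_acquire);
		val = l->data[seq & 1];
		/*
		 * Retry only if a writer on another CPU raced us.  If we
		 * interrupted the writer on this CPU (the NMI case), seq
		 * cannot change underneath us, so this never loops.
		 */
	} while (seq != atomic_load_explicit(&l->seq, memory_order_acquire));

	return val;
}
```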

If you land on the HPET, there is nothing fast about it, HPET sucks,
always has, always will.

(on a personal note, I'd return any hardware that ends up on HPET as
broken to the vendor)

> It probably livelocks in the HPET optimization Waiman did for large
> machines. I'm having a déjà vu with that spinlock livelock we debugged last
> year. Peter?

Sorry, no memories left of that :/

	CPU0					CPU1

	if (trylock(lock)) {
		new = hpet_read();
		WRITE_ONCE(hpet, new);
						old = READ_ONCE(hpet);
						r = READ_ONCE(lock);
						if (r)
							goto contended;
		unlock(lock);
		return new;
	}

					contended:
						do {
							new = READ_ONCE(hpet);
						} while (new == old && READ_ONCE(lock));

I don't see how we can stay stuck in that loop, either @hpet changing or
@lock being unlocked should terminate it.

Either (or both) must happen given the code.
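
To make that termination argument concrete: the loop's exit condition
relies on the lock and the cached counter value sharing one 64-bit word,
so a single load observes a coherent <lock, value> pair. A rough
userspace model follows, simplified from the union hpet_lock in
arch/x86/kernel/hpet.c; a plain uint32_t stands in for arch_spinlock_t
and the function name is made up:

```c
#include <stdint.h>

/* Lock word and last-published counter value packed into one 64-bit
 * word, so one load sees both fields consistently. */
union hpet_lock {
	struct {
		uint32_t lock;	/* stand-in for arch_spinlock_t */
		uint32_t value;	/* last HPET read published by a holder */
	};
	uint64_t lockval;
};

/*
 * The contended path: spin until the holder either publishes a new
 * value (new.value != old.value) or drops the lock (new.lock clears).
 * The holder necessarily does one and then the other, so some load
 * should eventually flip the condition.
 */
static uint32_t wait_contended(const volatile union hpet_lock *shared,
			       union hpet_lock old)
{
	union hpet_lock new;

	do {
		new.lockval = shared->lockval;
	} while (new.value == old.value && new.lock != 0);

	return new.value;
}
```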

> Can you please ask the reporter to try the hack below?
>
> Thanks,
>
> tglx
>
> 8<---------------
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index a0573f2e7763..0c9044698489 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -795,8 +795,7 @@ static u64 read_hpet(struct clocksource *cs)
> 	/*
> 	 * Read HPET directly if in NMI.
> 	 */
> -	if (in_nmi())
> -		return (u64)hpet_readl(HPET_COUNTER);
> +	return (u64)hpet_readl(HPET_COUNTER);

That basically kills the entire optimization. At which point we'll get
even slower. That said, the laptop doesn't have enough CPUs to
credibly live-lock on this.