Re: infinite loop in read_hpet from ktime_get_boot_fast_ns

From: Waiman Long
Date: Tue Jun 11 2019 - 17:45:28 EST


On 6/11/19 5:09 PM, Thomas Gleixner wrote:
> Jason,
>
> On Fri, 7 Jun 2019, Jason A. Donenfeld wrote:
>
> Adding a few more people on cc and keeping full context.
>
>> Hey Thomas,
>>
>> After some discussions here prior about the different clocks
>> available, WireGuard uses ktime_get_boot_fast_ns() pretty extensively.
>> The requirement is for a quasi-accurate monotonic counter that takes
>> into account sleep time, and this seems to fit the bill pretty well.
>> Sultan (CC'd) reported to me a non-reproducible bug he encountered in
>> 4.19.47 (arch's linux-lts package), where the CPU was hung in
>> read_hpet.
>>
>> CPU: 1 PID: 7927 Comm: kworker/1:3 Tainted: G OE 4.19.47-1-lts #1
>> Hardware name: Dell Inc. XPS 15 9570/02MJVY, BIOS 1.10.1 04/26/2019
>> Workqueue: wg-crypt-interface wg_packet_tx_worker [wireguard]
>> RIP: 0010:read_hpet+0x67/0xc0
>> Code: c0 75 11 ba 01 00 00 00 f0 0f b1 15 a3 3d 1a 01 85 c0 74 37 48
>> 89 cf 57 9d 0f 1f 44 00 00 48 c1 ee 20 eb 04 85 c9 74 12 f3 90 <49> 8b
>> 08 48 89 ca 48 c1 ea 20 89 d0 39 f2 74 ea c3 48 8b 05 89 56
>> RSP: 0018:ffffb8d382533e18 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
>> RAX: 0000000018a4c89e RBX: 0000000000000000 RCX: 18a4c89e00000001
>> RDX: 0000000018a4c89e RSI: 0000000018a4c89e RDI: ffffffffb8227980
>> RBP: 000006c1c3f602a2 R08: ffffffffb8205040 R09: 0000000000000000
>> R10: 000001d58fd28efc R11: 0000000000000000 R12: ffffffffb8259a80
>> R13: 00000000ffffffff R14: 0000000518a0d8c4 R15: 000000000010fa5a
>> FS: 0000000000000000(0000) GS:ffff9b90ac240000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00003663b14d9ce8 CR3: 000000030f20a006 CR4: 00000000003606e0
>> Call Trace:
>> ktime_get_mono_fast_ns+0x53/0xa0
>> ktime_get_boot_fast_ns+0x5/0x10
>> wg_packet_tx_worker+0x183/0x220 [wireguard]
>> process_one_work+0x1f4/0x3e0
>> worker_thread+0x2d/0x3e0
>> ? process_one_work+0x3e0/0x3e0
>> kthread+0x112/0x130
>> ? kthread_park+0x80/0x80
>> ret_from_fork+0x35/0x40
>> watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/1:3:7927]
>>
>> It looks like RIP is spinning in this loop in read_hpet:
>>
>> do {
>> cpu_relax();
>> new.lockval = READ_ONCE(hpet.lockval);
>> } while ((new.value == old.value) && arch_spin_is_locked(&new.lock));

The hang shouldn't happen unless the hpet_lock structure is somehow
corrupted by another unrelated task. If someone inadvertently change the
content of the lock word, live lock will happen.

>> I imagine this could be a bug in the hpet code, or a failure of the
>> hpet hardware. But I thought it'd be most prudent to check, first,
>> whether there are actually very particular conditions on when and
>> where ktime_get_boot_fast_ns and friends can be called. In other
>> words, maybe the bug is actually in my code. I was under the
>> impression that invoking it from anywhere was fine, given the
>> documentation says "NMI safe", but maybe there are still some
>> requirements I should keep in mind?
> I think your code is fine. Just 'fast' is relative with the HPET selected
> as clocksource (it's actually aweful slow).
>
> It probably livelocks in the HPET optimization Waiman did for large
> machines. I'm having a dejavu with that spinlock livelock we debugged last
> year. Peter?
>
> Can you please ask the reporter to try the hack below?
>
> Thanks,
>
> tglx
>
> 8<---------------
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index a0573f2e7763..0c9044698489 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -795,8 +795,7 @@ static u64 read_hpet(struct clocksource *cs)
> /*
> * Read HPET directly if in NMI.
> */
> - if (in_nmi())
> - return (u64)hpet_readl(HPET_COUNTER);

While at it, add:

+ÂÂÂ WARN_ON(arch_spin_is_locked(&hpet.lock));

> + return (u64)hpet_readl(HPET_COUNTER);
>
> /*
> * Read the current state of the lock and HPET value atomically.

Cheers,
Longman