Re: [PATCH net] can: isotp: isotp_rcv_cf(): fix so->rx race problem

From: Oliver Hartkopp
Date: Fri Jan 28 2022 - 09:51:15 EST


Hello Marc, hello William,

On 28.01.22 09:46, Marc Kleine-Budde wrote:
On 28.01.2022 09:32:40, Oliver Hartkopp wrote:


On 28.01.22 09:07, Marc Kleine-Budde wrote:
On 28.01.2022 08:56:19, Oliver Hartkopp wrote:
I've seen the frame processing sometimes freezes for one second when
stressing the isotp_rcv() from multiple sources. This finally freezes
the entire softirq which is either not good and not needed as we only
need to fix this race for stress tests - and not for real world usage
that does not create this case.

Hmmm, this doesn't sound good. Can you test with LOCKDEP enabled?


#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_PROVE_LOCKING=y

Now enabled even more locking (seen relevant kernel config at the end).

It turns out that there is no visible difference when using spin_lock() or spin_trylock().

I only got some of these kernel log entries

Jan 28 11:13:14 silver kernel: [ 2396.323211] perf: interrupt took too long (2549 > 2500), lowering kernel.perf_event_max_sample_rate to 78250
Jan 28 11:25:49 silver kernel: [ 3151.172773] perf: interrupt took too long (3188 > 3186), lowering kernel.perf_event_max_sample_rate to 62500
Jan 28 11:45:24 silver kernel: [ 4325.583328] perf: interrupt took too long (4009 > 3985), lowering kernel.perf_event_max_sample_rate to 49750
Jan 28 12:15:46 silver kernel: [ 6148.238246] perf: interrupt took too long (5021 > 5011), lowering kernel.perf_event_max_sample_rate to 39750
Jan 28 13:01:45 silver kernel: [ 8907.303715] perf: interrupt took too long (6285 > 6276), lowering kernel.perf_event_max_sample_rate to 31750

But I get these sporadically anyway. No other LOCKDEP splat.

At least the issue reported by William should be fixed now - but I'm still unclear whether spin_lock() or spin_trylock() is the best approach here in the NET_RX softirq?!?

Best regards,
Oliver


$ grep LOCK .config | grep -v BLOCK | grep -v CLOCK
CONFIG_LOCKDEP_SUPPORT=y
# CONFIG_PM_WAKELOCKS is not set
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_PCI_LOCKLESS_CONFIG=y
# CONFIG_DRM_DEBUG_MODESET_LOCK is not set
CONFIG_HWSPINLOCK=y
CONFIG_I8253_LOCK=y
CONFIG_FILE_LOCKING=y
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
# CONFIG_SECURITY_LANDLOCK is not set
# CONFIG_CRYPTO_DEV_PADLOCK is not set
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0
# CONFIG_TEST_LOCKUP is not set
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RAW_LOCK_NESTING=y
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
CONFIG_DEBUG_LOCKDEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set