On 06.11.2024 at 13:16, H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
> On 06.11.2024 at 10:23, Andreas Kemnade <andreas@xxxxxxxxxxxx> wrote:
>> On Wed, 11 Sep 2024 11:40:04 +0200, "H. Nikolaus Schaller" <hns@xxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> On 28.04.2023 at 20:30, Reid Tonking <reidt@xxxxxx> wrote:
>>>> On 10:43-20230428, Tony Lindgren wrote:
>>>>> * Raghavendra, Vignesh <vigneshr@xxxxxx> [230427 13:18]:
>>>>>> On 4/27/2023 1:19 AM, Reid Tonking wrote:
>>>>>>> Using standard mode, rare false ACK responses were appearing with
>>>>>>> i2cdetect tool. This was happening due to NACK interrupt
>>>>>>> triggering ISR thread before register access interrupt was
>>>>>>> ready. Removing the NACK interrupt's ability to trigger ISR
>>>>>>> thread lets register access ready interrupt do this instead.
>>>>>
>>>>> So is it safe to leave NACK interrupt unhandled until we get the
>>>>> next interrupt? Does the ARDY always trigger after hitting this?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tony
>>>>
>>>> Yep, the ARDY always gets set after a new command when register
>>>> access is ready so there's no need for NACK interrupt to control
>>>> this.
>>>
>>> I have tested one GTA04A5 board where this patch breaks boot on
>>> v4.19.283 or v6.11-rc7 (where it was inherited from some earlier -rc
>>> series).
>>>
>>> The device is either stuck with no signs of activity or reports RCU
>>> stalls after a 20 second pause.
>>
>> Reproduced some problem here:
>>
>> i2cset 1 0x69 0x14 0xb6 (reset command for gyro BMG160)
>> [ 736.136108] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
>> [ 736.136322] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
>>
>> either with this patch applied:
>> ... system mostly hangs, i2cset does not return.
>>
>> with it reverted:
>> ... most times I see after this:
>> [ 736.136505] omap_i2c 48072000.i2c: IRQ (ISR = 0x0002)
>> and i2cset says:
>> i2cset: write failed: Remote I/O error
>> ... sometimes:
>> omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)
>> and i2cset is successful.
>>
>> Other register writes seem to work reliably; only the reset command fails.
>> I had tested with bmg driver disabled earlier,
>> so it did not come to light.
>
> Indeed, I can confirm with your sequence (and bmg driver voluntarily
> disabled so that the effect just comes from the i2c bus & client chip).
>
> 1. echo blacklist bmg160_i2c >/etc/modprobe.d/test.conf
> 2. reboot & login:
> 3.
> Last login: Wed Nov 6 11:24:37 UTC 2024 on ttyO2
> root@letux:~# dmesg|fgrep bmg
> root@letux:~# i2cset -y 1 0x69 0x14 0xb6
> root@letux:~# i2cset -y 1 0x69 0x14 0xb6
> root@letux:~# i2cset -y 1 0x69 0x14 0xb6
> root@letux:~# i2cset -y 1 0x69 0x14 0xb6
> --- hangs for some seconds ---
> [ 109.664245] rcu: INFO: rcu_preempt self-detected stall on CPU
> [ 109.670318] rcu: 0-...!: (2100 ticks this GP) idle=7e74/1/0x40000004 softirq=9248/9248 fqs=0
> [ 109.679260] rcu: (t=2100 jiffies g=11389 q=33 ncpus=1)
> [ 109.684753] rcu: rcu_preempt kthread timer wakeup didn't happen for 2099 jiffies! g11389 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> [ 109.696685] rcu: Possible timer handling issue on cpu=0 timer-softirq=4004
> [ 109.704010] rcu: rcu_preempt kthread starved for 2100 jiffies! g11389 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
> [ 109.714935] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 109.724517] rcu: RCU grace-period kthread stack dump:
> [ 109.729797] task:rcu_preempt state:I stack:0 pid:15 tgid:15 ppid:2 flags:0x00000000
> [ 109.739593] Call trace:
> [ 109.739593] __schedule from schedule+0x3c/0x64
> [ 109.747039] schedule from schedule_timeout+0xa8/0xd4
> [ 109.752349] schedule_timeout from rcu_gp_fqs_loop+0x148/0x370
> [ 109.758514] rcu_gp_fqs_loop from rcu_gp_kthread+0xec/0x124
> [ 109.764373] rcu_gp_kthread from kthread+0xfc/0x108
> [ 109.769500] kthread from ret_from_fork+0x14/0x28
> [ 109.774444] Exception stack(0xf0041fb0 to 0xf0041ff8)
> [ 109.779754] 1fa0: 00000000 00000000 00000000 00000000
> [ 109.788330] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 109.796905] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 109.803863] CPU: 0 UID: 0 PID: 3210 Comm: loginwindow Not tainted 6.12.0-rc6-letux+ #169
> [ 109.803894] Hardware name: Generic OMAP36xx (Flattened Device Tree)
> [ 109.803894] PC is at handle_softirqs+0x84/0x300
> [ 109.803924] LR is at handle_softirqs+0x54/0x300
> [ 109.803955] pc : [<c0133c3c>] lr : [<c0133c0c>] psr: 60070113
> [ 109.803955] sp : f0001fa0 ip : 844ce392 fp : c0f02080
> [ 109.803985] r10: f0651be0 r9 : c1008d28 r8 : f0651be8
> [ 109.803985] r7 : c0f02d40 r6 : 00000200 r5 : c0e91600 r4 : c0e91600
> [ 109.803985] r3 : 2e70d000 r2 : 00000000 r1 : c0e91600 r0 : c23cad00
> [ 109.804016] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> [ 109.804016] Control: 10c5387d Table: 82b70019 DAC: 00000051
> [ 109.804016] Call trace:
> [ 109.804046] handle_softirqs from __irq_exit_rcu+0x6c/0xb4
> [ 109.804077] __irq_exit_rcu from irq_exit+0x8/0x10
> [ 109.804077] irq_exit from call_with_stack+0x18/0x20
> [ 109.804138] call_with_stack from __irq_svc+0x98/0xcc
> [ 109.804138] Exception stack(0xf0651b60 to 0xf0651ba8)
> [ 109.804168] 1b60: c2c8f300 f0651ce0 c085aec0 c2c8f300 00000000 00000019 00000000 00000000
> [ 109.804168] 1b80: f0651be8 00000000 f0651be0 00000000 ffffffff f0651bb0 c02ba850 c085aec0
> [ 109.804199] 1ba0: a0070113 ffffffff
> [ 109.804199] __irq_svc from sock_poll+0x0/0xbc
> [ 109.804229] sock_poll from do_sys_poll+0x2a8/0x460
> [ 109.804260] do_sys_poll from sys_poll+0x74/0xe8
> [ 109.804290] sys_poll from ret_fast_syscall+0x0/0x54
> [ 109.804290] Exception stack(0xf0651fa8 to 0xf0651ff0)
> [ 109.804321] 1fa0: 0000409b 00162f90 beeb07cc 00000001 ffffffff 00000000
> [ 109.804321] 1fc0: 0000409b 00162f90 b61c3080 000000a8 00000000 00162f9c 00163f90 beeb0874
> [ 109.804351] 1fe0: 000000a8 beeb07a8 b6a83bd7 b6a057e6

After reverting this patch, I get some sporadic write errors but no kernel crashes:

root@letux:~# while true; do i2cset -y 1 0x69 0x14 0xb6 && echo good; done
Error: Write failed
good
Error: Write failed
good
good
good
good
Error: Write failed
good
Error: Write failed
good
good
good
good
good
good
good
good
good
good
good
good
good
good
good
good
Error: Write failed
good
^C
root@letux:~#
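
To put a number on "sporadic", the loop output can be tallied instead of eyeballed. This is only a rough sketch: tally_results is a made-up helper name, and it assumes the output looks exactly like the transcript above ("good" on success, a line starting with "Error:" on failure).

```shell
# Hypothetical helper: count successes and failures read from stdin.
# A line equal to "good" counts as a success; a line starting with
# "Error:" counts as a failure; anything else is ignored.
tally_results() {
    good=0
    failed=0
    while IFS= read -r line; do
        case "$line" in
            good) good=$((good + 1)) ;;
            Error:*) failed=$((failed + 1)) ;;
        esac
    done
    echo "good=$good failed=$failed"
}

# Possible use on the target (bounded to 100 writes; 2>&1 so that
# error messages printed on stderr are counted as well):
#   for i in $(seq 1 100); do
#       i2cset -y 1 0x69 0x14 0xb6 && echo good
#   done 2>&1 | tally_results
```

Fed with the 28 result lines of the transcript above, this counts 23 successful writes and 5 failures.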
So there are chips (like BMG160) which might block the SDA/SCL lines in a
strange way where the patched i2c driver fails instead of timing out and
reporting an error.
Therefore, I'd suggest reverting it or finding a proper fix.
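
For anyone reading the ISR values in the debug lines quoted in this thread (0x0010, 0x0002, 0x0004), a small decoder may help. The bit names are my reading of the OMAP_I2C_STAT_* definitions in drivers/i2c/busses/i2c-omap.c and should be checked against the tree being tested; decode_isr is a made-up name and only the low five bits are handled.

```shell
# Hypothetical helper: print the names of the low status bits set in an
# omap_i2c ISR value. Assumed bit layout (verify against the driver
# source): bit 0 AL, bit 1 NACK, bit 2 ARDY, bit 3 RRDY, bit 4 XRDY.
# Higher bits are ignored by this sketch.
decode_isr() {
    isr=$(( $1 ))
    names=""
    for entry in 0:AL 1:NACK 2:ARDY 3:RRDY 4:XRDY; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( (isr >> bit) & 1 )) -eq 1 ]; then
            names="$names $name"
        fi
    done
    # Strip the leading space before printing.
    echo "${names# }"
}

# Example: decode_isr 0x0002
```

Under that assumed layout, 0x0010 reads as XRDY (transmit data ready), 0x0002 as NACK (the case where i2cset reports a Remote I/O error) and 0x0004 as ARDY (the case where i2cset succeeds).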