Re: [PATCH] coresight: dynamic-replicator: Fix handling of multiple connections

From: Suzuki K Poulose
Date: Mon Apr 27 2020 - 09:48:25 EST


On 04/27/2020 10:45 AM, Mike Leach wrote:
Hi,

On Mon, 27 Apr 2020 at 10:15, Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:

On 04/26/2020 03:37 PM, Sai Prakash Ranjan wrote:
Since commit 30af4fb619e5 ("coresight: dynamic-replicator:
Handle multiple connections"), we do not make sure that
the other port is disabled when the dynamic replicator is
enabled. This has been seen to cause a CPU hard lockup, at least
on the SC7180 SoC, with the following topology when enabling ETM
with ETR as the sink via sysfs. Since there is no trace ID
logic in CoreSight yet to make use of multiple sinks in
parallel for different trace sessions, disable the other
port when one port is turned on.

etm0_out
|
apss_funnel_in0
|
apss_merge_funnel_in
|
funnel1_in4
|
merge_funnel_in1
|
swao_funnel_in
|
etf_in
|
swao_replicator_in
|
replicator_in
|
etr_in

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
CPU: 7 PID: 0 Comm: swapper/7 Tainted: G S B 5.4.25 #100
Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT)
Call trace:
dump_backtrace+0x0/0x188
show_stack+0x20/0x2c
dump_stack+0xdc/0x144
panic+0x168/0x370
arch_seccomp_spec_mitigate+0x0/0x14
watchdog_timer_fn+0x68/0x290
__hrtimer_run_queues+0x264/0x498
hrtimer_interrupt+0xf0/0x22c
arch_timer_handler_phys+0x40/0x50
handle_percpu_devid_irq+0x8c/0x158
__handle_domain_irq+0x84/0xc4
gic_handle_irq+0x100/0x1c4
el1_irq+0xbc/0x180
arch_cpu_idle+0x3c/0x5c
default_idle_call+0x1c/0x38
do_idle+0x100/0x280
cpu_startup_entry+0x24/0x28
secondary_start_kernel+0x15c/0x170
SMP: stopping secondary CPUs

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx>
Tested-by: Stephen Boyd <swboyd@xxxxxxxxxxxx>



This is not sufficient. You must prevent another session from trying to
enable the other port of the replicator, as this could silently break
the "on-going" session. Not ideal. Fail the attempt to enable a port
if the other port is active. You could track this in software and
fail early.

Suzuki

While I have no issue in principle with not enabling a path to a sink
that is not in use - indeed in some cases attaching to unused sinks
can cause back-pressure that slows throughput (cf TPIU) - I am
concerned that this modification is masking an underlying issue with
the platform in question.

Should we decide to enable the diversion of different IDs to different
sinks or allow different sessions to go to different sinks, then this has
the potential to fail on the SC7180 SoC - and it will be difficult in
future to associate a problem with this discussion.

Mike,

I think that's a good point.
Sai, could we please narrow this down to the real problem and maybe
work around it for the "device"? Do we know which sink is causing the
back pressure? We could then push the "workaround" to the replicator
it is connected to.

Suzuki