Re: [PATCH 1/1] can: m_can: Control tx flow to avoid message stuck
From: Markus Schneider-Pargmann
Date: Thu Jan 09 2025 - 10:43:46 EST
Hi,
On Wed, Jan 08, 2025 at 02:31:12PM +0530, subramanian.mohan@xxxxxxxxx wrote:
> From: Subramanian Mohan <subramanian.mohan@xxxxxxxxx>
>
> Prolonged testing of passing CAN messages between
> two Elkhartlake platforms resulted in stuck messages,
> i.e. messages were not received on the receiver side.
Can you please describe the reason for the stuck messages in your
commit message? I am reading this but I don't understand why this
happens or why your proposed solution helps.
>
> Controlling TX, i.e. the TEFN bit, helped to resolve the
> stuck-message issue.
>
> The current solution is enhanced/optimized from the below patch:
> https://lore.kernel.org/lkml/20230623051124.64132-1-kumari.pallavi@xxxxxxxxx/T/
>
> Setup used to reproduce the issue:
>
> +---------------------+         +----------------------+
> |Intel ElkhartLake    |         |Intel ElkhartLake     |
> |      +--------+     |         |      +--------+      |
> |      |m_can 0 |     |<=======>|      |m_can 0 |      |
> |      +--------+     |         |      +--------+      |
> +---------------------+         +----------------------+
>
> Steps to be run on the two Elkhartlake HW:
> 1) Bus rate is 1 MBit/s
> 2) Bus load during the test is about 40%
> 3) We initialize CAN with the following commands:
>    ip link set can0 txqueuelen 100/1024/2048
>    ip link set can0 up type can bitrate 1000000
>
> Python scripts are used to send and receive the CAN messages
> between the EHL systems.
>
> Signed-off-by: Hahn Matthias <matthias.hahn@xxxxxxxxx>
> Signed-off-by: Subramanian Mohan <subramanian.mohan@xxxxxxxxx>
> ---
> drivers/net/can/m_can/m_can.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index 97cd8bbf2e32..0a2c9a622842 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -1220,7 +1220,7 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
>  static int m_can_interrupt_handler(struct m_can_classdev *cdev)
>  {
>  	struct net_device *dev = cdev->net;
> -	u32 ir = 0, ir_read;
> +	u32 ir = 0, ir_read, new_interrupts;
>  	int ret;
>
>  	if (pm_runtime_suspended(cdev->dev))
> @@ -1283,6 +1283,9 @@ static int m_can_interrupt_handler(struct m_can_classdev *cdev)
>  			ret = m_can_echo_tx_event(dev);
>  			if (ret != 0)
>  				return ret;
> +
> +			new_interrupts = cdev->active_interrupts & ~(IR_TEFN);
> +			m_can_interrupt_enable(cdev, new_interrupts);
Consider a theoretical situation in which two messages are sent. The
first one is sent and handled in this interrupt handler, which then
disables the TEFN bit, right? If the second message hasn't finished
sending yet, how would its completion ever trigger the interrupt handler
while the interrupt is disabled?
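To make the two-message scenario concrete, here is a small user-space toy
model. It is only a sketch under assumptions and not the driver code: the
struct, the toy_* helpers and the TOY_IR_TEFN bit position are all made up
for illustration, and it assumes both frames were queued before the first
completion, so no further transmit re-enables TEFN in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real IR_TEFN is defined in m_can.c. */
#define TOY_IR_TEFN (1u << 0)

/* Toy model of the controller state, not the real register layout. */
struct toy_mcan {
	uint32_t ir;       /* pending interrupt flags */
	uint32_t ie;       /* enabled interrupt sources */
	unsigned tef_fill; /* unread TX event FIFO entries */
	unsigned handled;  /* TX completions seen by the handler */
};

/* Hardware finished sending a frame: new TX event, TEFN flag raised. */
static void toy_tx_complete(struct toy_mcan *c)
{
	c->tef_fill++;
	c->ir |= TOY_IR_TEFN;
}

/* The interrupt line is asserted only for pending AND enabled sources. */
static bool toy_irq_asserted(const struct toy_mcan *c)
{
	return (c->ir & c->ie) != 0;
}

/* Handler: ack flags, drain the TX event FIFO, optionally mask TEFN. */
static void toy_irq_handler(struct toy_mcan *c, bool mask_tefn)
{
	uint32_t ir = c->ir;

	c->ir &= ~ir;
	if (ir & TOY_IR_TEFN) {
		c->handled += c->tef_fill;
		c->tef_fill = 0;
		if (mask_tefn)
			c->ie &= ~TOY_IR_TEFN; /* what the patch does here */
	}
}

/* Two frames already queued; returns how many completions were handled. */
unsigned toy_two_frames(bool mask_tefn)
{
	struct toy_mcan c = { .ir = 0, .ie = TOY_IR_TEFN };

	toy_tx_complete(&c);                 /* frame 1 done */
	if (toy_irq_asserted(&c))
		toy_irq_handler(&c, mask_tefn);

	toy_tx_complete(&c);                 /* frame 2 done later */
	if (toy_irq_asserted(&c))
		toy_irq_handler(&c, mask_tefn);

	/* Every completed frame is either handled or still in the FIFO. */
	assert(c.handled + c.tef_fill == 2);
	return c.handled;
}
```

In this model toy_two_frames(false) handles both completions, while
toy_two_frames(true) handles only the first: the second TEFN flag is
raised but masked, so the handler never runs for it unless some later
transmit re-enables the interrupt.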
Also, you are disabling this interrupt here regardless of the type of
mcan device and regardless of the coalescing state, while in the transmit
path you only enable it for non-peripheral devices. For peripheral
mcan devices this would also introduce two additional transfers per
transmit.
In which situations is this really necessary? Does it help to implement
coalescing for non-peripheral devices?
Best
Markus
>  		}
>  	}
>
> @@ -1989,6 +1992,7 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
>  	struct m_can_classdev *cdev = netdev_priv(dev);
>  	unsigned int frame_len;
>  	netdev_tx_t ret;
> +	u32 new_interrupts;
>
>  	if (can_dev_dropped_skb(dev, skb))
>  		return NETDEV_TX_OK;
> @@ -2008,8 +2012,11 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
>
>  	if (cdev->is_peripheral)
>  		ret = m_can_start_peripheral_xmit(cdev, skb);
> -	else
> +	else {
> +		new_interrupts = cdev->active_interrupts | IR_TEFN;
> +		m_can_interrupt_enable(cdev, new_interrupts);
>  		ret = m_can_tx_handler(cdev, skb);
> +	}
>
>  	if (ret != NETDEV_TX_OK)
>  		netdev_completed_queue(dev, 1, frame_len);
> --
> 2.35.3
>