Re: [PATCH v7 1/4] can: m_can: Create a m_can platform framework

From: Dan Murphy
Date: Fri Mar 08 2019 - 15:36:27 EST


On 3/8/19 12:06 PM, Wolfgang Grandegger wrote:
>
>
> Am 08.03.19 um 18:52 schrieb Dan Murphy:
>> On 3/8/19 11:40 AM, Wolfgang Grandegger wrote:
>>> Hello Dan,
>>>
>>> Am 08.03.19 um 18:25 schrieb Dan Murphy:
>>>> On 3/8/19 11:08 AM, Wolfgang Grandegger wrote:
>>>>> Hello,
>>>>>
>>>>> Am 08.03.19 um 16:48 schrieb Dan Murphy:
>>>>>> Wolfgang
>>>>>>
>>>>>> On 3/8/19 8:41 AM, Wolfgang Grandegger wrote:
>>>>>>> Hello Dan,
>>>>>>>
>>>>>>> thinking more about it...
>>>>>>>
>>>>>>> Am 08.03.19 um 14:29 schrieb Wolfgang Grandegger:
>>>>>>>> Hello Dan,
>>>>>>>>
>>>>>>>> Am 08.03.19 um 13:44 schrieb Dan Murphy:
>>>>>>>>> Wolfgang
>>>>>>>>>
>>>>>>>>> On 3/8/19 4:10 AM, Wolfgang Grandegger wrote:
>>>>>>>>>> Hallo Dan,
>>>>>>>>>>
>>>>>>>>>> Am 05.03.19 um 16:52 schrieb Dan Murphy:
>>>>>>>>>>> Create a m_can platform framework that peripherial
>>>>>>>>>>> devices can register to and use common code and register sets.
>>>>>>>>>>> The peripherial devices may provide read/write and configuration
>>>>>>>>>>> support of the IP.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Dan Murphy <dmurphy@xxxxxx>
>>>>>>>>>>> ---
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> v7 - Fixed remaining new checkpatch issues, removed CSR setting, fixed tx hard
>>>>>>>>>>> start function to return tx_busy, and renamed device callbacks - https://lore.kernel.org/patchwork/patch/1047220/
>>>>>>>>>>>
>>>>>>>>>>> v6 - Squashed platform patch to this patch for bissectablity, fixed coding style
>>>>>>>>>>> issues, updated Kconfig help, placed mcan reg offsets back into c file, renamed
>>>>>>>>>>> priv->skb to priv->tx_skb and cleared perp interrupts at ISR start -
>>>>>>>>>>> Patch 1 comments - https://lore.kernel.org/patchwork/patch/1042446/
>>>>>>>>>>> Patch 2 comments - https://lore.kernel.org/patchwork/patch/1042442/
>>>>>>>>>>>
>>>>>>>>>>> drivers/net/can/m_can/Kconfig | 13 +-
>>>>>>>>>>> drivers/net/can/m_can/Makefile | 1 +
>>>>>>>>>>> drivers/net/can/m_can/m_can.c | 700 +++++++++++++------------
>>>>>>>>>>> drivers/net/can/m_can/m_can.h | 110 ++++
>>>>>>>>>>> drivers/net/can/m_can/m_can_platform.c | 202 +++++++
>>>>>>>>>>> 5 files changed, 682 insertions(+), 344 deletions(-)
>>>>>>>>>>> create mode 100644 drivers/net/can/m_can/m_can.h
>>>>>>>>>>> create mode 100644 drivers/net/can/m_can/m_can_platform.c
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/net/can/m_can/Kconfig b/drivers/net/can/m_can/Kconfig
>>>>>>>>>>> index 04f20dd39007..f7119fd72df4 100644
>>>>>>>>>>> --- a/drivers/net/can/m_can/Kconfig
>>>>>>>>>>> +++ b/drivers/net/can/m_can/Kconfig
>>>>>>>>>>> @@ -1,5 +1,14 @@
>>>>>>>>>>> config CAN_M_CAN
>>>>>>>>>>> + tristate "Bosch M_CAN support"
>>>>>>>>>>> + ---help---
>>>>>>>>>>> + Say Y here if you want support for Bosch M_CAN controller framework.
>>>>>>>>>>> + This is common support for devices that embed the Bosch M_CAN IP.
>>>>>>>>>>> +
>>>>>>>>>>> +config CAN_M_CAN_PLATFORM
>>>>>>>>>>> + tristate "Bosch M_CAN support for io-mapped devices"
>>>>>>>>>>> depends on HAS_IOMEM
>>>>>>>>>>> - tristate "Bosch M_CAN devices"
>>>>>>>>>>> + depends on CAN_M_CAN
>>>>>>>>>>> ---help---
>>>>>>>>>>> - Say Y here if you want to support for Bosch M_CAN controller.
>>>>>>>>>>> + Say Y here if you want support for IO Mapped Bosch M_CAN controller.
>>>>>>>>>>> + This support is for devices that have the Bosch M_CAN controller
>>>>>>>>>>> + IP embedded into the device and the IP is IO Mapped to the processor.
>>>>>>>>>>> diff --git a/drivers/net/can/m_can/Makefile b/drivers/net/can/m_can/Makefile
>>>>>>>>>>> index 8bbd7f24f5be..057bbcdb3c74 100644
>>>>>>>>>>> --- a/drivers/net/can/m_can/Makefile
>>>>>>>>>>> +++ b/drivers/net/can/m_can/Makefile
>>>>>>>>>>> @@ -3,3 +3,4 @@
>>>>>>>>>>> #
>>>>>>>>>>>
>>>>>>>>>>> obj-$(CONFIG_CAN_M_CAN) += m_can.o
>>>>>>>>>>> +obj-$(CONFIG_CAN_M_CAN_PLATFORM) += m_can_platform.o
>>>>>>>>>>> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
>>>>>>>>>>> index 9b449400376b..a60278d94126 100644
>>>>>>>>>>> --- a/drivers/net/can/m_can/m_can.c
>>>>>>>>>>> +++ b/drivers/net/can/m_can/m_can.c
>>>>>>>>>>
>>>>>>>>>> ... snip...
>>>>>>>>>>
>>>>>>>>>>> +static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
>>>>>>>>>>> + struct net_device *dev)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct m_can_priv *priv = netdev_priv(dev);
>>>>>>>>>>> +
>>>>>>>>>>> + if (can_dropped_invalid_skb(dev, skb))
>>>>>>>>>>> + return NETDEV_TX_OK;
>>>>>>>>>>> +
>>>>>>>>>>> + if (priv->is_peripherial) {
>>>>>>>>>>> + if (priv->tx_skb) {
>>>>>>>>>>> + netdev_err(dev, "hard_xmit called while tx busy\n");
>>>>>>>>>>> + return NETDEV_TX_BUSY;
>>>>>>>>>>> + }
>>>>>>>>>>
>>>>>>>>>> The problem with that approach is, that the upper layer will try to
>>>>>>>>>> resubmit the current "skb" but not the previous "tx_skb". And the
>>>>>>>>>> previous "tx_skb" has not been freed yet. I would just drop and free the
>>>>>>>>>> skb and return NETDEV_TX_OK in m_can_tx_handler() for peripheral devices
>>>>>>>>>> (like can_dropped_invalid_skb() does).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK.
>>>>>>>>>
>>>>>>>>> So would this also be a bug in the hi3110 and mcp251x drivers (line 521) as well because besides checking tx_length
>>>>>>>>> this is how these drivers are written.
>>>>>>>>
>>>>>>>> This is different. When entering the "start_xmit" routine, the previous
>>>>>>>> TX is still in progress. It will (hopefully) complete soon. Therefore
>>>>>>>> returning NETDEV_TX_BUSY is OK. The "start_xmit" routine will be
>>>>>>>> recalled soon with the same "skb". That scenario should/could also not
>>>>>>>> happen.
>>>>>>>
>>>>>>> In principle, this also applies to the m_can peripheral devices. If
>>>>>>> tx_skb is not NULL, the TX is still in progress and returning
>>>>>>> NETDEV_TX_BUSY is just fine.
>>>>>>>
>>>>>>>>
>>>>>>>> In contrast, in "m_can_tx_handler()", the skb could not be handled
>>>>>>>> because the FIFO is full. The "start_xmit" routine for peripheral
>>>>>>>> devices for that skb already returned NETDEV_TX_OK. Therefore the only
>>>>>>>> meaningful action is to drop the skb. Also this error should not happen
>>>>>>>> and if, something is going really wrong. Therefore I think, a
>>>>>>>> WARN_ONCE() would be even more appropriate. But that should be a
>>>>>>>> separate patch.
>>>>>>>
>>>>>>> But that's a different issue/error. The tx_skb cannot be processed in
>>>>>>> "m_can_tx_handler()". Either we drop it or we re-queue it (retry later).
>>>>>>>
>>>>>>
>>>>>> OK I am a bit confused on this. Are you saying this is not an issue?
>>>>>> Or are you saying I need to check for tx_len like the other code?
>>>>>
>>>>> If you check for tx_skb in the "start_xmit" routine like the hi3110 and
>>>>> mcp251x, it will work the same way. But only, if the "tx_handler()" has
>>>>> fully processed the message. It simple means, the TX is still in
>>>>> progress and will complete soon. But in "m_can_tx_handler()" we return
>>>>> without handling the message! It will never be sent and freed. Or will
>>>>> the "m_can_tx_handler()" retry?
>>>>>
>>>>
>>>> I am not seeing where we are not handling the message in the m_can_tx_handler()
>>>
>>> static void m_can_tx_handler(struct m_can_classdev *priv)
>>> {
>>> ...
>>> /* Check if FIFO full */
>>> if (m_can_tx_fifo_full(priv)) {
>>> /* This shouldn't happen */
>>> netif_stop_queue(dev);
>>> netdev_warn(dev,
>>> "TX queue active although FIFO is full.");
>>> return;
>>> }
>>>
>>> We simply return here. When is the message (tx_skb) processed (sent or freed)?
>>> What happens with tx_skb?
>>>
>>
>> Are you sure you are looking at the right code?
>>
>> For patch version v7 I have the following
>>
>> /* Check if FIFO full */
>> if (m_can_tx_fifo_full(cdev)) {
>> /* This shouldn't happen */
>> netif_stop_queue(dev);
>> netdev_warn(dev,
>> "TX queue active although FIFO is full.");
>> return NETDEV_TX_BUSY;
>> }
>>
>> Which is no change from the original source code.
>
> I know, but for the peripheral devices you have:
>
> static void m_can_tx_work_queue(struct work_struct *ws)
> {
> struct m_can_priv *priv = container_of(ws, struct m_can_priv,
> tx_work);
> netdev_tx_t ret;
>
> ret = m_can_tx_handler(priv);
> if (ret == NETDEV_TX_OK)
> priv->tx_skb = NULL;
> }
>
> What will happen with tx_skb if NETDEV_TX_BUSY? It has not been
> dropped/freed yet?
>

OK I think I see the issue there.

I should probably add can_put_echo_skb if NETDEV_TX_BUSY and always NULL out the SKB.

This appears to be the way the other perp drivers do it as they just put and null the skb
regardless of the return of the handlers.

And clean is called when the BUS is off or coming out of suspend.

Dan

> Wolfgang.
>


--
------------------
Dan Murphy