Re: [virtio-dev] [RFC PATCH 1/1] can: virtio: Initial virtio CAN driver.

From: Harald Mommer
Date: Tue May 23 2023 - 09:39:53 EST


Hello Vincent,

On 15.05.23 07:58, Vincent Mailhol wrote:
Hi Harald,

On Fri. 12 May 2023 at 22:19, Harald Mommer
<harald.mommer@xxxxxxxxxxxxxxx> wrote:
Hello Vincent,

I searched for the old e-mail; this was one of those which slipped through.
Too many of those.

On 05.11.22 10:21, Vincent Mailhol wrote:
On Fri. 4 Nov. 2022 at 20:13, Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
On Thu, Nov 3, 2022, at 13:26, Harald Mommer wrote:
On 25.08.22 20:21, Arnd Bergmann wrote:
...
The messages are not necessarily processed in sequence by the CAN stack.
CAN is priority based. The lower the CAN ID the higher the priority. So
a message with CAN ID 0x100 can surpass a message with ID 0x123 if the
hardware is not just simple basic CAN controller using a single TX
mailbox with a FIFO queue on top of it.
Really? I acknowledge that it is priority based *on the bus*, i.e. if
two devices A and B on the same bus try to send CAN ID 0x100 and 0x123
at the same time, then device A will win the CAN arbitration.
However, I am not aware of any devices which reorder their own stack
according to the CAN IDs. If I first send CAN ID 0x123 and then ID
0x100 on the device stack, 0x123 would still go out first, right?
The CAN hardware may be a basic CAN hardware: Single mailbox only with a
TX FIFO on top of this.

No reordering takes place, the CAN hardware will try to arbitrate the
CAN bus with a low priority CAN message (big CAN ID) while some high
priority CAN message (small CAN ID) is waiting in the FIFO. This is
called "internal priority inversion", a property of basic CAN hardware.
A basic CAN hardware does exactly what you describe.

If the FIFO is implemented in software, it's a bad idea to try to improve
this by sorting in software: the processing time needed is likely to
make things even worse. Therefore no software does this, or at least it's
not recommended to do this.

But the hardware may also be a better one. No FIFO but a lot of TX
mailboxes. A full CAN hardware tries to arbitrate the bus using the
highest priority waiting CAN message considering all hardware TX
mailboxes. Such a better (full CAN) hardware does not cause "internal
priority inversion" but tries to arbitrate the bus in the correct order
given by the message IDs.

At the level where we are with our virtio CAN device, we don't know which
CAN hardware is actually used or how it's used. We are using SocketCAN, and
no API provides information about the properties of the underlying
hardware. It may be basic CAN using a FIFO and a single TX mailbox, or
full CAN using a lot of TX mailboxes in parallel.

On the bus it's always guaranteed that the sender with the lowest CAN ID
wins regardless of which hardware is used; the only difference is whether
we have "internal priority inversion" or not.
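
A minimal sketch of the difference described above, assuming the pending TX frames are represented only by their CAN IDs in submission order. The function names are hypothetical and do not come from any driver; they only model which frame a controller would offer for arbitration next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic CAN model: a single TX mailbox fed by a FIFO. Only the frame at
 * the head of the FIFO takes part in bus arbitration, whatever its ID. */
static uint32_t basic_can_next_id(const uint32_t *pending, size_t n)
{
    assert(n > 0);
    return pending[0];
}

/* Full CAN model: every pending frame sits in its own TX mailbox and the
 * controller arbitrates with the lowest CAN ID (highest priority). */
static uint32_t full_can_next_id(const uint32_t *pending, size_t n)
{
    assert(n > 0);
    uint32_t best = pending[0];
    for (size_t i = 1; i < n; i++) {
        if (pending[i] < best)
            best = pending[i];
    }
    return best;
}
```

With 0x123 submitted before 0x100, the basic CAN model offers 0x123 first (internal priority inversion), while the full CAN model offers 0x100 first.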

If I look at the CAN stack = Software + hardware (and not only software)
it's correct: The hardware device may re-order if it's a better (full
CAN) one and thus the actual sending on the bus is not done in the same
sequence as the messages were provided internally (e.g. at some socket).
OK. Thanks for the clarification.

So, you are using scatterlist to be able to interface with the
different CAN mailboxes. But then, it means that all the heuristics to
manage those mailboxes are done in the virtio host.

There is some heuristic when VIRTIO_CAN_F_LATE_TX_ACK is supported on the device side. The feature means that the host marks a TX message as done not at the moment when it's scheduled for sending but when it has been really sent on the bus.

To do that SocketCAN needs to be configured to receive its own sent messages. On RX the device identifies the message which has been sent on the bus. The heuristic goes through the list of pending messages, checks CAN ID and payload, and marks the respective message as done.
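
The matching step could look roughly like the sketch below. This is not the code of the actual device: the structure, the list size and the function names are hypothetical, and classic CAN frames with at most 8 payload bytes are assumed. On a raw SocketCAN socket, receiving own sent frames is normally enabled with the CAN_RAW_RECV_OWN_MSGS socket option; that part is left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 8           /* hypothetical size of the pending list */

struct pending_tx {             /* hypothetical bookkeeping entry */
    bool in_use;
    uint32_t can_id;
    uint8_t len;
    uint8_t data[8];            /* classic CAN payload assumed */
};

static struct pending_tx pending[MAX_PENDING];

/* Remember a frame handed to SocketCAN. Returns the slot index, or -1
 * if the list is full (the situation in which TX gets stuck). */
static int pending_add(uint32_t can_id, const uint8_t *data, uint8_t len)
{
    if (len > 8)
        return -1;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].in_use = true;
            pending[i].can_id = can_id;
            pending[i].len = len;
            memcpy(pending[i].data, data, len);
            return i;
        }
    }
    return -1;
}

/* Called for each echoed (own sent) frame. Returns the slot whose TX
 * message can now be marked as used, or -1 if nothing matches. */
static int pending_complete(uint32_t can_id, const uint8_t *data, uint8_t len)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && pending[i].can_id == can_id &&
            pending[i].len == len &&
            memcmp(pending[i].data, data, len) == 0) {
            pending[i].in_use = false;
            return i;
        }
    }
    return -1;
}
```

Two pending frames with identical CAN ID and payload cannot be told apart by this comparison; the sketch simply completes the first matching slot.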

Problem with SocketCAN: There is a load case (full sending without any delay in both directions) where it seems that own sent messages are getting lost in the software stack. Thus we get into a state where the list of pending messages is full and TX gets stuck.

The feature flag is not offered in the open source device, it is only experimental in our proprietary device and normally disabled.

Without this feature there is no heuristic: the device just sends the message to SocketCAN and immediately marks it as used (done). But for an AUTOSAR CAN driver this means CanIf_TxConfirmation() comes too early: not when the message "has been transmitted on the CAN network", but already earlier, when the message is handed to SocketCAN and scheduled for transmission.

Did you consider exposing the number of supported mailboxes as a
configuration parameter and let the virtio guest manage these? In
Linux, it is supported since below commit:

commit 038709071328 ("can: dev: enable multi-queue for SocketCAN devices")
Link: https://git.kernel.org/torvalds/c/038709071328

Generally, from a design perspective, isn't it better to make the
virtio host as dumb as possible and let the guest do the management?

I was not aware of this patch.

But I thought about different priorities: two priorities, low priority for CAN messages which may go into some FIFO and suffer priority inversion, and high priority for CAN messages going to mailboxes. The very first draft specification had this, written without knowing about some restrictions in the Linux environment. It had the number of places for each priority (low: FIFO places, high: mailboxes) in the config space, with everything going into a single TX queue but with a priority field. I got the comment on the list to use a dedicated queue for high priority messages instead of using a priority field in the message itself.

It would be easy to do if an AUTOSAR CAN driver were used as back end; in that case you configure it yourself and know the capabilities and configuration.

But looking into, for example, m_can.c, the information is not available. I checked again just now.

=> The information about underlying hardware properties is not available outside the CAN driver

And I also looked now into the patch you sent:

dev.h:

#define alloc_candev(sizeof_priv, echo_skb_max) \
    alloc_candev_mqs(sizeof_priv, echo_skb_max, 1, 1)
#define alloc_candev_mq(sizeof_priv, echo_skb_max, count) \
    alloc_candev_mqs(sizeof_priv, echo_skb_max, count, count)

=> Every single driver uses alloc_candev(), none uses alloc_candev_mq(). So the patch, which came in back in 2018, is still an offering which is not used at all.

To have multiple priorities with queues we would need a way in user land:

- to determine the number of queues (priorities)
- to address the queues
- to determine the number of resources behind each queue for flow control purposes
- to determine the nature of the queue (basic CAN FIFO with n places or full CAN queue with m mailboxes)

There is nothing of this in place.

=> We are currently not in the position to support different priority queues in Linux.

BTW: Even then we would probably need the heuristic on the device side when VIRTIO_CAN_F_LATE_TX_ACK is negotiated. I don't think it would be a good idea to use 1 queue for basic CAN and m queues for m full CAN mailboxes; probably it would be better to have a low priority queue and a high priority queue. But as there is currently nothing in place besides the patch you mentioned, this is an issue to think about in the future.

Regards
Harald