Re: [RFC PATCH v3] can: virtio: Initial virtio CAN driver.

From: Harald Mommer
Date: Wed May 24 2023 - 09:30:55 EST

Next message: Jisheng Zhang: "Re: [PATCH] riscv: mm: try VMA lock-based page fault handling first"
Previous message: Guenter Roeck: "Re: [PATCH 6.3 000/363] 6.3.4-rc2 review"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hello Vincent,

On 15.05.23 08:31, Vincent MAILHOL wrote:

if (err) {

Opened Eclipse, searched for "if (err != 0)" in the kernel code. 290
matches. For "if (ret != 0)" I found now 1970 matches.

Read my comment again. I never mentioned err vs. ret.

I am asking to replace "if (err != 0)" by "if (err)". We are not
using MISRA and there is no concept of essential boolean type here.
You can pass an integer to an if ().

I do not use eclipse, but git can give a few relevant statistics:

$ git grep "if (err != 0)" | wc -l
277
$ git grep "if (err)" | wc -l
34307

And while this is not the topic, "ret" is more popular than "err":

$ git grep "if (ret != 0)" | wc -l
1956
$ git grep "if (ret)" | wc -l
67927

but both are well established usage so I do not really care which one
of "ret" or "err" you use.

Looks like major practice is indeed to omit the != 0 for error indicating return codes so I will adapt to the majority but adding an unlikely. The block is not expected to be entered at all as the comment says however it needs to be there due to defensive programming practice.

=> if (unlikely(ret))

+ /* Not expected to happen */
+ dev_err(dev, "%s(): virtqueue_add_sgs() failed\n", __func__);
+ }
+
+ if (!virtqueue_kick(vq)) {
+ /* Not expected to happen */
+ dev_err(dev, "%s(): Kick failed\n", __func__);
+ }
+
+ while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq))
+ wait_for_completion(&priv->ctrl_done);
+
+ mutex_unlock(&priv->ctrl_lock);
+
+ return priv->cpkt_in.result;
+}

...

+static netdev_tx_t virtio_can_start_xmit(struct sk_buff *skb,
+ struct net_device *dev)
+{
+ struct virtio_can_priv *priv = netdev_priv(dev);
+ struct canfd_frame *cf = (struct canfd_frame *)skb->data;
+ struct virtio_can_tx *can_tx_msg;
+ struct virtqueue *vq = priv->vqs[VIRTIO_CAN_QUEUE_TX];
+ struct scatterlist sg_out[1];
+ struct scatterlist sg_in[1];
+ struct scatterlist *sgs[2];

Instead declaring two times an array of 1, can we do:

struct scatterlist sgs[2];

Ooopsy on my side. sgs is an array of pointers so the above is not equivalent.

And doing this:

struct scatterlist *sgs[2];

Would be problematic as the memory of the two elements would not be allocated.

and then use sgs[0] for out and sgs[1] for in?

Or, if you really want to keep sg_out and sg_in, at least do:

struct scatterlist sg_out, sg_in;
struct scatterlist *sgs[] = {&sg_out, &sg_in};

N.B. The same comment also applies to the other places where you are
doing some sg[1] declarations.

Makes thing worse. I'm not even sure whether this is a null change only
or introduces a problem.

virtio strictly separates devices readable and device writeable data.
Therefore I want really to have here 2 separate definitions. The one is
data to the device, the other is data from the device.

ACK. My second example does that:

struct scatterlist sg_out, sg_in;
struct scatterlist *sgs[] = {&sg_out, &sg_in};

First I remove the [1] from sg_out and sg_in to make those scalars instead of arrays of [1].

Your 2nd example above violates the reverse xmas tree rule so I cannot take this one as is.

But got the thing through the compiler with the following construct:

struct scatterlist sg_out, sg_in, *sgs[2] = { &sg_out, &sg_in };

If this had any advantage, I could separate the data further. For
example I could separate the payload from the preceding data. In this
case I had struct scatterlist sg_out[2]. As long as the payload is
small the memcpy for the payload can be justified and [1] is good. In
fact, those are still arrays even if by coincident now the number of
elements is 1.

sg_out and sg_in are only passed to one function: sg_init_one(). And
as the name suggests, sg_init_one expects a single scatterlist, not an
array.

A look at:

$ git grep sg_init_one

show me that doing as "sg_init_one(&foo[0], ...)" is not a popular
solution. The majority does sg_init_one(&foo, ...).

ACK. Checked this also now, saw &foo[0] once or so.

I do get that sgs is an array of arrays. I am just not comfortable
with sg_out and sg_in being declared as arrays because these never get
used as such.

+ unsigned long flags;
+ u32 can_flags;
+ int err;
+ int putidx;
+ netdev_tx_t xmit_ret = NETDEV_TX_OK;
+ const unsigned int hdr_size = offsetof(struct virtio_can_tx_out, sdu);
+
+ if (can_dev_dropped_skb(dev, skb))
+ goto kick; /* No way to return NET_XMIT_DROP here */
+
+ /* No local check for CAN_RTR_FLAG or FD frame against negotiated
+ * features. The device will reject those anyway if not supported.
+ */
+
+ can_tx_msg = kzalloc(sizeof(*can_tx_msg), GFP_ATOMIC);
+ if (!can_tx_msg)
+ goto kick; /* No way to return NET_XMIT_DROP here */

I missed here to update the number of dropped messages.

+
+ can_tx_msg->tx_out.msg_type = cpu_to_le16(VIRTIO_CAN_TX);
+ can_flags = 0;
+
+ if (cf->can_id & CAN_EFF_FLAG) {
+ can_flags |= VIRTIO_CAN_FLAGS_EXTENDED;
+ can_tx_msg->tx_out.can_id = cpu_to_le32(cf->can_id & CAN_EFF_MASK);
+ } else {
+ can_tx_msg->tx_out.can_id = cpu_to_le32(cf->can_id & CAN_SFF_MASK);
+ }
+ if (cf->can_id & CAN_RTR_FLAG)
+ can_flags |= VIRTIO_CAN_FLAGS_RTR;
+ else
+ memcpy(can_tx_msg->tx_out.sdu, cf->data, cf->len);
+ if (can_is_canfd_skb(skb))
+ can_flags |= VIRTIO_CAN_FLAGS_FD;
+
+ can_tx_msg->tx_out.flags = cpu_to_le32(can_flags);
+ can_tx_msg->tx_out.length = cpu_to_le16(cf->len);
+
+ /* Prepare sending of virtio message */
+ sg_init_one(&sg_out[0], &can_tx_msg->tx_out, hdr_size + cf->len);
+ sg_init_one(&sg_in[0], &can_tx_msg->tx_in, sizeof(can_tx_msg->tx_in));
+ sgs[0] = sg_out;
+ sgs[1] = sg_in;
+
+ putidx = virtio_can_alloc_tx_idx(priv);
+
+ if (unlikely(putidx < 0)) {
+ netif_stop_queue(dev);
+ kfree(can_tx_msg);
+ netdev_warn(dev, "TX: Stop queue, no putidx available\n");

ida_alloc_range() can also return -ENOMEM. So the error is not
necessarily because of no putidx available. Maybe better to print the
error message (with %pe to show the mnemotechnic).

%pe does not do that. It works for an error coded in a pointer. I have
here an int.

Yes, and you can use the ERR_PTR() to turn your int into an error pointer.

Do:

$ git grep -A1 "%pe"

if you need examples.

Thanks for this trick with ERR_PTR(). Does not show me a clear text error but dumps the pointer in hex so that an additional hex to decimal conversion becomes necessary to lookup for the error code. May be due to my small embedded setup.

But I realized thinking about all those comments that there is major problem in this block:

- The queue is stopped in this block
- The only place where the queue is re-enabled is when a pending TX message is processed
- If there is a -ENOMEM and no TX message is pending the driver blocks until doomsday

Received also a review comment being to noisy here

- ENOMEM is a rare but valid condition so not tracing this anymore
- ENOSPC should be impossible if flow control works so WARN_ON(putidx == -ENOSPC)

=> Remove queue stop, trace only impossible condition not to be noisy, drop message always and don't forget to increase drop count.

With current code -ENOSPC was seen. There was a bug in virtio_can_read_tx_queue() where the netif_wake_queue() is not done when the tx_lock is still held but immediately afterwards and that was a problem.

Why do we need both VIRTIO_CAN_F_RTR_FRAMES VIRTIO_CAN_FLAGS_RTR?

Is it to manage devices not able to sent remote frames? If so, we may
also need to add a CAN_CTRLMODE_RTR in linux/can/netlink.h?

VIRTIO_CAN_F_RTR_FRAMES is a feature flag. RTR frames may or may not be
supported. AUTOSAR CAN drivers do not know anything about RTR frames.

Now that you say it, it rings a bell.

So indeed, we will probably need a new flag in can/netlink.h to report
to the userland whether a device is capable or not to manage remote
frames.

Currently our virtio devices run in the Linux environment but this may not to be the case for all our virtio devices in the future. It is likely that at some point in time some devices will be migrated to a smaller RTOS environment and then it is well imaginable that someone wants to use an existing AUTOSAR CAN driver as backend for a virtio CAN device. With such a setup no VIRTIO_CAN_F_RTR_FRAMES.

Yes, probably, if we make such a setup possible and want to be clean.

if someone wants to build a virtio CAN device on top of an AUTOSAR CAN
driver with the additional requirement not to change the existing
AUTOSAR CAN driver RTR frames cannot be supported and the feature flag
won't be offered by the virtio device.

VIRTIO_CAN_FLAGS_RTR is a bit to indicate a frame type in a CAN message.
Used internally in an Linux SocketCAN interface.

ACK.

On a side note, did you have a look at:

https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2felixir.bootlin.com%2flinux%2flatest%2fsource%2finclude%2fuapi%2flinux%2fcan%2fnetlink.h%23L95&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-75dedb6bf65ef4342bf7c18b89f4f30ed7dc9dc6

?

It lists the different hardware capabilities which may or may not be
present at the hardware level. This list can be used as an input to
decide how to extend the feature bit list.

Looked at this and got some more headache especially with FD. There is CAN_CTRLMODE_FD and CAN_CTRLMODE_FD_NON_ISO. Elsewhere is CANFD_BRS. Virtio CAN with (too much?) AUTOSAR in mind has only an idea of FD and does not distinguish any of this. In AUTOSAR Can_Write() and friends there is only bit 30 which says "this is a CAN FD frame".

Virtio CAN was intended to get CAN for automotive usage into the virtual machine. However it should be open to be extended later if there were requirements.

The hard requirements I got for now are

1. Bring CAN for automotive usage into the virtual machine
2. Do not overload the very first version with features

2.) ensures 1.) will happen.

+
+/* CAN Result Types */
+#define VIRTIO_CAN_RESULT_OK 0
+#define VIRTIO_CAN_RESULT_NOT_OK 1

Silly question, but what is the rationale of not using the error code
from errno.h?

Looked into the AUTOSAR_SWS_StandardTypes.pdf Std_ReturnType:

E_OK = 0

E_NOT_OK = 1

other are "Available for user specific errors" like CAN_BUSY.

Linux is only one operating system. An important one but there are
others. AUTOSAR people may ask you "What is errno.h?" (and also "What is
malloc?").

Sorry, but I do not buy this argument. Do you really now AUTOSAR
developpers who do not know about malloc()?

They know. But neither it's provided by their classic AUTOSAR environment nor are they allowed to use it if MISRA is enforced which is likely the case.

MISRA C 2012 Dir 4.12: "Dynamic memory allocation shall not be used".

MISRA C 2012 Rule 21.3 "The memory allocation and deallocation functions of <stdlib.h> shall not be used".

"What's malloc()" at least in the last company meant "I'm doing MISRA so I'm not supposed to use malloc()".

Our internal client is interested in a Virtio AUTOSAR CAN
driver. So there were reasons to look first into AUTOSAR.

Is it AUTOSAR Classic or AUTOSAR Adaptive?

Looking currently into the specification(s) for the "Classic Platform". There exists no "errno.h".

AUTOSAR Adaptive is POSIX (to some extends):

[SWS_OSI_01001] POSIX PSE51 Interface: [The OSI shall provide OS
functionality with POSIX PSE51 interface, according to the
1003.13-2003 specification.]

Ref: AUTOSAR AP R22-11 - Specification of Operating System Interface
https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fwww.autosar.org%2ffileadmin%2fstandards%2fR22%2d11%2fAP%2fAUTOSAR%5fSWS%5fOperatingSystemInterface.pdf&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-1b9dde25c1a8497018be33058ca535bcfe2c272a

There is also a CAN_BUSY for the AUTOSAR Can_Write() to be returned but
this is not needed at this interface as a virtio AUTOSAR CAN driver was
busy when there are no sufficient messages available in the virtqueue,
so for this condition we need no defined error code to be used in a
virtio message.

Virtio block defines VIRTIO_BLK_S_OK 0, VIRTIO_BLK_S_IOERR 1,
VIRTIO_BLK_S_UNSUPP 2.

I do see that some other virtio devices do the same:

https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2felixir.bootlin.com%2flinux%2fv4.6%2fsource%2finclude%2fuapi%2flinux%2fvirtio%5fnet.h%23L140&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-698619745d87a25323604ab5a614cc946b24e642

But I am genuinely curious to understand why virtio do not use more
standard error codes.

I cannot speak for virtio. errno.h is ANSI C and POSIX but virtio does
not only address those environments. It is a more general specification.

That's my point. ISO C is so predominant that those error codes are
available nearly everywhere. And this being just some #define, it can
easily be integrated to the few system which do not have this header.

If there is a requirement to make virtio header self contained, then I
would understand why POSIX error code can not be used. But as I said,
I am genously curious to understand the reason behind this choice.

For the virtio RPMB device the result codes in the virtio specification
come for example directly from an JEDEC specification for eMMC. Which
has some connection to a JEDEC UFS specification, same result codes
there. Makes a lot of sense to use those result codes in this context.

As virtio is more general, I have for this also my doubts whether it
really was a good idea to take over the CAN RX and CAN TX message
definitions 1:1 from Linux (if this is possible). Someone proposed but
I've my doubts.

Yes, I did ask here:

https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2flore.kernel.org%2flinux%2dcan%2fCAMZ6RqLALOYFWQJ4C4HTaRw7y%2dwaUbqOX0WzrWVNiQG51QexHw%40mail.gmail.com%2f&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-6453b89fa56fa0a97905d808b20a7df3eb16f85a

I am still waiting for your answer.

I do understand that Linux is not the only OS. However, it is the only
one with a complehensive set of open source CAN dirvers. Reusing the
Linux structures would allow to reuse bigger chunks of code,
decreasing the amount of effort needed implement drivers in the virtio
host.

That said, the POSIX error code and reusing the Linux CAN structures
are two different topics.

+/* CAN flags to determine type of CAN Id */
+#define VIRTIO_CAN_FLAGS_EXTENDED 0x8000
+#define VIRTIO_CAN_FLAGS_FD 0x4000
+#define VIRTIO_CAN_FLAGS_RTR 0x2000
+
+struct virtio_can_config {
+#define VIRTIO_CAN_S_CTRL_BUSOFF (1u << 0) /* Controller BusOff */
+ /* CAN controller status */
+ __le16 status;
+};
+
+/* TX queue message types */
+struct virtio_can_tx_out {
+#define VIRTIO_CAN_TX 0x0001
+ __le16 msg_type;
+ __le16 length; /* 0..8 CC, 0..64 CAN-FD, 0..2048 CAN-XL, 12 bits */
+ __le32 reserved; /* May be needed in part for CAN XL priority */
+ __le32 flags;
+ __le32 can_id;
+ __u8 sdu[64];
+};
+
+struct virtio_can_tx_in {
+ __u8 result;
+};
+
+/* RX queue message types */
+struct virtio_can_rx {
+#define VIRTIO_CAN_RX 0x0101
+ __le16 msg_type;
+ __le16 length; /* 0..8 CC, 0..64 CAN-FD, 0..2048 CAN-XL, 12 bits */
+ __le32 reserved; /* May be needed in part for CAN XL priority */

Can we use this field to report the classical CAN DLC greater than 8?
If also needed by CAN XL, the field can be turned into a union.

Classical CAN cannot have a DLC > 8. CAN FD has a length up to 64 bytes.

No, The DLC is coded on four bits and ranges from 0 to 15 for both
Classical CAN and CAN-FD.

Please refer to:

commit ea7800565a12 ("can: add optional DLC element to Classical CAN
frame structure")
Link:https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fgit.kernel.org%2ftorvalds%2fc%2fea7800565a12&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-83a945dba88675fd8f98628601040c6860123b6d

For more details, please refor to section 8.4.2.4 "DLC field" of ISO
11898-1:2015.

I do believe that AUTOSAR do not allow Classical CAN frames with a DLC
greater than 8, but the virtio implementation should support the ISO
definitions.

I see now. The "length" in the more familiar (to me) AUTOSAR CAN_PduType is a "length" and not a "dlc". This detail with the DLC is ISO only and so I missed this. Not only therefore.

The CAN XL priority is 11 bits only and so fits in a __le16. Nobody else used an union so maybe:

__le32 reserved; =>

    __u8 reserved_classic_dlc; /* if CAN classic length = 8 then DLC can be 8..15 */
   __u8 padding;
   __le16 reserved_xl_priority; /* May be needed for CAN XL priority */

The member length is already __le16 to be able to put the 12 bits of 2048 for CAN XL.

What made it more easy to miss is that also the cantools from Ubuntu 20.04 do not support DLC > 8. The change in the cantools came in 20-Nov-2020 by some commits from Oliver Hartkopp. Using newer cantools I can send and receive CAN frames with a DLC > 8 over vcan on my machine without having to hack the cantools. So the prerequisites to develop this feature seem to be available without having to hack cantools or vcan.

Now also not only a new CAN frame structure was introduced with 5.11.0 but also CAN_CTRLMODE_CC_LEN8_DLC. CAN_CTRLMODE_CC_LEN8_DLC seems to be a feature if I understand this correctly. If we want to allow to use an AUTOSAR CAN driver as virtio device back end then also a new feature flag (e.g. VIRTIO_CAN_F_CLASSIC_DLC) would be needed.

If possible I would like to have this as a future feature only changing now the "reserved" field to make clear the planned purposes.

CAN XL into it.

But for CAN XL we need anyway a more critical look from CAN XL experts
on the list. Here in the house there is already only fewer experience
with CAN FD in comparison with classic CAN but none at all with CAN XL.
Too new. If something is done in a stupid way we can define in the
future completely new messages as we have the msg_type. But if no
mistake is made now we can avoid this and enhancing things will be more
simple later. The RX and TX messages are really critical. Some bugs in
the software can be fixed easily. But if we define here something not
future proof this can only be addressed later in the spec with some more
effort.

ACK.

+ __le32 flags;
+ __le32 can_id;
+ __u8 sdu[64];
+};
+
+/* Control queue message types */
+struct virtio_can_control_out {
+#define VIRTIO_CAN_SET_CTRL_MODE_START 0x0201
+#define VIRTIO_CAN_SET_CTRL_MODE_STOP 0x0202
+ __le16 msg_type;
+};
+
+struct virtio_can_control_in {
+ __u8 result;
+};
+
+#endif /* #ifndef _LINUX_VIRTIO_VIRTIO_CAN_H */

Yours sincerely,
Vincent Mailhol

Regards

Harald Mommer

Next message: Jisheng Zhang: "Re: [PATCH] riscv: mm: try VMA lock-based page fault handling first"
Previous message: Guenter Roeck: "Re: [PATCH 6.3 000/363] 6.3.4-rc2 review"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]