Re: [PATCH 4/4] iio: imu: add BNO055 serdev driver

From: Andrea Merello
Date: Mon Jul 19 2021 - 11:46:39 EST


Il giorno lun 19 lug 2021 alle ore 16:15 Andy Shevchenko
<andriy.shevchenko@xxxxxxxxxxxxxxx> ha scritto:
>
> On Mon, Jul 19, 2021 at 02:59:30PM +0200, Andrea Merello wrote:
> > Il giorno lun 19 lug 2021 alle ore 13:56 Andy Shevchenko
> > <andriy.shevchenko@xxxxxxxxxxxxxxx> ha scritto:
> > >
> > > On Mon, Jul 19, 2021 at 10:49:54AM +0200, Andrea Merello wrote:
> > > > Il giorno sab 17 lug 2021 alle ore 17:48 Jonathan Cameron
> > > > <jic23@xxxxxxxxxx> ha scritto:
> > > > > On Thu, 15 Jul 2021 16:17:42 +0200
> > > > > Andrea Merello <andrea.merello@xxxxxxxxx> wrote:
>
> ...
>
> > > > > > +/*
> > > > > > + * Register writes cmd have the following format
> > > > > > + * +------+------+-----+-----+----- ... ----+
> > > > > > + * | 0xAA | 0xOO | REG | LEN | payload[LEN] |
> > > > > > + * +------+------+-----+-----+----- ... ----+
> > > > > > + *
> > > > > > + * Register write responses have the following format
> > > > > > + * +------+----------+
> > > > > > + * | 0xEE | ERROCODE |
> > > > > > + * +------+----------+
> > > > > > + *
> > > > > > + * Register read have the following format
> > > > > > + * +------+------+-----+-----+
> > > > > > + * | 0xAA | 0xO1 | REG | LEN |
> > > > > > + * +------+------+-----+-----+
> > > > > > + *
> > > > > > + * Successful register read response have the following format
> > > > > > + * +------+-----+----- ... ----+
> > > > > > + * | 0xBB | LEN | payload[LEN] |
> > > > > > + * +------+-----+----- ... ----+
> > > > > > + *
> > > > > > + * Failed register read response have the following format
> > > > > > + * +------+--------+
> > > > > > + * | 0xEE | ERRCODE| (ERRCODE always > 1)
> > > > > > + * +------+--------+
> > > > > > + *
> > > > > > + * Error codes are
> > > > > > + * 01: OK
> > > > > > + * 02: read/write FAIL
> > > > > > + * 04: invalid address
> > > > > > + * 05: write on RO
> > > > > > + * 06: wrong start byte
> > > > > > + * 07: bus overrun
> > > > > > + * 08: len too high
> > > > > > + * 09: len too low
> > > > > > + * 10: bus RX byte timeout (timeout is 30mS)
> > > > > > + *
> > > > > > + *
> > > > > > + * **WORKAROUND ALERT**
> > > > > > + *
> > > > > > + * Serial communication seems very fragile: the BNO055 buffer seems to overflow
> > > > > > + * very easy; BNO055 seems able to sink few bytes, then it needs a brief pause.
> > > > > > + * On the other hand, it is also picky on timeout: if there is a pause > 30mS in
> > > > > > + * between two bytes then the transaction fails (IMU internal RX FSM resets).
> > > > > > + *
> > > > > > + * BMU055 has been seen also failing to process commands in case we send them
> > > > > > + * too close each other (or if it is somehow busy?)
> > > > > > + *
> > > > > > + * One idea would be to split data in chunks, and then wait 1-2mS between
> > > > > > + * chunks (we hope not to exceed 30mS delay for any reason - which should
> > > > > > + * be pretty a lot of time for us), and eventually retry in case the BNO055
> > > > > > + * gets upset for any reason. This seems to work in avoiding the overflow
> > > > > > + * errors, but indeed it seems slower than just perform a retry when an overflow
> > > > > > + * error occur.
> > > > > > + * In particular I saw these scenarios:
> > > > > > + * 1) If we send 2 bytes per time, then the IMU never(?) overflows.
> > > > > > + * 2) If we send 4 bytes per time (i.e. the full header), then the IMU could
> > > > > > + * overflow, but it seem to sink all 4 bytes, then it returns error.
> > > > > > + * 3) If we send more than 4 bytes, the IMU could overflow, and I saw it sending
> > > > > > + * error after 4 bytes are sent; we have troubles in synchronizing again,
> > > > > > + * because we are still sending data, and the IMU interprets it as the 1st
> > > > > > + * byte of a new command.
> > > > > > + *
> > > > > > + * So, we workaround all this in the following way:
> > > > > > + * In case of read we don't split the header but we rely on retries; This seems
> > > > > > + * convenient for data read (where we TX only the hdr).
> > > > > > + * For TX we split the transmission in 2-bytes chunks so that, we should not
> > > > > > + * only avoid case 2 (which is still manageable), but we also hopefully avoid
> > > > > > + * case 3, that would be by far worse.
> > > > >
> > > > > Nice docs and this sounds terrible!
> > > >
> > > > Indeed.. If anyone has nicer ideas, or is aware about better
> > > > workaround, I would really love to know...
> > >
> > > This needs somebody to go thru data sheet and check for possibilities, what you
> > > described above is not gonna fly. Okay, "in a robust way".
> > >
> > > I can't believe there is nothing in the communication protocol that may
> > > increase a robustness.
> >
> > The serial protocol is both described in the datasheet and in an
> > application note "BNO055UART interface". Both of them mention the fact
> > that the IMU could just fail in processing the commands and responds
> > with a status message with the "overflow" error code. The application
> > note says this can happen because of an internal IMU buffer clearing
> > stuff not happening in time, and that you have to retry the command in
> > such case (which works for read commands, because after the header the
> > IMU will always respond with something).
> >
> > They say nothing about the fact that the IMU could decide to respond
> > with an "overflow" status message when a write command is still being
> > TXed, even if it is not finished yet, but this actually happens (seen
> > at least after the 4-bytes header).
>
> (1)
>
> >
> > I think there is not much other information about this in datasheet
> > and application note. Besides, the message formats are also described
> > the comments in bno055_sl.c
> >
> > Given that the IMU behaves like this, I could only see three possible
> > workarounds for managing write commands:
> > 1 - be quick enough on RX side in catching the IMU overflow status
> > response before sending the next char, which seems unfeasible.
> > 2 - be slow enough to let the IMU do its own stuff, which seems doable.
> > 3 - let the mess happen and try to recover: when we get the IMU
> > overflow error then we might still being TXing; in this case we stop
> > and we wait for the IMU to complain for a malformed command, but I'm
> > unsure how the IMU could handle this: it will refuse the wrong
> > starting byte (unless our payload is 0xAA by chance), but then how
> > many bytes does it throw away? How may (malformed) commands would it
> > try to extract from a the garbage we TXed (how may complaints response
> > would we receive and need to wait) ? And what if there is something in
> > payload that resembles a valid command and got accepted? This seems
> > worse than workaround #2
>
> I believe the #3 is the right thing to do actually.
>
> The payload is up to 128 bytes and speed is fixed. I believe that firmware has
> internally the state of the input processing. OTOH the UART is duplex and what
> you need in the driver is to have a callback that will read whatever the answer
> from the sensor will be at the time it appears.

I can add a reader thread that always process IMU data, but I doubt we
can make assumptions on the fact it will be triggered in time to avoid
to TX extra characters after IMU signals an overflow; this seems
really a RT stuff, and it would even depends by the USART HW (think
about USB-UART adaptors).

So, how could we implement #3 dealing correctly with the issue about
the IMU possiblly misinterpreting the bytes we might have already sent
out? What's if they are a valid command by chance? The problem here is
that this protocol has no any CRC or something like that to make sure
garbage is really thrown away.

> Also note that Linux is not an RTOS and you may end up, maybe rarely, in the
> case which resembles the #3 while using workaround #2.

Yes, in case we sleep more than 30mS we fail. I would say it's very
unlikely, but that's true.

Maybe if I add a reader thread to the workaround #2, considering we
sleep every two bytes, then we can know about IMU errors - maybe still
not before TXing any further data - but before we TX the bare minimum
data amount in order to produce an unintentionally valid write command
(should be 5 bytes).

Anyway it seems to me that there is no perfectly robust way to deal
with this IMU.. But, in real word scenarios, workaround #2 seem to
work well enough.

> On top of that you demolish the idea of using DMA with UART.

True, but we really have no rush for write commands, so that it
shouldn't be an issue indeed - and apart for the few bytes of the
calibration data, that we might discuss about, all other writes are so
short that DMA has probably no real advantage (OTOH read command
doesn't have any extra sleep, so we are fine with reading the IMU
measures).

> (Btw, AN012 [1] says 100ms is the write timeout for the next byte,
> and not 30ms.)

Yes: there is an inconsistency here: the datasheet says 30mS, the AN
says 100mS. We should then consider not to exceed 30mS.

> As AN012 rightfully pointed out the UART is _async_ interface, so I believe (1)
> is covered by this, meaning that error respond may appear _at any time_ on the
> (host-side) Rx line.

Ah, that is an interesting interpretation :)

> My personal take away is never ever use UART for this kind (*) of
> communications and this sensor specifically.

Consider that this kind of devices are position dependant, e.g. you
want to fix them on your arms, or on your robot feet, etc. I2C is
electrically unsuitable for long off-board wires, while serial lines
should be by far better in this case.

> *) time-based IPCs are doomed by definition in non-RTOS environments with UART
> hardware interface.

We have to live with this :/ But indeed having sensors that measure
real-world physical phenomena probably are a bit RT-ish anyway..

> [1]: BST-BNO055-AN012-00 | Revision 1.0 | June 2015
>
> > What I meant: given this IMU behaviour, if someone has a better idea
> > about how to deal with it, I would listen :)
> >
> > > > > > + */
>
> ...
>
> > > > > > +/* Read operation overhead:
> > > > > > + * 4 bytes req + 2byte resp hdr
> > > > > > + * 6 bytes = 60 bit (considering 1start + 1stop bits).
> > > > > > + * 60/115200 = ~520uS
> > > > > > + * In 520uS we could read back about 34 bytes that means 3 samples, this means
> > > > > > + * that in case of scattered read in which the gap is 3 samples or less it is
> > > > > > + * still convenient to go for a burst.
> > > > > > + * We have to take into account also IMU response time - IMU seems to be often
> > > > > > + * reasonably quick to respond, but sometimes it seems to be in some "critical
> > > > > > + * section" in which it delays handling of serial protocol.
> > > > > > + * By experiment, it seems convenient to burst up to about 5/6-samples-long gap
> > >
> > > Missed perial and entire comment needs proper style and space occupation ratio.
> >
> > Perial? But OK: text reflow and I see the 1st line for multilne
>
> Period.

Ah, OK :)

> > commend is not correct.
> >
> > > > > > + */
>
> --
> With Best Regards,
> Andy Shevchenko
>
>