Re: [PATCH v4 1/3] platform/chrome: cros_ec_spi: Move to real time priority for transfers

From: Guenter Roeck
Date: Wed May 15 2019 - 13:04:18 EST


On Wed, May 15, 2019 at 9:48 AM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> In commit 37a186225a0c ("platform/chrome: cros_ec_spi: Transfer
> messages at high priority") we moved transfers to a high priority
> workqueue. This helped make them much more reliable.
>
> ...but, we still saw failures.
>
> We were actually finding ourselves competing for time with dm-crypt
> which also scheduled work on HIGHPRI workqueues. While we can
> consider reverting the change that made dm-crypt run its work at
> HIGHPRI, the argument in commit a1b89132dc4f ("dm crypt: use
> WQ_HIGHPRI for the IO and crypt workqueues") is somewhat compelling.
> It does make sense for IO to be scheduled at a priority that's higher
> than the default user priority. It also turns out that dm-crypt isn't
> alone in using high priority like this. loop_prepare_queue() does
> something similar for loopback devices.
>
> Looking in more detail, it can be seen that the high priority
> workqueue isn't actually that high of a priority. It runs at MIN_NICE
> which is _fairly_ high priority but still below all real time
> priority.
>
> Should we move cros_ec_spi to real time priority to fix our problems,
> or is this just escalating a priority war? I'll argue here that
> cros_ec_spi _does_ belong at real time priority. Specifically
> cros_ec_spi actually needs to run quickly for correctness. As I
> understand it, this is exactly what real time priority is for.
>
> There currently doesn't appear to be any way to use the standard
> workqueue APIs with a real time priority, so we'll switch over to
> using a kthread worker. We'll match the priority that the SPI
> core uses when it wants to do things on a realtime thread and just use
> "MAX_RT_PRIO - 1".
>
> This commit plus the patch ("platform/chrome: cros_ec_spi: Request the
> SPI thread be realtime") are enough to get communications very close
> to 100% reliable (the only known problem left is when serial console
> is turned on, which isn't something that happens in shipping devices).
> Specifically this test case now passes (tested on rk3288-veyron-jerry):
>
>   dd if=/dev/zero of=/var/log/foo.txt bs=4M count=512&
>   while true; do
>     ectool version > /dev/null;
>   done
>
> It should be noted that "/var/log" is encrypted (and goes through
> dm-crypt) and also passes through a loopback device.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>

Reviewed-by: Guenter Roeck <groeck@xxxxxxxxxxxx>