On Fri, Aug 04, 2017 at 04:56:51PM -0500, Haris Okanovic wrote:
> I have a latency issue using a SPI-based TPM chip with tpm_tis driver
> from non-rt usermode application, which induces ~400 us latency spikes
> in cyclictest (Intel Atom E3940 system, PREEMPT_RT_FULL kernel).
>
> The spikes are caused by a stalling ioread8() operation, following a
> sequence of 30+ iowrite8()s to the same address. I believe this happens
> because the writes are cached (in cpu or somewhere along the bus), which
> gets flushed on the first LOAD instruction (ioread*()) that follows.

To use the ARM parlance, these accesses aren't "cached" (which would
imply that a result could be returned to the load from any intermediate
node in the interconnect), but instead are "bufferable".

It is really unfortunate that we continue to run into this class of
problem across various CPU vendors and various underlying bus
technologies; it's the continuing curse of running a PREEMPT_RT kernel
on commodity hardware. RT is not easy :)

> The enclosed change appears to fix this issue: read the TPM chip's
> access register (status code) after every iowrite*() operation.
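
For illustration, a minimal sketch of that approach might look like the
following (helper and register names here are only illustrative, not
necessarily what the actual patch uses): every write goes through a
wrapper that immediately reads back a side-effect-free register, so each
write is pushed out to the device individually instead of a long burst of
buffered writes stalling on the first later load.

static inline void tpm_tis_flush(void __iomem *iobase)
{
	/*
	 * Read a register with no read side effects to force out any
	 * buffered/posted writes before returning.
	 */
	ioread8(iobase + TPM_ACCESS(0));
}

static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
{
	iowrite8(b, iobase + addr);
	tpm_tis_flush(iobase);
}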

Are we engaged in a game of whack-a-mole with all of the drivers which
use this same access pattern (of which I imagine there are quite a
few!)?

I'm wondering if we should explore the idea of adding a load in the
iowriteN()/writeX() macros (marking those accesses in which reads cause
side effects explicitly, redirecting to a _raw() variant or something).

Obviously that would be expensive for non-RT use cases, but for helping
constrain latency, it may be worth it for RT users.
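
For concreteness, a purely hypothetical sketch of such a pair of
accessors (none of these names exist today): the default variant reads
back the location it just wrote, forcing the posted write to complete,
while drivers touching registers where a read has side effects would be
marked to use the _raw() variant and keep today's behavior.

/* Hypothetical: write that is immediately flushed by a read-back. */
#define iowrite8_flushing(val, addr)		\
	do {					\
		iowrite8((val), (addr));	\
		(void)ioread8((addr));		\
	} while (0)

/* Hypothetical: opt-out for registers where a read has side effects. */
#define iowrite8_raw(val, addr)		iowrite8((val), (addr))
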
Julia