Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain

From: Will Deacon
Date: Tue Jul 09 2019 - 04:08:00 EST


On Wed, Jul 03, 2019 at 02:45:43AM +0000, qi.fuli@xxxxxxxxxxx wrote:
> We used FWQ [1] to do an experiment on 1 node of our HPC environment,
> we expected it would be tens of microseconds of maximum OS jitter, but
> it was hundreds of microseconds, which didn't meet our requirement. We
> tried to find out the cause by using ftrace, but we could not find any
> processes which would cause noise and only saw the extension of
> processing time. Then we checked the CPU instruction count through the
> CPU PMU, and also didn't find any changes. However, we found that as
> the number of TLB flush calls increased, the noise also increased. From
> this we understood that the cause of this issue is the implementation
> of Linux's TLB flush for arm64, especially the use of the TLBI-is
> instruction, which is broadcast to all processor cores in the system.
> Therefore, we made this patch set to fix this issue. After testing
> several times, the noise was reduced and our original goal was
> achieved, so we do think this patch makes sense.
>
> As I mentioned, OS jitter is a vital issue for large-scale HPC
> environments. We tried a lot of things to reduce the OS jitter. One of
> them is task separation between the CPUs which are used for computing
> and the CPUs which are used for maintenance. All of the daemon
> processes and I/O interrupts are bound to the maintenance CPUs.
> Furthermore, we used nohz_full to avoid the noise caused by
> interruption of the computing CPUs, but since all of the CPUs were
> affected by the TLBI-is instruction, the task separation of CPUs didn't
> work. Therefore, we would like to have TLB flushes done on a minimal
> set of CPUs to reduce the OS jitter by using this patch set.

So have you confirmed that this is due to TLBI traffic and not, for example,
stores sitting in remote store buffers that get flushed by the IPI or
something else like that? It feels like you're inferring things about the
underlying behaviour, whereas you should be in a position to simulate this
if nothing else.

If it *is* because of TLBI, then where are they coming from? Is FWQ calling
munmap/mprotect all the time? Why?
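
For illustration only, a minimal userspace sketch of the kind of
experiment that could separate the two effects: one routine that churns
an anonymous mapping with mmap/munmap, and one that times a fixed work
quantum and reports the spread. This is not FWQ's code. The names
churn_mappings() and fwq_spread_ns() are hypothetical, and whether each
munmap actually results in a broadcast TLBI depends on the kernel and
the CPU, so treat that as an assumption to be verified.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS on stricter -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: map, touch and unmap one anonymous page `iters`
 * times. Touching the page populates a translation, so the unmap has
 * something to invalidate. Returns the number of completed cycles.
 */
long churn_mappings(long iters)
{
    long page = sysconf(_SC_PAGESIZE);
    long done = 0;

    if (page <= 0)
        return 0;

    for (long i = 0; i < iters; i++) {
        void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;
        *(volatile char *)p = 1;        /* fault the page in */
        if (munmap(p, (size_t)page) != 0)
            break;
        done++;
    }
    return done;
}

/*
 * Hypothetical helper: run `samples` fixed work quanta of `work` loop
 * iterations each and return (max - min) duration in nanoseconds.
 * Returns 0 if no samples were requested.
 */
uint64_t fwq_spread_ns(int samples, long work)
{
    uint64_t min = UINT64_MAX, max = 0;

    for (int s = 0; s < samples; s++) {
        volatile long sink = 0;         /* keep the loop from being elided */
        uint64_t t0 = now_ns();

        for (long i = 0; i < work; i++)
            sink += i;

        uint64_t dt = now_ns() - t0;
        if (dt < min)
            min = dt;
        if (dt > max)
            max = dt;
    }
    return samples > 0 ? max - min : 0;
}
```

The idea would be to pin fwq_spread_ns() to an isolated CPU and compare
the spread with and without churn_mappings() running in another thread
of the same process on a maintenance CPU.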

Will