Re: [PATCH V3] arm64: Don't flush tlb while clearing the accessed bit

From: Will Deacon
Date: Tue Oct 30 2018 - 07:50:18 EST

[Sorry to be "that person" but please can you use plain text for your mail?
This is getting really hard to follow.]

On Tue, Oct 30, 2018 at 11:17:34AM +0530, Ashish Mhetre wrote:
> On 29/10/18 4:25 PM, Will Deacon wrote:
>> On Mon, Oct 29, 2018 at 02:55:58PM +0530, Ashish Mhetre wrote:
>>> From: Alex Van Brunt <avanbrunt@xxxxxxxxxx>
>>>
>>> Accessed bit is used to age a page and in generic implementation there is
>>> flush_tlb while clearing the accessed bit.
>>>
>>> Flushing a TLB is overhead on ARM64 as access flag faults don't get
>>> translation table entries cached into TLBs. Flushing TLB is not necessary
>>> for this. Clearing the accessed bit without flushing TLB doesn't cause data
>>> corruption on ARM64.
>>>
>>> In our case with this patch, speed of reading from fast NVMe/SSD through
>>> PCIe got improved by 10% ~ 15% and writing got improved by 20% ~ 40%.
>>>
>>> So for performance optimisation don't flush TLB when clearing the accessed
>>> bit on ARM64.
>>>
>>> x86 made the same optimization even though their TLB invalidate is much
>>> faster as it doesn't broadcast to other CPUs.
>>
>> Ok, but they may end up using IPIs so let's avoid these vague performance
>> claims in the log unless they're backed up with numbers.
>
> By numbers do you mean the actual benchmark values?

What I mean is, if we're going to claim that x86 TLB invalidation "is much
faster" than arm64, I'd prefer that there was some science behind it.
However, I think in this case it's not even relevant, so we can just rewrite
the commit message.

How about the patch below -- does that work for you?